Sample records for speech temporal structure

  1. Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages.

    PubMed

    Jadoul, Yannick; Ravignani, Andrea; Thompson, Bill; Filippi, Piera; de Boer, Bart

    2016-01-01

    Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold early acquisition of the building blocks in speech. By providing on-line clues to the location and duration of upcoming syllables, temporal structure may aid segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units, and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we aim to find across languages (using clearly defined acoustic, rather than orthographic, measures), temporal predictability in the speech signal which could be exploited by a language learner. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis, and a simple distributional measure. Second, we model higher-order temporal structure (regularities arising in an ordered series of syllable timings), testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively-perceived temporal regularities, and the absence of universally-accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to reliably infer. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but is challenging to pinpoint.

  2. Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages

    PubMed Central

    Jadoul, Yannick; Ravignani, Andrea; Thompson, Bill; Filippi, Piera; de Boer, Bart

    2016-01-01

    Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold early acquisition of the building blocks in speech. By providing on-line clues to the location and duration of upcoming syllables, temporal structure may aid segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units, and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we aim to find across languages (using clearly defined acoustic, rather than orthographic, measures), temporal predictability in the speech signal which could be exploited by a language learner. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis, and a simple distributional measure. Second, we model higher-order temporal structure—regularities arising in an ordered series of syllable timings—testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively-perceived temporal regularities, and the absence of universally-accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to reliably infer. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but is challenging to pinpoint. PMID:27994544
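
    To make the idea of a "simple distributional measure" of syllable timing concrete, the Python sketch below computes two generic regularity statistics over syllable inter-onset intervals: the coefficient of variation and the normalized pairwise variability index (nPVI). These are standard rhythm metrics used here only as an illustration; they are not the Bayesian ideal learner analysis or the specific measure of the study above, and the onset times are assumed to come from an acoustic syllable detector.

      import numpy as np

      def regularity_stats(onset_times_s):
          """Coefficient of variation and nPVI of syllable inter-onset intervals."""
          iois = np.diff(np.sort(np.asarray(onset_times_s, dtype=float)))
          cv = iois.std() / iois.mean()          # 0 would mean perfectly isochronous timing
          npvi = 100 * np.mean(np.abs(np.diff(iois)) / ((iois[:-1] + iois[1:]) / 2))
          return cv, npvi

      # toy onset times (seconds); lower values indicate more predictable timing
      print(regularity_stats([0.00, 0.21, 0.40, 0.62, 0.80]))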

  3. Hierarchical organization in the temporal structure of infant-direct speech and song.

    PubMed

    Falk, Simone; Kello, Christopher T

    2017-06-01

    Caregivers alter the temporal structure of their utterances when talking and singing to infants compared with adult communication. The present study tested whether temporal variability in infant-directed registers serves to emphasize the hierarchical temporal structure of speech. Fifteen German-speaking mothers sang a play song and told a story to their 6-month-old infants, or to an adult. Recordings were analyzed using a recently developed method that determines the degree of nested clustering of temporal events in speech. Events were defined as peaks in the amplitude envelope, and clusters of various sizes related to periods of acoustic speech energy at varying timescales. Infant-directed speech and song clearly showed greater event clustering compared with adult-directed registers, at multiple timescales of hundreds of milliseconds to tens of seconds. We discuss the relation of this newly discovered acoustic property to temporal variability in linguistic units and its potential implications for parent-infant communication and infants learning the hierarchical structures of speech and language. Copyright © 2017 Elsevier B.V. All rights reserved.
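
    The abstract defines temporal events as peaks in the amplitude envelope. A minimal Python sketch of that step is shown below; the smoothing cutoff and peak-picking thresholds are illustrative assumptions, not the authors' parameters.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt, find_peaks

      def envelope_events(x, fs, lp_hz=30.0, min_sep_s=0.05):
          """Return event times (s): peaks in a smoothed amplitude envelope."""
          env = np.abs(hilbert(x))                          # amplitude envelope
          b, a = butter(4, lp_hz, btype="low", fs=fs)       # smooth fast fluctuations
          env = filtfilt(b, a, env)
          peaks, _ = find_peaks(env, height=0.1 * env.max(),
                                distance=int(min_sep_s * fs))
          return peaks / fs

      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # toy 4 Hz "syllables"
      print(envelope_events(x, fs))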

  4. [The role of temporal fine structure in tone recognition and music perception].

    PubMed

    Zhou, Q; Gu, X; Liu, B

    2017-11-07

    The sound signal can be decomposed into temporal envelope and temporal fine structure information. The temporal envelope information is crucial for speech perception in quiet environments, and the temporal fine structure information plays an important role in speech perception in noise, Mandarin tone recognition, and music perception, especially pitch and melody perception.
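
    A minimal Python sketch of this envelope/fine-structure decomposition for a single frequency band, using the Hilbert transform; the band edges and filter order are illustrative assumptions.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def envelope_and_tfs(x, fs, band=(300.0, 700.0)):
          """Split one band of a signal into temporal envelope and fine structure."""
          b, a = butter(4, band, btype="bandpass", fs=fs)
          xb = filtfilt(b, a, x)                 # band-limited signal
          analytic = hilbert(xb)
          env = np.abs(analytic)                 # slowly varying temporal envelope
          tfs = np.cos(np.angle(analytic))       # unit-amplitude temporal fine structure
          return env, tfs                        # xb is approximately env * tfs

      fs = 16000
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 500 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
      env, tfs = envelope_and_tfs(x, fs)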

  5. Musical intervention enhances infants’ neural processing of temporal structure in music and speech

    PubMed Central

    Zhao, T. Christina; Kuhl, Patricia K.

    2016-01-01

    Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants’ neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants’ neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants’ neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 mo of age. We argue that the intervention enhanced infants’ ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing. PMID:27114512

  6. Musical intervention enhances infants' neural processing of temporal structure in music and speech.

    PubMed

    Zhao, T Christina; Kuhl, Patricia K

    2016-05-10

    Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants' neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants' neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants' neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 mo of age. We argue that the intervention enhanced infants' ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing.

  7. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure

    PubMed Central

    Stacey, Paula C.; Kitterick, Pádraig T.; Morris, Saffron D.; Sumner, Christian J.

    2017-01-01

    Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797

  8. Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural perspective

    PubMed Central

    Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.

    2012-01-01

    The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024

  9. Prosodic Structure Shapes the Temporal Realization of Intonation and Manual Gesture Movements

    ERIC Educational Resources Information Center

    Esteve-Gibert, Nuria; Prieto, Pilar

    2013-01-01

    Purpose: Previous work on the temporal coordination between gesture and speech found that the prominence in gesture coordinates with speech prominence. In this study, the authors investigated the anchoring regions in speech and pointing gesture that align with each other. The authors hypothesized that (a) in contrastive focus conditions, the…

  10. Temporal event structure and timing in schizophrenia: preserved binding in a longer "now".

    PubMed

    Martin, Brice; Giersch, Anne; Huron, Caroline; van Wassenhove, Virginie

    2013-01-01

    Patients with schizophrenia experience a loss of temporal continuity or subjective fragmentation along the temporal dimension. Here, we develop the hypothesis that impaired temporal awareness results from a perturbed structuring of events in time, i.e., canonical neural dynamics. To address this, 26 patients and their matched controls took part in two psychophysical studies using desynchronized audiovisual speech. Two tasks were used and compared: first, an identification task testing for multisensory binding impairments in which participants reported what they heard while looking at a speaker's face; in a second task, we tested the perceived simultaneity of the same audiovisual speech stimuli. In both tasks, we used McGurk fusion and combination that are classic ecologically valid multisensory illusions. First, and contrary to previous reports, our results show that patients do not significantly differ from controls in their rate of illusory reports. Second, the illusory reports of patients in the identification task were more sensitive to audiovisual speech desynchronies than those of controls. Third, and surprisingly, patients considered audiovisual speech to be synchronized for longer delays than controls. As such, the temporal tolerance profile observed in a temporal judgement task was less of a predictor for sensory binding in schizophrenia than for that obtained in controls. We interpret our results as an impairment of temporal event structuring in schizophrenia which does not specifically affect sensory binding operations but rather, the explicit access to timing information associated here with audiovisual speech processing. Our findings are discussed in the context of current neurophysiological frameworks for the binding and the structuring of sensory events in time. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. Study of Discussion Record Analysis Using Temporal Data Crystallization and Its Application to TV Scene Analysis

    DTIC Science & Technology

    2015-03-31

    analysis. For scene analysis, we use Temporal Data Crystallization (TDC), and for logical analysis, we use Speech Act theory and Toulmin Argumentation...utterance in the discussion record. (i) An utterance ID, and a speaker ID (ii) Speech acts (iii) Argument structure Speech act denotes...mediator is expected to use more OQs than CQs. When the speech act of an utterance is an argument, furthermore, we recognize the conclusion part

  12. Masking effects of speech and music: does the masker's hierarchical structure matter?

    PubMed

    Shi, Lu-Feng; Law, Yvonne

    2010-04-01

    Speech and music are time-varying signals organized by parallel hierarchical rules. Through a series of four experiments, this study compared the masking effects of single-talker speech and instrumental music on speech perception while manipulating the complexity of hierarchical and temporal structures of the maskers. Listeners' word recognition was found to be similar between hierarchically intact and disrupted speech or classical music maskers (Experiment 1). When sentences served as the signal, significantly greater masking effects were observed with disrupted than intact speech or classical music maskers (Experiment 2), although not with jazz or serial music maskers, which differed from the classical music masker in their hierarchical structures (Experiment 3). Removing the classical music masker's temporal dynamics or partially restoring it affected listeners' sentence recognition; yet, differences in performance between intact and disrupted maskers remained robust (Experiment 4). Hence, the effect of structural expectancy was largely present across maskers when comparing them before and after their hierarchical structure was purposefully disrupted. This effect seemed to lend support to the auditory stream segregation theory.

  13. Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2018-06-19

    During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). Inherently to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio-MEG coherence. Modulations in audio-MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
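
    Audio-MEG coherence, as used above, quantifies frequency-specific coupling between a cortical signal and the speech amplitude envelope. The Python sketch below is a minimal illustration with simulated signals; the variable names, segment length, and toy data are assumptions, not the study's pipeline.

      import numpy as np
      from scipy.signal import hilbert, coherence

      def audio_brain_coherence(audio, brain, fs, nperseg=None):
          """Coherence spectrum between the speech envelope and one brain channel."""
          env = np.abs(hilbert(audio))                       # speech amplitude envelope
          return coherence(env, brain, fs=fs, nperseg=nperseg or 4 * fs)

      fs = 200
      t = np.arange(60 * fs) / fs
      rng = np.random.default_rng(0)
      audio = (1 + np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)  # 4 Hz modulated noise
      brain = np.sin(2 * np.pi * 4 * t) + rng.standard_normal(t.size)        # channel tracking that rhythm
      f, cxy = audio_brain_coherence(audio, brain, fs)
      print(round(float(cxy[np.argmin(np.abs(f - 4.0))]), 2))                # elevated coherence near 4 Hz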

  14. Revisiting place and temporal theories of pitch

    PubMed Central

    2014-01-01

    The nature of pitch and its neural coding have been studied for over a century. A popular debate has revolved around the question of whether pitch is coded via “place” cues in the cochlea, or via timing cues in the auditory nerve. In the most recent incarnation of this debate, the role of temporal fine structure has been emphasized in conveying important pitch and speech information, particularly because the lack of temporal fine structure coding in cochlear implants might explain some of the difficulties faced by cochlear implant users in perceiving music and pitch contours in speech. In addition, some studies have postulated that hearing-impaired listeners may have a specific deficit related to processing temporal fine structure. This article reviews some of the recent literature surrounding the debate, and argues that much of the recent evidence suggesting the importance of temporal fine structure processing can also be accounted for using spectral (place) or temporal-envelope cues. PMID:25364292

  15. Neural pathways for visual speech perception

    PubMed Central

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  16. The Temporal Organization of Syllabic Structure

    ERIC Educational Resources Information Center

    Shaw, Jason A.

    2010-01-01

    This dissertation develops analytical tools which enable rigorous evaluation of competing syllabic parses on the basis of temporal patterns in speech production data. The data come from the articulographic tracking of fleshpoints on target speech organs, e.g., tongue, lips, jaw, in experiments with native speakers of American English and Moroccan…

  17. Temporal modulations in speech and music.

    PubMed

    Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

    2017-10-01

    Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
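
    The modulation spectrum discussed above is, in essence, the spectrum of the sound-intensity envelope. The Python sketch below computes a simplified single-band version over the 0.25-32 Hz range; the paper's analysis uses a multi-band auditory filterbank and hours of recordings, so treat this as an illustration only.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def modulation_spectrum(x, fs, env_fs=100):
          """Crude single-band temporal modulation spectrum of a waveform."""
          env = np.abs(hilbert(x))                              # intensity envelope
          b, a = butter(4, env_fs / 2.5, btype="low", fs=fs)
          env = filtfilt(b, a, env)[:: fs // env_fs]            # low-pass, then downsample
          env = env - env.mean()
          spec = np.abs(np.fft.rfft(env * np.hanning(env.size)))
          freqs = np.fft.rfftfreq(env.size, d=1.0 / env_fs)
          keep = (freqs >= 0.25) & (freqs <= 32.0)              # the 0.25-32 Hz range
          return freqs[keep], spec[keep]

      fs = 16000
      t = np.arange(4 * fs) / fs
      x = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 5 * t))   # toy 5 Hz modulation
      freqs, spec = modulation_spectrum(x, fs)
      print(freqs[np.argmax(spec)])                             # peaks near 5 Hz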

  18. Hierarchical temporal structure in music, speech and animal vocalizations: jazz is like a conversation, humpbacks sing like hermit thrushes.

    PubMed

    Kello, Christopher T; Bella, Simone Dalla; Médé, Butovens; Balasubramaniam, Ramesh

    2017-10-01

    Humans talk, sing and play music. Some species of birds and whales sing long and complex songs. All these behaviours and sounds exhibit hierarchical structure: syllables and notes are positioned within words and musical phrases, words and motives in sentences and musical phrases, and so on. We developed a new method to measure and compare hierarchical temporal structures in speech, song and music. The method identifies temporal events as peaks in the sound amplitude envelope, and quantifies event clustering across a range of timescales using Allan factor (AF) variance. AF variances were analysed and compared for over 200 different recordings from more than 16 different categories of signals, including recordings of speech in different contexts and languages, musical compositions and performances from different genres. Non-human vocalizations from two bird species and two types of marine mammals were also analysed for comparison. The resulting patterns of AF variance across timescales were distinct to each of four natural categories of complex sound: speech, popular music, classical music and complex animal vocalizations. Comparisons within and across categories indicated that nested clustering in longer timescales was more prominent when prosodic variation was greater, and when sounds came from interactions among individuals, including interactions between speakers, musicians, and even killer whales. Nested clustering also was more prominent for music compared with speech, and reflected beat structure for popular music and self-similarity across timescales for classical music. In summary, hierarchical temporal structures reflect the behavioural and social processes underlying complex vocalizations and musical performances. © 2017 The Author(s).
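
    The Allan factor analysis referenced above compares event counts in adjacent windows of duration T, repeated across a range of T: values near 1 indicate Poisson-like (unclustered) events, while values that grow with T indicate nested clustering. A minimal Python sketch, with illustrative timescales:

      import numpy as np

      def allan_factor(event_times, duration, timescales):
          """Allan factor AF(T) of an event train for each window size T."""
          event_times = np.asarray(event_times, dtype=float)
          af = []
          for T in timescales:
              counts, _ = np.histogram(event_times, bins=np.arange(0.0, duration + T, T))
              d = np.diff(counts)                               # successive count differences
              af.append(np.mean(d ** 2) / (2 * counts.mean()))
          return np.array(af)

      rng = np.random.default_rng(0)
      events = np.sort(rng.uniform(0, 120, 600))                # toy Poisson-like event train
      print(np.round(allan_factor(events, 120, [0.1, 0.3, 1.0, 3.0, 10.0]), 2))  # values near 1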

  19. Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y

    2014-09-01

    Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.

  20. The role of temporal structure in the investigation of sensory memory, auditory scene analysis, and speech perception: a healthy-aging perspective.

    PubMed

    Rimmele, Johanna Maria; Sussman, Elyse; Poeppel, David

    2015-02-01

    Listening situations with multiple talkers or background noise are common in everyday communication and are particularly demanding for older adults. Here we review current research on auditory perception in aging individuals in order to gain insights into the challenges of listening under noisy conditions. Informationally rich temporal structure in auditory signals--over a range of time scales from milliseconds to seconds--renders temporal processing central to perception in the auditory domain. We discuss the role of temporal structure in auditory processing, in particular from a perspective relevant for hearing in background noise, and focusing on sensory memory, auditory scene analysis, and speech perception. Interestingly, these auditory processes, usually studied in an independent manner, show considerable overlap of processing time scales, even though each has its own 'privileged' temporal regimes. By integrating perspectives on temporal structure processing in these three areas of investigation, we aim to highlight similarities typically not recognized. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. The role of temporal structure in the investigation of sensory memory, auditory scene analysis, and speech perception: A healthy-aging perspective

    PubMed Central

    Rimmele, Johanna Maria; Sussman, Elyse; Poeppel, David

    2014-01-01

    Listening situations with multiple talkers or background noise are common in everyday communication and are particularly demanding for older adults. Here we review current research on auditory perception in aging individuals in order to gain insights into the challenges of listening under noisy conditions. Informationally rich temporal structure in auditory signals - over a range of time scales from milliseconds to seconds - renders temporal processing central to perception in the auditory domain. We discuss the role of temporal structure in auditory processing, in particular from a perspective relevant for hearing in background noise, and focusing on sensory memory, auditory scene analysis, and speech perception. Interestingly, these auditory processes, usually studied in an independent manner, show considerable overlap of processing time scales, even though each has its own 'privileged' temporal regimes. By integrating perspectives on temporal structure processing in these three areas of investigation, we aim to highlight similarities typically not recognized. PMID:24956028

  2. Phonological awareness predicts activation patterns for print and speech

    PubMed Central

    Frost, Stephen J.; Landi, Nicole; Mencl, W. Einar; Sandak, Rebecca; Fulbright, Robert K.; Tejada, Eleanor T.; Jacobsen, Leslie; Grigorenko, Elena L.; Constable, R. Todd; Pugh, Kenneth R.

    2009-01-01

    Using fMRI, we explored the relationship between phonological awareness (PA), a measure of metaphonological knowledge of the segmental structure of speech, and brain activation patterns during processing of print and speech in young readers from six to ten years of age. Behavioral measures of PA were positively correlated with activation levels for print relative to speech tokens in superior temporal and occipito-temporal regions. Differences between print-elicited activation levels in superior temporal and inferior frontal sites were also correlated with PA measures with the direction of the correlation depending on stimulus type: positive for pronounceable pseudowords and negative for consonant strings. These results support and extend the many indications in the behavioral and neurocognitive literature that PA is a major component of skill in beginning readers and point to a developmental trajectory by which written language engages areas originally shaped by speech for learners on the path toward successful literacy acquisition. PMID:19306061

  3. Consonant identification in noise using Hilbert-transform temporal fine-structure speech and recovered-envelope speech for listeners with normal and impaired hearing

    PubMed Central

    Léger, Agnès C.; Reed, Charlotte M.; Desloge, Joseph G.; Swaminathan, Jayaganesh; Braida, Louis D.

    2015-01-01

    Consonant-identification ability was examined in normal-hearing (NH) and hearing-impaired (HI) listeners in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The Hilbert transform was used to process speech stimuli (16 consonants in a-C-a syllables) to present envelope cues, temporal fine-structure (TFS) cues, or envelope cues recovered from TFS speech. The performance of the HI listeners was inferior to that of the NH listeners both in terms of lower levels of performance in the baseline condition and in the need for higher signal-to-noise ratio to yield a given level of performance. For NH listeners, scores were higher in interrupted noise than in steady-state noise for all speech types (indicating substantial masking release). For HI listeners, masking release was typically observed for TFS and recovered-envelope speech but not for unprocessed and envelope speech. For both groups of listeners, TFS and recovered-envelope speech yielded similar levels of performance and consonant confusion patterns. The masking release observed for TFS and recovered-envelope speech may be related to level effects associated with the manner in which the TFS processing interacts with the interrupted noise signal, rather than to the contributions of TFS cues per se. PMID:26233038

  4. Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.

    PubMed

    Berezutskaya, Julia; Freudenburg, Zachary V; Güçlü, Umut; van Gerven, Marcel A J; Ramsey, Nick F

    2017-08-16

    Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior temporal gyrus in the human brain (Hullett et al., 2016). In this study, we investigate what happens to these neural representations past the superior temporal gyrus and how they engage higher-level language processing areas such as inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex. Two complementary imaging techniques were used with human participants (both males and females): electrocorticography (ECoG) and fMRI. Both imaging techniques showed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence of propagation of the temporal features of speech sounds along the ventral pathway of language processing in the brain toward inferior frontal gyrus. Increasingly coarse temporal features of speech spreading from posterior superior temporal cortex toward inferior frontal gyrus were associated with linguistic features such as voice onset time, duration of the formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain. SIGNIFICANCE STATEMENT We know that, during natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated the tuning to low-level sound features within these regions using neural responses to a short feature film. We also looked at whether the tuning organization along these brain regions showed any parallel to the hierarchy of language structures in continuous speech. Our results show that low-level speech features propagate throughout the perisylvian cortex and potentially contribute to the emergence of "coarse" speech representations in inferior frontal gyrus typically associated with high-level language processing. These findings add to the previous work on auditory processing and underline a distinctive role of inferior frontal gyrus in natural speech comprehension. Copyright © 2017 the authors 0270-6474/17/377906-15$15.00/0.

  5. Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2014-01-01

    Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The “phonological deficit” in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: “Stress” (~2 Hz), “Syllable” (~4 Hz) and “Sub-beat” (reduced syllables, ~14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either single AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or pairs of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identify the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from pairs of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages. PMID:24605099
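
    A minimal Python sketch of the first analysis step implied above: isolating envelope fluctuations near the stress (~2 Hz), syllable (~4 Hz) and sub-beat (~14 Hz) rates by band-pass filtering a downsampled amplitude envelope. The band edges and filter settings are assumptions for illustration; the study's tone-vocoding pipeline differs.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      AM_BANDS = {"stress": (0.9, 2.5), "syllable": (2.5, 7.0), "sub-beat": (7.0, 20.0)}  # assumed edges

      def am_components(x, fs, env_fs=100):
          """Envelope fluctuations of x in three amplitude-modulation bands."""
          env = np.abs(hilbert(x))
          b, a = butter(4, env_fs / 2.5, btype="low", fs=fs)
          env = filtfilt(b, a, env)[:: fs // env_fs]            # envelope resampled to env_fs Hz
          out = {}
          for name, (lo, hi) in AM_BANDS.items():
              b, a = butter(2, (lo, hi), btype="bandpass", fs=env_fs)
              out[name] = filtfilt(b, a, env)
          return out

      fs = 8000
      t = np.arange(4 * fs) / fs
      x = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))  # toy syllable-rate AM
      print({k: round(float(np.std(v)), 3) for k, v in am_components(x, fs).items()})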

  6. The Temporal Prediction of Stress in Speech and Its Relation to Musical Beat Perception.

    PubMed

    Beier, Eleonora J; Ferreira, Fernanda

    2018-01-01

    While rhythmic expectancies are thought to be at the base of beat perception in music, the extent to which stress patterns in speech are similarly represented and predicted during on-line language comprehension is debated. The temporal prediction of stress may be advantageous to speech processing, as stress patterns aid segmentation and mark new information in utterances. However, while linguistic stress patterns may be organized into hierarchical metrical structures similarly to musical meter, they do not typically present the same degree of periodicity. We review the theoretical background for the idea that stress patterns are predicted and address the following questions: First, what is the evidence that listeners can predict the temporal location of stress based on preceding rhythm? If they can, is it thanks to neural entrainment mechanisms similar to those utilized for musical beat perception? And lastly, what linguistic factors other than rhythm may account for the prediction of stress in natural speech? We conclude that while expectancies based on the periodic presentation of stresses are at play in some of the current literature, other processes are likely to affect the prediction of stress in more naturalistic, less isochronous speech. Specifically, aspects of prosody other than amplitude changes (e.g., intonation) as well as lexical, syntactic and information structural constraints on the realization of stress may all contribute to the probabilistic expectation of stress in speech.

  7. Role of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening.

    PubMed

    Swaminathan, Jayaganesh; Mason, Christine R; Streeter, Timothy M; Best, Virginia; Roverud, Elin; Kidd, Gerald

    2016-08-03

    While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separation of the target speech signal in azimuth from the interfering masker signals can lead to an improvement in target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions of two stimulus properties that vary with separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), to SRM in normal-hearing (NH) human listeners. Target speech was presented from the front and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which the intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while retaining/eliminating them from the TFS. This was achieved by using the same/different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results from this study revealed that binaural TFS cues, especially for frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed. Acoustic signals received by the auditory system pass first through an array of physiologically based band-pass filters. Conceptually, at the output of each filter, there are two principal forms of temporal information: slowly varying fluctuations in the envelope (ENV) and rapidly varying fluctuations in the temporal fine structure (TFS). The importance of these two types of information in everyday listening (e.g., conversing in a noisy social situation; the "cocktail-party" problem) has not been established. This study assessed the contributions of binaural ENV and TFS cues for understanding speech in multiple-talker situations. Results suggest that, whereas the ENV cues are important for speech intelligibility, binaural TFS cues are critical for perceptually segregating the different talkers and thus for solving the cocktail party problem. Copyright © 2016 the authors 0270-6474/16/368250-08$15.00/0.
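
    The core manipulation described above, imposing the same speech envelopes on either identical or independent noise carriers in the two ears, can be sketched in a few lines of Python. This is a conceptual, single-band illustration under assumed band edges, not the study's stimulus-generation code.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def vocode_band(x, fs, band, carrier):
          """Replace a band's TFS with a noise carrier, keeping its envelope."""
          b, a = butter(4, band, btype="bandpass", fs=fs)
          env = np.abs(hilbert(filtfilt(b, a, x)))      # band envelope of the speech
          return env * filtfilt(b, a, carrier)          # envelope imposed on band-limited noise

      fs, band = 16000, (500.0, 1000.0)
      t = np.arange(fs) / fs
      x = np.sin(2 * np.pi * 700 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
      rng = np.random.default_rng(0)
      noise_l, noise_r = rng.standard_normal(fs), rng.standard_normal(fs)
      same = (vocode_band(x, fs, band, noise_l), vocode_band(x, fs, band, noise_l))  # TFS correlated across ears
      diff = (vocode_band(x, fs, band, noise_l), vocode_band(x, fs, band, noise_r))  # TFS decorrelated, ENV kept
      print(np.corrcoef(*same)[0, 1], round(np.corrcoef(*diff)[0, 1], 2))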

  8. Are individuals with Parkinson's disease capable of speech-motor learning? - A preliminary evaluation.

    PubMed

    Kaipa, Ramesh; Jones, Richard D; Robb, Michael P

    2016-07-01

    The benefits of different practice conditions in limb-based rehabilitation of motor disorders are well documented. Conversely, the role of practice structure in the treatment of motor-based speech disorders has only been minimally investigated. Considering this limitation, the current study aimed to investigate the effectiveness of selected practice conditions in spatial and temporal learning of novel speech utterances in individuals with Parkinson's disease (PD). Participants included 16 individuals with PD who were randomly and equally assigned to constant, variable, random, and blocked practice conditions. Participants in all four groups practiced a speech phrase for two consecutive days, and reproduced the speech phrase on the third day without further practice or feedback. There were no significant differences (p > 0.05) between participants across the four practice conditions with respect to either spatial or temporal learning of the speech phrase. Overall, PD participants demonstrated diminished spatial and temporal learning in comparison to healthy controls. Tests of strength of association between participants' demographic/clinical characteristics and speech-motor learning outcomes did not reveal any significant correlations. The findings from the current study suggest that repeated practice facilitates speech-motor learning in individuals with PD irrespective of the type of practice. Clinicians need to be cautious in applying practice conditions to treat speech deficits associated with PD based on the findings of non-speech-motor learning tasks. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. The functional and structural asymmetries of the superior temporal sulcus.

    PubMed

    Specht, Karsten; Wigglesworth, Philip

    2018-02-01

    The superior temporal sulcus (STS) is an anatomical structure that increasingly interests researchers. This structure appears to receive multisensory input and is involved in several perceptual and cognitive core functions, such as speech perception, audiovisual integration, (biological) motion processing and theory of mind capacities. In addition, the superior temporal sulcus is not only one of the longest sulci of the brain, but it also shows marked functional and structural asymmetries, some of which have only been found in humans. To explore the functional-structural relationships of these asymmetries in more detail, this study combines functional and structural magnetic resonance imaging. Using a speech perception task, an audiovisual integration task, and a theory of mind task, this study again demonstrated an involvement of the STS in these processes, with an expected strong leftward asymmetry for the speech perception task. Furthermore, this study confirmed the earlier described, human-specific asymmetries, namely that the left STS is longer than the right STS and that the right STS is deeper than the left STS. However, this study did not find any relationship between these structural asymmetries and the detected brain activations or their functional asymmetries. This can, on the other hand, give further support to the notion that the structural asymmetry of the STS is not directly related to the functional asymmetry of the speech perception and the language system as a whole, but that it may have other causes and functions. © 2018 The Authors. Scandinavian Journal of Psychology published by Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  10. Neural mechanisms underlying auditory feedback control of speech

    PubMed Central

    Reilly, Kevin J.; Guenther, Frank H.

    2013-01-01

    The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech, and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 135 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech. PMID:18035557

  11. Rise Time and Formant Transition Duration in the Discrimination of Speech Sounds: The Ba-Wa Distinction in Developmental Dyslexia

    ERIC Educational Resources Information Center

    Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Denes

    2011-01-01

    Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory…

  12. Spectrotemporal modulation sensitivity for hearing-impaired listeners: dependence on carrier center frequency and the relationship to speech intelligibility.

    PubMed

    Mehraei, Golbarg; Gallun, Frederick J; Leek, Marjorie R; Bernstein, Joshua G W

    2014-07-01

    Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4-32 Hz), spectral ripple density [0.5-4 cycles/octave (c/o)] and carrier center frequency (500-4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4-12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements.
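
    Spectrotemporal modulation (STM) stimuli like those described above are "moving ripples": the amplitudes of many log-spaced tones are modulated sinusoidally along both the frequency axis (cycles/octave) and time (Hz). A minimal Python sketch of one such stimulus follows; all parameter values are illustrative, not the study's.

      import numpy as np

      def ripple_stimulus(fs=16000, dur=1.0, f_lo=1000.0, n_oct=1.0, n_tones=64,
                          rate_hz=4.0, density_c_per_oct=2.0, depth=1.0):
          """Octave-band moving-ripple (STM) stimulus built from many tones."""
          t = np.arange(int(fs * dur)) / fs
          octs = np.linspace(0.0, n_oct, n_tones)              # tone positions in octaves above f_lo
          freqs = f_lo * 2.0 ** octs
          phases = np.random.default_rng(0).uniform(0, 2 * np.pi, n_tones)
          x = np.zeros_like(t)
          for f, o, ph in zip(freqs, octs, phases):
              mod = 1 + depth * np.sin(2 * np.pi * (rate_hz * t + density_c_per_oct * o))
              x += mod * np.sin(2 * np.pi * f * t + ph)         # ripple-modulated tone
          return x / np.abs(x).max()

      stim = ripple_stimulus(rate_hz=4.0, density_c_per_oct=2.0)  # the 4 Hz, 2 c/o condition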

  13. Using Flanagan's phase vocoder to improve cochlear implant performance

    NASA Astrophysics Data System (ADS)

    Zeng, Fan-Gang

    2004-10-01

    The cochlear implant has restored partial hearing to more than 100,000 deaf people worldwide, allowing the average user to talk on the telephone in a quiet environment. However, significant difficulty still remains for speech recognition in noise, music perception, and tonal language understanding. This difficulty may be related to speech processing strategies in current cochlear implants that emphasize the extraction and encoding of the temporal envelope while ignoring the temporal fine structure in speech sounds. A novel strategy was developed based on Flanagan's phase vocoder [Flanagan and Golden, Bell Syst. Tech. J. 45, 1493-1509 (1966)], in which frequency modulation was extracted from the temporal fine structure and then added to amplitude modulation in the current cochlear implants. Acoustic simulation results showed that amplitude and frequency modulation contributed complementarily to speech perception with amplitude modulation contributing mainly to intelligibility whereas frequency modulation contributed to speaker identification and auditory grouping. The results also showed that the novel strategy significantly improved cochlear implant performance under realistic listening situations. Overall, the present result demonstrated that Flanagan's classic work on the phase vocoder still sheds insight on current problems of both theoretical and practical importance. [Work supported by NIH.]
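
    The strategy sketched above recovers slow frequency modulation (FM) from the temporal fine structure of each band and combines it with the amplitude modulation (AM). The Python fragment below illustrates only the underlying signal-processing idea, recovering instantaneous frequency from the Hilbert phase of a single band; it is a conceptual sketch, not the implant strategy itself, and the low-pass cutoff is an assumption.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def am_fm_from_band(xb, fs, fm_lp_hz=400.0):
          """Amplitude modulation and a slow FM track from one band-limited signal."""
          analytic = hilbert(xb)
          am = np.abs(analytic)                                 # amplitude modulation
          inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
          b, a = butter(2, fm_lp_hz, btype="low", fs=fs)
          fm = filtfilt(b, a, inst_freq)                        # slowly varying FM (Hz)
          return am, fm

      fs = 16000
      t = np.arange(fs) / fs
      xb = np.sin(2 * np.pi * 500 * t + 10 * np.sin(2 * np.pi * 3 * t))   # 500 Hz carrier with 3 Hz FM
      am, fm = am_fm_from_band(xb, fs)
      print(round(float(fm.mean()), 1))                         # about 500 Hz on average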

  14. Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers

    PubMed Central

    Cooke, Martin; Aubanel, Vincent

    2017-01-01

    Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background. PMID:28618803

  15. A glimpsing account of the role of temporal fine structure information in speech recognition.

    PubMed

    Apoux, Frédéric; Healy, Eric W

    2013-01-01

    Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.

  16. Breath Group Analysis for Reading and Spontaneous Speech in Healthy Adults

    PubMed Central

    Wang, Yu-Tsai; Green, Jordan R.; Nip, Ignatius S.B.; Kent, Ray D.; Kent, Jane Finley

    2010-01-01

    Aims: The breath group can serve as a functional unit to define temporal and fundamental frequency (f0) features in continuous speech. These features of the breath group are determined by the physiologic, linguistic, and cognitive demands of communication. Reading and spontaneous speech are two speaking tasks that vary in these demands and are commonly used to evaluate speech performance for research and clinical applications. The purpose of this study is to examine differences between reading and spontaneous speech in the temporal and f0 aspects of their breath groups. Methods: Sixteen participants read two passages and answered six questions while wearing a circumferentially vented mask connected to a pneumotach. The aerodynamic signal was used to identify inspiratory locations. The audio signal was used to analyze task differences in breath group structure, including temporal and f0 components. Results: The main findings were that the spontaneous speech task exhibited significantly more grammatically inappropriate breath group locations and longer breath group duration than did the passage reading task. Conclusion: The task differences in the percentage of grammatically inadequate breath group locations and in breath group duration for healthy adult speakers partly explain the differences in cognitive-linguistic load between the passage reading and spontaneous speech. PMID:20588052
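
    A minimal Python sketch of the inspiratory-location step described in the Methods: with an oral airflow signal from the pneumotach (assumed here to be positive during expiration and negative during inspiration), inspirations are taken as sufficiently long runs of negative flow, and breath groups as the stretches between successive inspirations. The threshold and the toy signal are illustrative assumptions, not the study's values.

      import numpy as np

      def breath_groups(flow, fs, min_insp_s=0.15):
          """(start_s, end_s) of each breath group, bounded by detected inspirations."""
          insp = np.concatenate(([0], (flow < 0).astype(int), [0]))   # pad so every run has edges
          starts = np.flatnonzero(np.diff(insp) == 1)
          ends = np.flatnonzero(np.diff(insp) == -1)
          spans = [(s, e) for s, e in zip(starts, ends) if (e - s) / fs >= min_insp_s]
          # a breath group runs from the end of one inspiration to the start of the next
          return [(e / fs, s2 / fs) for (_, e), (s2, _) in zip(spans[:-1], spans[1:])]

      fs = 100
      t = np.arange(12 * fs) / fs
      flow = np.where((t % 4) < 0.5, -0.3, 0.2)    # toy airflow: a 0.5 s inhale every 4 s
      print(breath_groups(flow, fs))               # [(0.5, 4.0), (4.5, 8.0)]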

  17. Breath group analysis for reading and spontaneous speech in healthy adults.

    PubMed

    Wang, Yu-Tsai; Green, Jordan R; Nip, Ignatius S B; Kent, Ray D; Kent, Jane Finley

    2010-01-01

    The breath group can serve as a functional unit to define temporal and fundamental frequency (f0) features in continuous speech. These features of the breath group are determined by the physiologic, linguistic, and cognitive demands of communication. Reading and spontaneous speech are two speaking tasks that vary in these demands and are commonly used to evaluate speech performance for research and clinical applications. The purpose of this study is to examine differences between reading and spontaneous speech in the temporal and f0 aspects of their breath groups. Sixteen participants read two passages and answered six questions while wearing a circumferentially vented mask connected to a pneumotach. The aerodynamic signal was used to identify inspiratory locations. The audio signal was used to analyze task differences in breath group structure, including temporal and f0 components. The main findings were that the spontaneous speech task exhibited significantly more grammatically inappropriate breath group locations and longer breath group duration than did the passage reading task. The task differences in the percentage of grammatically inadequate breath group locations and in breath group duration for healthy adult speakers partly explain the differences in cognitive-linguistic load between the passage reading and spontaneous speech tasks. Copyright © 2010 S. Karger AG, Basel.
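
    As a simple illustration of the breath-group timing measure, the snippet below derives breath-group durations from a list of inspiratory onset times. The timestamps are invented; in the study itself, inspirations were located in the aerodynamic (pneumotach) signal.

    ```python
    # Illustrative only: breath-group durations from hypothetical inspiration times.
    import numpy as np

    inspiration_times = np.array([0.0, 3.2, 6.9, 11.5, 14.8])  # seconds, hypothetical
    breath_group_durations = np.diff(inspiration_times)        # time between inspirations

    print("breath-group durations (s):", breath_group_durations)
    print("mean duration (s):", breath_group_durations.mean())
    ```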

  18. Combined electric and acoustic hearing performance with Zebra® speech processor: speech reception, place, and temporal coding evaluation.

    PubMed

    Vaerenberg, Bart; Péan, Vincent; Lesbros, Guillaume; De Ceulaer, Geert; Schauwers, Karen; Daemers, Kristin; Gnansia, Dan; Govaerts, Paul J

    2013-06-01

    To assess the auditory performance of Digisonic® cochlear implant users with electric stimulation (ES) and electro-acoustic stimulation (EAS) with special attention to the processing of low-frequency temporal fine structure. Six patients implanted with a Digisonic® SP implant and showing low-frequency residual hearing were fitted with the Zebra® speech processor providing both electric and acoustic stimulation. Assessment consisted of monosyllabic speech identification tests in quiet and in noise at different presentation levels, and a pitch discrimination task using harmonic and disharmonic intonating complex sounds (Vaerenberg et al., 2011). These tests investigate place and time coding through pitch discrimination. All tasks were performed with ES only and with EAS. Speech results in noise showed significant improvement with EAS when compared to ES. Whereas EAS did not yield better results in the harmonic intonation test, the improvements in the disharmonic intonation test were remarkable, suggesting better coding of pitch cues requiring phase locking. These results suggest that patients with residual hearing in the low-frequency range still have good phase-locking capacities, allowing them to process fine temporal information. ES relies mainly on place coding but provides poor low-frequency temporal coding, whereas EAS also provides temporal coding in the low-frequency range. Patients with residual phase-locking capacities can make use of these cues.

  19. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing

    PubMed Central

    Rauschecker, Josef P; Scott, Sophie K

    2010-01-01

    Speech and language are considered uniquely human abilities: animals have communication systems, but they do not match human linguistic skills in terms of recursive structure and combinatorial power. Yet, in evolution, spoken language must have emerged from neural mechanisms at least partially available in animals. In this paper, we will demonstrate how our understanding of speech perception, one important facet of language, has profited from findings and theory in nonhuman primate studies. Chief among these are physiological and anatomical studies showing that primate auditory cortex, across species, shows patterns of hierarchical structure, topographic mapping and streams of functional processing. We will identify roles for different cortical areas in the perceptual processing of speech and review functional imaging work in humans that bears on our understanding of how the brain decodes and monitors speech. A new model connects structures in the temporal, frontal and parietal lobes linking speech perception and production. PMID:19471271

  20. Sensory deprivation due to otitis media episodes in early childhood and its effect at later age: A psychoacoustic and speech perception measure.

    PubMed

    Shetty, Hemanth Narayan; Koonoor, Vishal

    2016-11-01

    Past research has reported that repeated occurrences of otitis media (OM) at an early age have a negative impact on children's speech perception at a later age. This necessitates documenting the effects of temporal and spectral processing on speech perception in noise in normal and atypical groups. The present study evaluated the relation between speech perception in noise and temporal and spectral processing abilities in children from normal and atypical groups. The study included two experiments. In the first experiment, temporal resolution and frequency discrimination were evaluated, using the temporal modulation transfer function and a frequency discrimination test, in a normal group and in three atypical subgroups with a history of OM between the chronological ages of 6 months and 2 years: (a) fewer than four episodes, (b) four to nine episodes, and (c) more than nine episodes. In the second experiment, SNR-50 was evaluated for each group of study participants. All participants had normal hearing and middle ear status during the course of testing. Results demonstrated that children in the atypical groups had significantly poorer modulation detection thresholds, peak sensitivity and bandwidth, and frequency discrimination at each F0 than normal-hearing listeners. Furthermore, there was a significant correlation between measures of temporal resolution, frequency discrimination, and speech perception in noise. This implies that the atypical groups have significant impairment in extracting envelope as well as fine-structure cues from the signal. The results supported the idea that episodes of OM before 2 years of age can produce periods of sensory deprivation that alter temporal and spectral skills, which in turn has negative consequences on speech perception in noise. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
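
    SNR-50 refers to the signal-to-noise ratio at which 50% of the speech material is recognized. One common way to estimate it, sketched below with fabricated data and a logistic psychometric function, is to fit performance as a function of SNR; this is an illustrative approach, not necessarily the exact procedure used in the study.

    ```python
    # Rough sketch: estimate SNR-50 by fitting a logistic psychometric function.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(snr, snr50, slope):
        return 1.0 / (1.0 + np.exp(-slope * (snr - snr50)))

    snr = np.array([-10, -5, 0, 5, 10], dtype=float)          # dB SNR, hypothetical
    prop_correct = np.array([0.05, 0.20, 0.55, 0.85, 0.97])   # hypothetical scores

    popt, _ = curve_fit(logistic, snr, prop_correct, p0=[0.0, 0.5])
    print(f"estimated SNR-50: {popt[0]:.1f} dB, slope: {popt[1]:.2f}")
    ```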

  1. Temporal Resolution Needed for Auditory Communication: Measurement With Mosaic Speech

    PubMed Central

    Nakajima, Yoshitaka; Matsuda, Mizuki; Ueda, Kazuo; Remijn, Gerard B.

    2018-01-01

    Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners' intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called “mosaic speech”: speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around the 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of ~40 ms into vivid motion. PMID:29740295
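
    A rough sketch of the mosaicizing idea: power in a magnitude spectrogram is averaged within non-overlapping time-by-frequency tiles. The STFT settings and tile sizes below are assumptions chosen for illustration, and resynthesis back to a waveform (needed for the actual stimuli) is omitted.

    ```python
    # Minimal "mosaic spectrogram" sketch; not the published stimulus pipeline.
    import numpy as np
    from scipy.signal import stft

    def mosaicize(power_spec, t_block, f_block):
        """Average power within non-overlapping t_block x f_block tiles."""
        out = power_spec.copy()
        n_f, n_t = power_spec.shape
        for fi in range(0, n_f, f_block):
            for ti in range(0, n_t, t_block):
                tile = power_spec[fi:fi + f_block, ti:ti + t_block]
                out[fi:fi + f_block, ti:ti + t_block] = tile.mean()
        return out

    if __name__ == "__main__":
        fs = 16000
        x = np.random.default_rng(0).standard_normal(fs)            # stand-in for speech
        f, t, Z = stft(x, fs=fs, nperseg=128, noverlap=0)            # 8-ms frames
        mosaic = mosaicize(np.abs(Z) ** 2, t_block=5, f_block=4)     # 5 frames = ~40-ms tiles
        print(mosaic.shape)
    ```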

  2. Spectrotemporal modulation sensitivity for hearing-impaired listeners: Dependence on carrier center frequency and the relationship to speech intelligibility

    PubMed Central

    Mehraei, Golbarg; Gallun, Frederick J.; Leek, Marjorie R.; Bernstein, Joshua G. W.

    2014-01-01

    Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4–32 Hz), spectral ripple density [0.5–4 cycles/octave (c/o)] and carrier center frequency (500–4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4–12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements. PMID:24993215
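
    Spectro-temporal modulation (STM) stimuli of the kind described are often built as "moving ripples". The sketch below generates one from log-spaced tones with a sinusoidal spectro-temporal envelope; the carrier range, ripple density, rate, and depth are illustrative, and the study itself used octave-band noise carriers rather than tone complexes.

    ```python
    # Illustrative moving spectro-temporal ripple built from log-spaced tones.
    import numpy as np

    def ripple(fs=16000, dur=1.0, f_lo=500.0, n_oct=1.0, n_tones=100,
               rate_hz=4.0, density_c_o=2.0, depth=1.0, seed=0):
        t = np.arange(int(fs * dur)) / fs
        rng = np.random.default_rng(seed)
        x = np.linspace(0.0, n_oct, n_tones)          # tone positions in octaves above f_lo
        freqs = f_lo * 2.0 ** x
        sig = np.zeros_like(t)
        for xi, f in zip(x, freqs):
            # sinusoidal amplitude envelope drifting across frequency (the "ripple")
            amp = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t + density_c_o * xi))
            sig += amp * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
        return sig / np.max(np.abs(sig))

    stim = ripple()
    print(stim.shape, float(stim.min()), float(stim.max()))
    ```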

  3. Population responses in primary auditory cortex simultaneously represent the temporal envelope and periodicity features in natural speech.

    PubMed

    Abrams, Daniel A; Nicol, Trent; White-Schwoch, Travis; Zecker, Steven; Kraus, Nina

    2017-05-01

    Speech perception relies on a listener's ability to simultaneously resolve multiple temporal features in the speech signal. Little is known regarding neural mechanisms that enable the simultaneous coding of concurrent temporal features in speech. Here we show that two categories of temporal features in speech, the low-frequency speech envelope and periodicity cues, are processed by distinct neural mechanisms within the same population of cortical neurons. We measured population activity in primary auditory cortex of anesthetized guinea pig in response to three variants of a naturally produced sentence. Results show that the envelope of population responses closely tracks the speech envelope, and this cortical activity more closely reflects wider bandwidths of the speech envelope compared to narrow bands. Additionally, neuronal populations represent the fundamental frequency of speech robustly with phase-locked responses. Importantly, these two temporal features of speech are simultaneously observed within neuronal ensembles in auditory cortex in response to clear, conversational, and compressed speech exemplars. Results show that auditory cortical neurons are adept at simultaneously resolving multiple temporal features in extended speech sentences using discrete coding mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. A correlational method to concurrently measure envelope and temporal fine structure weights: effects of age, cochlear pathology, and spectral shaping.

    PubMed

    Fogerty, Daniel; Humes, Larry E

    2012-09-01

    The speech signal may be divided into spectral frequency-bands, each band containing temporal properties of the envelope and fine structure. This study measured the perceptual weights for the envelope and fine structure in each of three frequency bands for sentence materials in young normal-hearing listeners, older normal-hearing listeners, aided older hearing-impaired listeners, and spectrally matched young normal-hearing listeners. The availability of each acoustic property was independently varied through noisy signal extraction. Thus, the full speech stimulus was presented with noise used to mask six different auditory channels. Perceptual weights were determined by correlating a listener's performance with the signal-to-noise ratio of each acoustic property on a trial-by-trial basis. Results demonstrate that temporal fine structure perceptual weights remain stable across the four listener groups. However, a different weighting typography was observed across the listener groups for envelope cues. Results suggest that spectral shaping used to preserve the audibility of the speech stimulus may alter the allocation of perceptual resources. The relative perceptual weighting of envelope cues may also change with age. Concurrent testing of sentences repeated once on a previous day demonstrated that weighting strategies for all listener groups can change, suggesting an initial stabilization period or susceptibility to auditory training.
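
    The correlational logic can be illustrated with simulated trials: the per-trial SNR of each acoustic property is correlated with the trial-by-trial score to obtain relative weights. Everything below (number of trials, channels, "true" weights, outcomes) is simulated, and the simple correlation only approximates the published point-biserial analysis.

    ```python
    # Hedged illustration of trial-by-trial correlational perceptual weights.
    import numpy as np

    rng = np.random.default_rng(1)
    n_trials = 500
    snr = rng.normal(0.0, 1.0, size=(n_trials, 6))     # 6 channels: 3 bands x {envelope, TFS}
    true_w = np.array([0.8, 0.5, 0.3, 0.4, 0.2, 0.1])  # hypothetical underlying weights
    correct = (snr @ true_w + rng.normal(0, 1, n_trials)) > 0   # simulated trial outcomes

    # Per-property correlation with performance, normalized to relative weights
    weights = np.array([np.corrcoef(snr[:, k], correct)[0, 1] for k in range(6)])
    weights /= weights.sum()
    print(np.round(weights, 3))
    ```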

  5. Effects of Time-Compressed Speech Training on Multiple Functional and Structural Neural Mechanisms Involving the Left Superior Temporal Gyrus.

    PubMed

    Maruyama, Tsukasa; Takeuchi, Hikaru; Taki, Yasuyuki; Motoki, Kosuke; Jeong, Hyeonjeong; Kotozaki, Yuka; Nakagawa, Seishu; Nouchi, Rui; Iizuka, Kunio; Yokoyama, Ryoichi; Yamamoto, Yuki; Hanawa, Sugiko; Araki, Tsuyoshi; Sakaki, Kohei; Sasaki, Yukako; Magistro, Daniele; Kawashima, Ryuta

    2018-01-01

    Time-compressed speech is an artificial form of rapidly presented speech. Training with time-compressed speech of a second language (TCSSL) leads to adaptation toward TCSSL. Here, we newly investigated the effects of 4 weeks of training with TCSSL on diverse cognitive functions and neural systems using the fractional amplitude of spontaneous low-frequency fluctuations (fALFF), resting-state functional connectivity (RSFC) with the left superior temporal gyrus (STG), fractional anisotropy (FA), and regional gray matter volume (rGMV) of young adults by magnetic resonance imaging. There were no significant differences in change of performance of measures of cognitive functions or second language skills after training with TCSSL compared with that of the active control group. However, compared with the active control group, training with TCSSL was associated with increased fALFF, RSFC, and FA and decreased rGMV involving areas in the left STG. These results lacked evidence of a far transfer effect of time-compressed speech training on a wide range of cognitive functions and second language skills in young adults. However, these results demonstrated effects of time-compressed speech training on gray and white matter structures as well as on resting-state intrinsic activity and connectivity involving the left STG, which plays a key role in listening comprehension.

  6. A multimodal spectral approach to characterize rhythm in natural speech.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2016-01-01

    Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
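
    A minimal sketch of the coherence computation between an EMG envelope and an acoustic amplitude envelope using scipy.signal.coherence; the two simulated signals simply share a 5-Hz "syllable-rate" component and stand in for real recordings, and the sampling rate and analysis settings are assumptions.

    ```python
    # Simplified EMG-acoustic coherence sketch with simulated envelopes.
    import numpy as np
    from scipy.signal import coherence

    fs = 1000.0                          # Hz, envelope sampling rate (assumed)
    t = np.arange(0, 60, 1 / fs)         # 60 s of data
    rng = np.random.default_rng(0)
    shared = np.sin(2 * np.pi * 5.0 * t)                     # common 5-Hz rhythm
    emg_env = shared + 0.8 * rng.standard_normal(t.size)
    acoustic_env = shared + 0.8 * rng.standard_normal(t.size)

    f, coh = coherence(emg_env, acoustic_env, fs=fs, nperseg=4096)
    print("peak coherence %.2f at %.2f Hz" % (coh.max(), f[np.argmax(coh)]))
    ```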

  7. Neuronal basis of speech comprehension.

    PubMed

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, the structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. [Multidimensionality of inner speech and its relationship with abnormal perceptions].

    PubMed

    Tamayo-Agudelo, William; Vélez-Urrego, Juan David; Gaviria-Castaño, Gilberto; Perona-Garcelán, Salvador

    Inner speech is a common human experience. Recently, there have been studies linking this experience with cognitive functions, such as problem solving, reading, writing, autobiographical memory, and some disorders, such as anxiety and depression. In addition, inner speech is recognised as the main source of auditory hallucinations. The main purpose of this study is to establish the factor structure of Varieties of Inner Speech Questionnaire (VISQ) in a sample of the Colombian population. Furthermore, it aims at establishing a link between VISQ and abnormal perceptions. This was a cross-sectional study in which 232 college students were assessed using the VISQ and the Cardiff Anomalous Perceptions Scale (CAPS). Through an exploratory factor analysis, a structure of three factors was found: Other Voices in the Internal Speech, Condensed Inner speech, and Dialogical/Evaluative Inner speech, all of them with acceptable levels of reliability. Gender differences were found in the second and third factor, with higher averages for women. Positive correlations were found among the three VISQ and the two CAPS factors: Multimodal Perceptual Alterations and Experiences Associated with the Temporal Lobe. The results are consistent with previous findings linking the factors of inner speech with the propensity to auditory hallucination, a phenomenon widely associated with temporal lobe abnormalities. The hallucinations associated with other perceptual systems, however, are still weakly explained. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  9. On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system.

    PubMed

    Shamma, Shihab; Lorenzi, Christian

    2013-05-01

    There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Wide use of vocoders to resolve this question has commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severe noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.

  10. Learning Pitch with STDP: A Computational Model of Place and Temporal Pitch Perception Using Spiking Neural Networks.

    PubMed

    Erfanian Saeedi, Nafise; Blamey, Peter J; Burkitt, Anthony N; Grayden, David B

    2016-04-01

    Pitch perception is important for understanding speech prosody, music perception, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons' action potentials (spikes) as pitch cues. This paper describes a biophysical mechanism by which fine-structure temporal information can be extracted from the spikes generated at the auditory periphery. Deriving meaningful pitch-related information from spike times requires neural structures specialized in capturing synchronous or correlated activity from amongst neural events. The emergence of such pitch-processing neural mechanisms is described through a computational model of auditory processing. Simulation results show that a correlation-based, unsupervised, spike-based form of Hebbian learning can explain the development of neural structures required for recognizing the pitch of simple and complex tones, with or without the fundamental frequency. The temporal code is robust to variations in the spectral shape of the signal and thus can explain the phenomenon of pitch constancy.
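
    The learning rule underlying the model is spike-timing-dependent plasticity (STDP). The toy sketch below implements a standard pair-based STDP window, not the paper's full spiking network; the time constants, learning rates, and spike times are illustrative assumptions.

    ```python
    # Toy pair-based STDP rule: potentiate when pre precedes post, depress otherwise.
    import numpy as np

    def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=0.020, tau_minus=0.020):
        """Weight change for a pre/post spike pair separated by dt = t_post - t_pre (s)."""
        if dt >= 0:                                    # pre before post -> potentiation
            return a_plus * np.exp(-dt / tau_plus)
        return -a_minus * np.exp(dt / tau_minus)       # post before pre -> depression

    pre_spikes = [0.010, 0.055, 0.120]                 # s, hypothetical spike times
    post_spikes = [0.012, 0.050, 0.150]
    w = 0.5
    for t_pre, t_post in zip(pre_spikes, post_spikes):
        w = np.clip(w + stdp_dw(t_post - t_pre), 0.0, 1.0)   # keep weight bounded
    print("updated weight:", round(float(w), 4))
    ```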

  11. Learning Pitch with STDP: A Computational Model of Place and Temporal Pitch Perception Using Spiking Neural Networks

    PubMed Central

    Erfanian Saeedi, Nafise; Blamey, Peter J.; Burkitt, Anthony N.; Grayden, David B.

    2016-01-01

    Pitch perception is important for understanding speech prosody, music perception, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons’ action potentials (spikes) as pitch cues. This paper describes a biophysical mechanism by which fine-structure temporal information can be extracted from the spikes generated at the auditory periphery. Deriving meaningful pitch-related information from spike times requires neural structures specialized in capturing synchronous or correlated activity from amongst neural events. The emergence of such pitch-processing neural mechanisms is described through a computational model of auditory processing. Simulation results show that a correlation-based, unsupervised, spike-based form of Hebbian learning can explain the development of neural structures required for recognizing the pitch of simple and complex tones, with or without the fundamental frequency. The temporal code is robust to variations in the spectral shape of the signal and thus can explain the phenomenon of pitch constancy. PMID:27049657

  12. A correlational method to concurrently measure envelope and temporal fine structure weights: Effects of age, cochlear pathology, and spectral shaping1

    PubMed Central

    Fogerty, Daniel; Humes, Larry E.

    2012-01-01

    The speech signal may be divided into spectral frequency-bands, each band containing temporal properties of the envelope and fine structure. This study measured the perceptual weights for the envelope and fine structure in each of three frequency bands for sentence materials in young normal-hearing listeners, older normal-hearing listeners, aided older hearing-impaired listeners, and spectrally matched young normal-hearing listeners. The availability of each acoustic property was independently varied through noisy signal extraction. Thus, the full speech stimulus was presented with noise used to mask six different auditory channels. Perceptual weights were determined by correlating a listener’s performance with the signal-to-noise ratio of each acoustic property on a trial-by-trial basis. Results demonstrate that temporal fine structure perceptual weights remain stable across the four listener groups. However, a different weighting typography was observed across the listener groups for envelope cues. Results suggest that spectral shaping used to preserve the audibility of the speech stimulus may alter the allocation of perceptual resources. The relative perceptual weighting of envelope cues may also change with age. Concurrent testing of sentences repeated once on a previous day demonstrated that weighting strategies for all listener groups can change, suggesting an initial stabilization period or susceptibility to auditory training. PMID:22978896

  13. Effects of Time-Compressed Speech Training on Multiple Functional and Structural Neural Mechanisms Involving the Left Superior Temporal Gyrus

    PubMed Central

    Maruyama, Tsukasa; Taki, Yasuyuki; Motoki, Kosuke; Jeong, Hyeonjeong; Kotozaki, Yuka; Nakagawa, Seishu; Iizuka, Kunio; Yokoyama, Ryoichi; Yamamoto, Yuki; Hanawa, Sugiko; Araki, Tsuyoshi; Sakaki, Kohei; Sasaki, Yukako; Magistro, Daniele; Kawashima, Ryuta

    2018-01-01

    Time-compressed speech is an artificial form of rapidly presented speech. Training with time-compressed speech (TCSSL) in a second language leads to adaptation toward TCSSL. Here, we newly investigated the effects of 4 weeks of training with TCSSL on diverse cognitive functions and neural systems using the fractional amplitude of spontaneous low-frequency fluctuations (fALFF), resting-state functional connectivity (RSFC) with the left superior temporal gyrus (STG), fractional anisotropy (FA), and regional gray matter volume (rGMV) of young adults by magnetic resonance imaging. There were no significant differences in change of performance of measures of cognitive functions or second language skills after training with TCSSL compared with that of the active control group. However, compared with the active control group, training with TCSSL was associated with increased fALFF, RSFC, and FA and decreased rGMV involving areas in the left STG. These results lacked evidence of a far transfer effect of time-compressed speech training on a wide range of cognitive functions and second language skills in young adults. However, these results demonstrated effects of time-compressed speech training on gray and white matter structures as well as on resting-state intrinsic activity and connectivity involving the left STG, which plays a key role in listening comprehension. PMID:29675038

  14. Why Movement Is Captured by Music, but Less by Speech: Role of Temporal Regularity

    PubMed Central

    Dalla Bella, Simone; Białuńska, Anita; Sowiński, Jakub

    2013-01-01

    Music has a pervasive tendency to rhythmically engage our body. In contrast, synchronization with speech is rare. Music’s superiority over speech in driving movement probably results from isochrony of musical beats, as opposed to irregular speech stresses. Moreover, the presence of regular patterns of embedded periodicities (i.e., meter) may be critical in making music particularly conducive to movement. We investigated these possibilities by asking participants to synchronize with isochronous auditory stimuli (target), while music and speech distractors were presented at one of various phase relationships with respect to the target. In Exp. 1, familiar musical excerpts and fragments of children poetry were used as distractors. The stimuli were manipulated in terms of beat/stress isochrony and average pitch to achieve maximum comparability. In Exp. 2, the distractors were well-known songs performed with lyrics, on a reiterated syllable, and spoken lyrics, all having the same meter. Music perturbed synchronization with the target stimuli more than speech fragments. However, music superiority over speech disappeared when distractors shared isochrony and the same meter. Music’s peculiar and regular temporal structure is likely to be the main factor fostering tight coupling between sound and movement. PMID:23936534

  15. Temporal variability in sung productions of adolescents who stutter.

    PubMed

    Falk, Simone; Maslow, Elena; Thum, Georg; Hoole, Philip

    2016-01-01

    Singing has long been used as a technique to enhance and reeducate temporal aspects of articulation in speech disorders. In the present study, differences in temporal structure of sung versus spoken speech were investigated in stuttering. In particular, it was examined whether singing helps to reduce VOT variability of voiceless plosives, which would indicate enhanced temporal coordination of oral and laryngeal processes. Eight German adolescents who stutter and eight typically fluent peers repeatedly spoke and sang a simple German congratulation formula in which a disyllabic target word (e.g., /'ki:ta/) was repeated five times. In every trial, the first syllable of the word was varied, starting equally often with one of the three voiceless German stops /p/, /t/, /k/. Acoustic analyses showed that mean VOT and stop gap duration reduced during singing compared to speaking, while mean vowel and utterance duration was prolonged in singing in both groups. Importantly, adolescents who stutter significantly reduced VOT variability (measured as the Coefficient of Variation) during sung productions compared to speaking in word-initial stressed positions, while the control group showed a slight increase in VOT variability. However, in unstressed syllables, VOT variability increased in both adolescents who do and do not stutter from speech to song. In addition, vowel and utterance durational variability decreased in both groups, yet, adolescents who stutter were still more variable in utterance duration independent of the form of vocalization. These findings shed new light on how singing alters temporal structure and, in particular, the coordination of laryngeal-oral timing in stuttering. Future perspectives for investigating how rhythmic aspects could aid the management of fluent speech in stuttering are discussed. Readers will be able to describe (1) current perspectives on singing and its effects on articulation and fluency in stuttering and (2) acoustic parameters such as VOT variability which indicate the efficiency of control and coordination of laryngeal-oral movements. They will understand and be able to discuss (3) how singing reduces temporal variability in the productions of adolescents who do and do not stutter and (4) how this is linked to altered articulatory patterns in singing as well as to its rhythmic structure. Copyright © 2016 Elsevier Inc. All rights reserved.
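
    The variability measure used here, the coefficient of variation (CV = standard deviation divided by the mean), is simple to compute; the VOT values below are invented solely to illustrate the calculation.

    ```python
    # Coefficient of variation of VOT across repeated productions (hypothetical values).
    import numpy as np

    vot_spoken = np.array([0.062, 0.081, 0.055, 0.090, 0.070])   # s, hypothetical
    vot_sung   = np.array([0.048, 0.052, 0.050, 0.047, 0.053])   # s, hypothetical

    def cv(x):
        """Coefficient of variation: sample SD divided by the mean."""
        return x.std(ddof=1) / x.mean()

    print(f"CV spoken: {cv(vot_spoken):.3f}, CV sung: {cv(vot_sung):.3f}")
    ```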

  16. Age-Related Neural Oscillation Patterns During the Processing of Temporally Manipulated Speech.

    PubMed

    Rufener, Katharina S; Oechslin, Mathias S; Wöstmann, Malte; Dellwo, Volker; Meyer, Martin

    2016-05-01

    This EEG study aims to investigate age-related differences in the neural oscillation patterns during the processing of temporally modulated speech. Taking a lifespan perspective, we recorded the electroencephalogram (EEG) data of three age samples: young adults, middle-aged adults and older adults. Stimuli consisted of temporally degraded sentences in Swedish, a language unfamiliar to all participants. We found age-related differences in phonetic pattern matching when participants were presented with envelope-degraded sentences, whereas no such age effect was observed in the processing of fine-structure-degraded sentences. Irrespective of age, during speech processing the EEG data revealed a relationship between envelope information and the theta band (4-8 Hz) activity. Additionally, an association between fine-structure information and the gamma band (30-48 Hz) activity was found. No interaction, however, was found between acoustic manipulation of stimuli and age. Importantly, our main finding was paralleled by an overall enhanced power in older adults in high frequencies (gamma: 30-48 Hz). This occurred irrespective of condition. For the most part, this result is in line with the Asymmetric Sampling in Time framework (Poeppel in Speech Commun 41:245-255, 2003), which assumes an isomorphic correspondence between frequency modulations in neurophysiological patterns and acoustic oscillations in spoken language. We conclude that speech-specific neural networks show strong stability over adulthood, despite initial processes of cortical degeneration indicated by enhanced gamma power. The results of our study therefore confirm the concept that sensory and cognitive processes undergo multidirectional trajectories within the context of healthy aging.
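
    Band-limited power of the kind reported (theta 4-8 Hz, gamma 30-48 Hz) can be estimated from a single channel with Welch's method, as in the sketch below; the simulated EEG trace and analysis settings are assumptions for illustration only.

    ```python
    # Minimal theta/gamma band-power sketch using Welch's PSD estimate.
    import numpy as np
    from scipy.signal import welch

    fs = 500.0
    t = np.arange(0, 30, 1 / fs)
    rng = np.random.default_rng(0)
    eeg = (np.sin(2 * np.pi * 6 * t) +            # theta component
           0.3 * np.sin(2 * np.pi * 40 * t) +     # gamma component
           rng.standard_normal(t.size))           # broadband noise

    f, psd = welch(eeg, fs=fs, nperseg=1024)

    def band_power(lo, hi):
        mask = (f >= lo) & (f <= hi)
        return psd[mask].sum() * (f[1] - f[0])    # approximate integral of the PSD

    print("theta (4-8 Hz):", round(band_power(4, 8), 3),
          " gamma (30-48 Hz):", round(band_power(30, 48), 3))
    ```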

  17. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users

    PubMed Central

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.

    2018-01-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesize that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530

  18. Adaptation to delayed auditory feedback induces the temporal recalibration effect in both speech perception and production.

    PubMed

    Yamamoto, Kosuke; Kawabata, Hideaki

    2014-12-01

    We ordinarily speak fluently, even though our perceptions of our own voices are disrupted by various environmental acoustic properties. The underlying mechanism of speech is supposed to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by a reduction in speech fluency when the speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback. This is a well-used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. Then, they judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment, that is, participants' speech gained fluency. Although we also found that a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution on the simultaneity judgment, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception are adaptive to temporal lag but are processed in distinct ways.

  19. Age-group differences in speech identification despite matched audiometrically normal hearing: contributions from auditory temporal processing and cognition

    PubMed Central

    Füllgrabe, Christian; Moore, Brian C. J.; Stone, Michael A.

    2015-01-01

    Hearing loss with increasing age adversely affects the ability to understand speech, an effect that results partly from reduced audibility. The aims of this study were to establish whether aging reduces speech intelligibility for listeners with normal audiograms, and, if so, to assess the relative contributions of auditory temporal and cognitive processing. Twenty-one older normal-hearing (ONH; 60–79 years) participants with bilateral audiometric thresholds ≤ 20 dB HL at 0.125–6 kHz were matched to nine young (YNH; 18–27 years) participants in terms of mean audiograms, years of education, and performance IQ. Measures included: (1) identification of consonants in quiet and in noise that was unmodulated or modulated at 5 or 80 Hz; (2) identification of sentences in quiet and in co-located or spatially separated two-talker babble; (3) detection of modulation of the temporal envelope (TE) at frequencies 5–180 Hz; (4) monaural and binaural sensitivity to temporal fine structure (TFS); (5) various cognitive tests. Speech identification was worse for ONH than YNH participants in all types of background. This deficit was not reflected in self-ratings of hearing ability. Modulation masking release (the improvement in speech identification obtained by amplitude modulating a noise background) and spatial masking release (the benefit obtained from spatially separating masker and target speech) were not affected by age. Sensitivity to TE and TFS was lower for ONH than YNH participants, and was correlated positively with speech-in-noise (SiN) identification. Many cognitive abilities were lower for ONH than YNH participants, and generally were correlated positively with SiN identification scores. The best predictors of the intelligibility of SiN were composite measures of cognition and TFS sensitivity. These results suggest that declines in speech perception in older persons are partly caused by cognitive and perceptual changes separate from age-related changes in audiometric sensitivity. PMID:25628563

  20. The role of temporal speech cues in facilitating the fluency of adults who stutter.

    PubMed

    Park, Jin; Logan, Kenneth J

    2015-12-01

    Adults who stutter speak more fluently during choral speech contexts than they do during solo speech contexts. The underlying mechanisms for this effect remain unclear, however. In this study, we examined the extent to which the choral speech effect depended on presentation of intact temporal speech cues. We also examined whether speakers who stutter followed choral signals more closely than typical speakers did. 8 adults who stuttered and 8 adults who did not stutter read 60 sentences aloud during a solo speaking condition and three choral speaking conditions (240 total sentences), two of which featured either temporally altered or indeterminate word duration patterns. Effects of these manipulations on speech fluency, rate, and temporal entrainment with the choral speech signal were assessed. Adults who stutter spoke more fluently in all choral speaking conditions than they did when speaking solo. They also spoke slower and exhibited closer temporal entrainment with the choral signal during the mid- to late-stages of sentence production than the adults who did not stutter. Both groups entrained more closely with unaltered choral signals than they did with altered choral signals. Findings suggest that adults who stutter make greater use of speech-related information in choral signals when talking than adults with typical fluency do. The presence of fluency facilitation during temporally altered choral speech and conversation babble, however, suggests that temporal/gestural cueing alone cannot account for fluency facilitation in speakers who stutter. Other potential fluency enhancing mechanisms are discussed. The reader will be able to (a) summarize competing views on stuttering as a speech timing disorder, (b) describe the extent to which adults who stutter depend on an accurate rendering of temporal information in order to benefit from choral speech, and (c) discuss possible explanations for fluency facilitation in the presence of inaccurate or indeterminate temporal cues. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Amplitude modulation detection with concurrent frequency modulation.

    PubMed

    Nagaraj, Naveen K

    2016-09-01

    Human speech consists of concomitant temporal modulations in amplitude and frequency that are crucial for speech perception. In this study, amplitude modulation (AM) detection thresholds were measured for 550 and 5000 Hz carriers with and without concurrent frequency modulation (FM), at AM rates crucial for speech perception. Results indicate that adding 40 Hz FM interferes with AM detection, more so for 5000 Hz carrier and for frequency deviations exceeding the critical bandwidth of the carrier frequency. These findings suggest that future cochlear implant processors, encoding speech fine-structures may consider limiting the FM to narrow bandwidth and to low frequencies.
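
    A stimulus with concurrent amplitude and frequency modulation of the general type used in such tasks can be synthesized as below; the 5-kHz carrier, 8-Hz AM rate, 40-Hz FM rate, and 400-Hz frequency deviation are illustrative parameter choices, not the study's exact values.

    ```python
    # Sketch of a carrier with concurrent AM and FM (illustrative parameters).
    import numpy as np

    fs = 44100
    t = np.arange(int(fs * 0.5)) / fs           # 500-ms stimulus
    fc, am_rate, am_depth = 5000.0, 8.0, 0.3
    fm_rate, fm_dev = 40.0, 400.0               # FM rate (Hz) and peak deviation (Hz)

    am = 1.0 + am_depth * np.sin(2 * np.pi * am_rate * t)
    # Phase is the integral of instantaneous frequency fc + fm_dev*sin(2*pi*fm_rate*t)
    phase = 2 * np.pi * fc * t - (fm_dev / fm_rate) * np.cos(2 * np.pi * fm_rate * t)
    stim = am * np.sin(phase)
    print(stim.shape)
    ```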

  2. A Network Model of Observation and Imitation of Speech

    PubMed Central

    Mashal, Nira; Solodkin, Ana; Dick, Anthony Steven; Chen, E. Elinor; Small, Steven L.

    2012-01-01

    Much evidence has now accumulated demonstrating and quantifying the extent of shared regional brain activation for observation and execution of speech. However, the nature of the actual networks that implement these functions, i.e., both the brain regions and the connections among them, and the similarities and differences across these networks have not been elucidated. The current study aims to characterize formally a network for observation and imitation of syllables in the healthy adult brain and to compare their structure and effective connectivity. Eleven healthy participants observed or imitated audiovisual syllables spoken by a human actor. We constructed four structural equation models to characterize the networks for observation and imitation in each of the two hemispheres. Our results show that the network models for observation and imitation comprise the same essential structure but differ in important ways from each other (in both hemispheres) based on connectivity. In particular, our results show that the connections from posterior superior temporal gyrus and sulcus to ventral premotor, ventral premotor to dorsal premotor, and dorsal premotor to primary motor cortex in the left hemisphere are stronger during imitation than during observation. The first two connections are implicated in a putative dorsal stream of speech perception, thought to involve translating auditory speech signals into motor representations. Thus, the current results suggest that flow of information during imitation, starting at the posterior superior temporal cortex and ending in the motor cortex, enhances input to the motor cortex in the service of speech execution. PMID:22470360

  3. Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

    PubMed

    Johari, Karim; Behroozmand, Roozbeh

    2017-08-01

    Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in response to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.

    PubMed

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y

    2017-08-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesize that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise.

    PubMed

    Shen, Jing; Souza, Pamela E

    2017-09-18

    This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss.

  6. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

    PubMed Central

    Souza, Pamela E.

    2017-01-01

    Purpose This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Method Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. Results The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Conclusions Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss. PMID:28800370

  7. Disentangling syntax and intelligibility in auditory language comprehension.

    PubMed

    Friederici, Angela D; Kotz, Sonja A; Scott, Sophie K; Obleser, Jonas

    2010-03-01

    Studies of the neural basis of spoken language comprehension typically focus on aspects of auditory processing by varying signal intelligibility, or on higher-level aspects of language processing such as syntax. Most studies in either of these threads of language research report brain activation including peaks in the superior temporal gyrus (STG) and/or the superior temporal sulcus (STS), but it is not clear why these areas are recruited in functionally different studies. The current fMRI study aims to disentangle the functional neuroanatomy of intelligibility and syntax in an orthogonal design. The data substantiate functional dissociations between STS and STG in the left and right hemispheres: first, manipulations of speech intelligibility yield bilateral mid-anterior STS peak activation, whereas syntactic phrase structure violations elicit strongly left-lateralized mid STG and posterior STS activation. Second, ROI analyses indicate all interactions of speech intelligibility and syntactic correctness to be located in the left frontal and temporal cortex, while the observed right-hemispheric activations reflect less specific responses to intelligibility and syntax. Our data demonstrate that the mid-to-anterior STS activation is associated with increasing speech intelligibility, while the mid-to-posterior STG/STS is more sensitive to syntactic information within the speech. Copyright © 2009 Wiley-Liss, Inc.

  8. A simulation study of harmonics regeneration in noise reduction for electric and acoustic stimulation.

    PubMed

    Hu, Yi

    2010-05-01

    Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefits. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting harmonics structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in the frequency domain (or, equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work into the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

  9. Multisensory speech perception without the left superior temporal sulcus.

    PubMed

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in the right STS to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Multisensory Speech Perception Without the Left Superior Temporal Sulcus

    PubMed Central

    Baum, Sarah H.; Martin, Randi C.; Hamilton, A. Cris; Beauchamp, Michael S.

    2012-01-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response to McGurk stimuli in the right STS was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. PMID:22634292

  11. Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

    PubMed Central

    Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles

    2012-01-01

    We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
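
    The sensitivity and point of subjective simultaneity (PSS) reported in the record above are conventionally estimated by fitting a psychometric function to the proportion of one type of temporal-order response across stimulus onset asynchronies (SOAs). The sketch below, on invented data, assumes a logistic fit: the PSS is the 50% point and the just-noticeable difference (JND) is derived from the slope; this is a generic illustration, not the authors' exact fitting procedure.

```python
# Hedged sketch: estimating PSS and JND from temporal-order-judgment data by
# fitting a logistic psychometric function. SOA values and response proportions
# are invented for illustration only.
import numpy as np
from scipy.optimize import curve_fit

soa_ms = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300])   # negative = visual lead
p_audio_first = np.array([0.05, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 0.98])

def logistic(x, pss, slope):
    return 1.0 / (1.0 + np.exp(-(x - pss) / slope))

(pss, slope), _ = curve_fit(logistic, soa_ms, p_audio_first, p0=(0.0, 50.0))
jnd = slope * np.log(3.0)   # SOA change from the 50% to the 75% point of the fit
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```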

  12. Spatio-Temporal Progression of Cortical Activity Related to Continuous Overt and Covert Speech Production in a Reading Task.

    PubMed

    Brumberg, Jonathan S; Krusienski, Dean J; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L; Schalk, Gerwin

    2016-01-01

    How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain.

  13. Spatio-Temporal Progression of Cortical Activity Related to Continuous Overt and Covert Speech Production in a Reading Task

    PubMed Central

    Brumberg, Jonathan S.; Krusienski, Dean J.; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L.; Schalk, Gerwin

    2016-01-01

    How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain. PMID:27875590

  14. Discriminating between auditory and motor cortical responses to speech and non-speech mouth sounds

    PubMed Central

    Agnew, Z.K.; McGettigan, C.; Scott, S.K.

    2012-01-01

    Several perspectives on speech perception posit a central role for the representation of articulations in speech comprehension, supported by evidence for premotor activation when participants listen to speech. However no experiments have directly tested whether motor responses mirror the profile of selective auditory cortical responses to native speech sounds, or whether motor and auditory areas respond in different ways to sounds. We used fMRI to investigate cortical responses to speech and non-speech mouth (ingressive click) sounds. Speech sounds activated bilateral superior temporal gyri more than other sounds, a profile not seen in motor and premotor cortices. These results suggest that there are qualitative differences in the ways that temporal and motor areas are activated by speech and click sounds: anterior temporal lobe areas are sensitive to the acoustic/phonetic properties while motor responses may show more generalised responses to the acoustic stimuli. PMID:21812557

  15. Audiovisual Asynchrony Detection in Human Speech

    ERIC Educational Resources Information Center

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  16. Speak on time! Effects of a musical rhythmic training on children with hearing loss.

    PubMed

    Hidalgo, Céline; Falk, Simone; Schön, Daniele

    2017-08-01

    This study investigates temporal adaptation in speech interaction in children with normal hearing and in children with cochlear implants (CIs) and/or hearing aids (HAs). We also address the question of whether musical rhythmic training can improve these skills in children with hearing loss (HL). Children named pictures presented on the screen in alternation with a virtual partner. Alternation rate (fast or slow) and the temporal predictability (match vs mismatch of stress occurrences) were manipulated. One group of children with normal hearing (NH) and one with HL were tested. The latter group was tested twice: once after 30 min of speech therapy and once after 30 min of musical rhythmic training. Both groups of children (NH and with HL) can adjust their speech production to the rate of alternation of the virtual partner. Moreover, while children with normal hearing benefit from the temporal regularity of stress occurrences, children with HL become sensitive to this manipulation only after rhythmic training. Rhythmic training may help children with HL to structure the temporal flow of their verbal interactions. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. The Effect of Speech Repetition Rate on Neural Activation in Healthy Adults: Implications for Treatment of Aphasia and Other Fluency Disorders.

    PubMed

    Marchina, Sarah; Norton, Andrea; Kumar, Sandeep; Schlaug, Gottfried

    2018-01-01

    Functional imaging studies have provided insight into the effect of rate on production of syllables, pseudowords, and naturalistic speech, but the influence of rate on repetition of commonly-used words/phrases suitable for therapeutic use merits closer examination. Aim: To identify speech-motor regions responsive to rate and test the hypothesis that those regions would provide greater support as rates increase, we used an overt speech repetition task and functional magnetic resonance imaging (fMRI) to capture rate-modulated activation within speech-motor regions and determine whether modulations occur linearly and/or show hemispheric preference. Methods: Twelve healthy, right-handed adults participated in an fMRI task requiring overt repetition of commonly-used words/phrases at rates of 1, 2, and 3 syllables/second (syll./sec.). Results: Across all rates, bilateral activation was found both in ventral portions of primary sensorimotor cortex and middle and superior temporal regions. A repeated measures analysis of variance with pairwise comparisons revealed an overall difference between rates in temporal lobe regions of interest (ROIs) bilaterally (p < 0.001); all six comparisons reached significance (p < 0.05). Five of the six were highly significant (p < 0.008), while the left-hemisphere 2- vs. 3-syll./sec. comparison, though still significant, was less robust (p = 0.037). Temporal ROI mean beta-values increased linearly across the three rates bilaterally. Significant rate effects observed in the temporal lobes were slightly more pronounced in the right-hemisphere. No significant overall rate differences were seen in sensorimotor ROIs, nor was there a clear hemispheric effect. Conclusion: Linear effects in superior temporal ROIs suggest that sensory feedback corresponds directly to task demands. The lesser degree of significance in left-hemisphere activation at the faster, closer-to-normal rate may represent an increase in neural efficiency (and therefore, decreased demand) when the task so closely approximates a highly-practiced function. The presence of significant bilateral activation during overt repetition of words/phrases at all three rates suggests that repetition-based speech production may draw support from either or both hemispheres. This bihemispheric redundancy in regions associated with speech-motor control and their sensitivity to changes in rate may play an important role in interventions for nonfluent aphasia and other fluency disorders, particularly when right-hemisphere structures are the sole remaining pathway for production of meaningful speech.

  18. Neural systems for speech and song in autism

    PubMed Central

    Pantazatos, Spiro P.; Schneider, Harry

    2012-01-01

    Despite language disabilities in autism, music abilities are frequently preserved. Paradoxically, brain regions associated with these functions typically overlap, enabling investigation of neural organization supporting speech and song in autism. Neural systems sensitive to speech and song were compared in low-functioning autistic and age-matched control children using passive auditory stimulation during functional magnetic resonance and diffusion tensor imaging. Activation in left inferior frontal gyrus was reduced in autistic children relative to controls during speech stimulation, but was greater than controls during song stimulation. Functional connectivity for song relative to speech was also increased between left inferior frontal gyrus and superior temporal gyrus in autism, and large-scale connectivity showed increased frontal–posterior connections. Although fractional anisotropy of the left arcuate fasciculus was decreased in autistic children relative to controls, structural terminations of the arcuate fasciculus in inferior frontal gyrus were indistinguishable between autistic and control groups. Fractional anisotropy correlated with activity in left inferior frontal gyrus for both speech and song conditions. Together, these findings indicate that in autism, functional systems that process speech and song were more effectively engaged for song than for speech and projections of structural pathways associated with these functions were not distinguishable from controls. PMID:22298195

  19. Visual Phonetic Processing Localized Using Speech and Non-Speech Face Gestures in Video and Point-Light Displays

    PubMed Central

    Bernstein, Lynne E.; Jiang, Jintao; Pantazis, Dimitrios; Lu, Zhong-Lin; Joshi, Anand

    2011-01-01

    The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and non-speech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to non-speech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and non-speech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study. PMID:20853377

  20. Auditory-neurophysiological responses to speech during early childhood: Effects of background noise

    PubMed Central

    White-Schwoch, Travis; Davies, Evan C.; Thompson, Elaine C.; Carr, Kali Woodruff; Nicol, Trent; Bradlow, Ann R.; Kraus, Nina

    2015-01-01

    Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But learning rarely occurs under ideal listening conditions—children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3–5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features—even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response properties in this age group. These normative metrics may be useful clinically to evaluate auditory processing difficulties during early childhood. PMID:26113025

  1. Decreased integrity of the fronto-temporal fibers of the left inferior occipito-frontal fasciculus associated with auditory verbal hallucinations in schizophrenia.

    PubMed

    Oestreich, Lena K L; McCarthy-Jones, Simon; Whitford, Thomas J

    2016-06-01

    Auditory verbal hallucinations (AVH) have been proposed to result from altered connectivity between frontal speech production regions and temporal speech perception regions. Whilst the dorsal language pathway, serviced by the arcuate fasciculus, has been extensively studied in relation to AVH, the ventral language pathway, serviced by the inferior occipito-frontal fasciculus (IOFF) has been rarely studied in relation to AVH. This study examined whether structural changes in anatomically defined subregions of the IOFF were associated with AVH in patients with schizophrenia. Diffusion tensor imaging scans and clinical data were obtained from the Australian Schizophrenia Research Bank for 113 schizophrenia patients, of whom 39 had lifetime experience of AVH (18 had current AVH, 21 had remitted AVH), 74 had no lifetime experience of AVH, and 40 healthy controls. Schizophrenia patients with a lifetime experience of AVH exhibited reduced fractional anisotropy (FA) in the fronto-temporal fibers of the left IOFF compared to both healthy controls and schizophrenia patients without AVH. In contrast, structural abnormalities in the temporal and occipital regions of the IOFF were observed bilaterally in both patient groups, relative to the healthy controls. These results suggest that while changes in the structural integrity of the bilateral IOFF are associated with schizophrenia per se, integrity reductions in the fronto-temporal fibers of the left IOFF may be specifically associated with AVH.

  2. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

    ERIC Educational Resources Information Center

    Shen, Jung; Souza, Pamela E.

    2017-01-01

    Purpose: This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for…

  3. Auditory hallucinations and the temporal cortical response to speech in schizophrenia: a functional magnetic resonance imaging study.

    PubMed

    Woodruff, P W; Wright, I C; Bullmore, E T; Brammer, M; Howard, R J; Williams, S C; Shapleske, J; Rossell, S; David, A S; McGuire, P K; Murray, R M

    1997-12-01

    The authors explored whether abnormal functional lateralization of temporal cortical language areas in schizophrenia was associated with a predisposition to auditory hallucinations and whether the auditory hallucinatory state would reduce the temporal cortical response to external speech. Functional magnetic resonance imaging was used to measure the blood-oxygenation-level-dependent signal induced by auditory perception of speech in three groups of male subjects: eight schizophrenic patients with a history of auditory hallucinations (trait-positive), none of whom was currently hallucinating; seven schizophrenic patients without such a history (trait-negative); and eight healthy volunteers. Seven schizophrenic patients were also examined while they were actually experiencing severe auditory verbal hallucinations and again after their hallucinations had diminished. Voxel-by-voxel comparison of the median power of subjects' responses to periodic external speech revealed that this measure was reduced in the left superior temporal gyrus but increased in the right middle temporal gyrus in the combined schizophrenic groups relative to the healthy comparison group. Comparison of the trait-positive and trait-negative patients revealed no clear difference in the power of temporal cortical activation. Comparison of patients when experiencing severe hallucinations and when hallucinations were mild revealed reduced responsivity of the temporal cortex, especially the right middle temporal gyrus, to external speech during the former state. These results suggest that schizophrenia is associated with a reduced left and increased right temporal cortical response to auditory perception of speech, with little distinction between patients who differ in their vulnerability to hallucinations. The auditory hallucinatory state is associated with reduced activity in temporal cortical regions that overlap with those that normally process external speech, possibly because of competition for common neurophysiological resources.

  4. Prosodic structure shapes the temporal realization of intonation and manual gesture movements.

    PubMed

    Esteve-Gibert, Núria; Prieto, Pilar

    2013-06-01

    Previous work on the temporal coordination between gesture and speech found that the prominence in gesture coordinates with speech prominence. In this study, the authors investigated the anchoring regions in speech and pointing gesture that align with each other. The authors hypothesized that (a) in contrastive focus conditions, the gesture apex is anchored in the intonation peak and (b) the upcoming prosodic boundary influences the timing of gesture and intonation movements. Fifteen Catalan speakers pointed at a screen while pronouncing a target word with different metrical patterns in a contrastive focus condition and followed by a phrase boundary. A total of 702 co-speech deictic gestures were acoustically and gesturally analyzed. Intonation peaks and gesture apexes showed parallel behavior with respect to their position within the accented syllable: They occurred at the end of the accented syllable in non-phrase-final position, whereas they occurred well before the end of the accented syllable in phrase-final position. Crucially, the position of intonation peaks and gesture apexes was correlated and was bound by prosodic structure. The results refine the phonological synchronization rule (McNeill, 1992), showing that gesture apexes are anchored in intonation peaks and that gesture and prosodic movements are bound by prosodic phrasing.

  5. Rise time and formant transition duration in the discrimination of speech sounds: the Ba-Wa distinction in developmental dyslexia.

    PubMed

    Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Dénes

    2011-01-01

    Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami et al., 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes et al., 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.

  6. Rhythmic Priming Enhances the Phonological Processing of Speech

    ERIC Educational Resources Information Center

    Cason, Nia; Schon, Daniele

    2012-01-01

    While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…

  7. Why middle-aged listeners have trouble hearing in everyday settings.

    PubMed

    Ruggles, Dorea; Bharadwaj, Hari; Shinn-Cunningham, Barbara G

    2012-08-07

    Anecdotally, middle-aged listeners report difficulty conversing in social settings, even when they have normal audiometric thresholds [1-3]. Moreover, young adult listeners with "normal" hearing vary in their ability to selectively attend to speech amid similar streams of speech. Ignoring age, these individual differences correlate with physiological differences in temporal coding precision present in the auditory brainstem, suggesting that the fidelity of encoding of suprathreshold sound helps explain individual differences [4]. Here, we revisit the conundrum of whether early aging influences an individual's ability to communicate in everyday settings. Although absolute selective attention ability is not predicted by age, reverberant energy interferes more with selective attention as age increases. Breaking the brainstem response down into components corresponding to coding of stimulus fine structure and envelope, we find that age alters which brainstem component predicts performance. Specifically, middle-aged listeners appear to rely heavily on temporal fine structure, which is more disrupted by reverberant energy than temporal envelope structure is. In contrast, the fidelity of envelope cues predicts performance in younger adults. These results hint that temporal envelope cues influence spatial hearing in reverberant settings more than is commonly appreciated and help explain why middle-aged listeners have particular difficulty communicating in daily life. Copyright © 2012 Elsevier Ltd. All rights reserved.
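
    The record above breaks the brainstem response into fine-structure and envelope components. One common approach in the frequency-following-response literature, which may or may not match the authors' exact procedure, is to average responses to opposite-polarity presentations of the same stimulus: adding emphasizes the envelope-following component, subtracting emphasizes the fine-structure component. A minimal sketch with hypothetical trial arrays follows.

```python
# Hedged sketch: separating envelope- vs fine-structure-dominated components of a
# brainstem response by combining responses to opposite stimulus polarities.
# `resp_pos` and `resp_neg` are hypothetical (trials x samples) arrays.
import numpy as np

def env_tfs_components(resp_pos, resp_neg):
    mean_pos = resp_pos.mean(axis=0)
    mean_neg = resp_neg.mean(axis=0)
    env_component = 0.5 * (mean_pos + mean_neg)   # envelope-following response
    tfs_component = 0.5 * (mean_pos - mean_neg)   # fine-structure (spectral) response
    return env_component, tfs_component

rng = np.random.default_rng(0)
env, tfs = env_tfs_components(rng.standard_normal((100, 2048)),
                              rng.standard_normal((100, 2048)))
```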

  8. Spatiotemporal imaging of cortical activation during verb generation and picture naming.

    PubMed

    Edwards, Erik; Nagarajan, Srikantan S; Dalal, Sarang S; Canolty, Ryan T; Kirsch, Heidi E; Barbaro, Nicholas M; Knight, Robert T

    2010-03-01

    One hundred and fifty years of neurolinguistic research has identified the key structures in the human brain that support language. However, neither the classic neuropsychological approaches introduced by Broca (1861) and Wernicke (1874), nor modern neuroimaging employing PET and fMRI has been able to delineate the temporal flow of language processing in the human brain. We recorded the electrocorticogram (ECoG) from indwelling electrodes over left hemisphere language cortices during two common language tasks, verb generation and picture naming. We observed that the very high frequencies of the ECoG (high-gamma, 70-160 Hz) track language processing with spatial and temporal precision. Serial progression of activations is seen at a larger timescale, showing distinct stages of perception, semantic association/selection, and speech production. Within the areas supporting each of these larger processing stages, parallel (or "incremental") processing is observed. In addition to the traditional posterior vs. anterior localization for speech perception vs. production, we provide novel evidence for the role of premotor cortex in speech perception and of Wernicke's and surrounding cortex in speech production. The data are discussed with regards to current leading models of speech perception and production, and a "dual ventral stream" hybrid of leading speech perception models is given. Copyright (c) 2009 Elsevier Inc. All rights reserved.

  9. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    PubMed

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
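
    The record above quantifies categorization responses with logistic regression to estimate perceptual sensitivity to an acoustic phonetic cue. The sketch below illustrates that style of analysis on invented trial-level data for a voice-onset-time continuum; the fitted slope on the cue serves as the sensitivity index. The continuum values, trial counts, and listener model are assumptions, not the study's data.

```python
# Hedged sketch: logistic regression of trial-level categorization responses on a
# speech cue (an invented voice-onset-time continuum). The slope coefficient
# indexes perceptual sensitivity to the cue; data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
vot_ms = np.repeat(np.linspace(0, 60, 7), 20)               # 7 continuum steps x 20 trials
p_voiceless = 1.0 / (1.0 + np.exp(-(vot_ms - 30.0) / 8.0))  # hypothetical listener model
resp = rng.binomial(1, p_voiceless)                         # 1 = "voiceless" response

X = sm.add_constant(vot_ms)
fit = sm.Logit(resp, X).fit(disp=False)
print("sensitivity (slope per ms of VOT):", fit.params[1])
```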

  10. An attention-gating recurrent working memory architecture for emergent speech representation

    NASA Astrophysics Data System (ADS)

    Elshaw, Mark; Moore, Roger K.; Klein, Michael

    2010-06-01

    This paper describes an attention-gating recurrent self-organising map approach for emergent speech representation. Inspired by evidence from human cognitive processing, the architecture combines two main neural components. The first component, the attention-gating mechanism, uses actor-critic learning to perform selective attention towards speech. Through this selective attention approach, the attention-gating mechanism controls access to working memory processing. The second component, the recurrent self-organising map memory, develops a temporal-distributed representation of speech using phone-like structures. Representing speech in terms of phonetic features in an emergent self-organised fashion, according to research on child cognitive development, recreates the approach found in infants. Using this representational approach, in a fashion similar to infants, should improve the performance of automatic recognition systems through aiding speech segmentation and fast word learning.
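
    The second component described above is a recurrent self-organising map that builds a temporally distributed representation of speech. The core self-organising-map learning step can be sketched as below; the recurrent context vector and the actor-critic attention gating that are specific to the paper are omitted, and the grid size, feature dimensionality, and learning-rate settings are assumptions.

```python
# Hedged sketch of a plain self-organising-map update step, as a simplified
# stand-in for the recurrent SOM memory described above (recurrent context and
# actor-critic gating are omitted). Sizes and rates are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
grid_h, grid_w, dim = 10, 10, 13            # e.g., 13 acoustic features per frame
weights = rng.standard_normal((grid_h, grid_w, dim))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

def som_step(x, weights, lr=0.1, sigma=2.0):
    # Best-matching unit: node whose weight vector is closest to the input frame.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighbourhood around the BMU pulls nearby nodes toward the input.
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]
    return weights + lr * h * (x - weights)

for frame in rng.standard_normal((500, dim)):   # stand-in stream of speech frames
    weights = som_step(frame, weights)
```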

  11. Effects of sensorineural hearing loss on temporal coding of narrowband and broadband signals in the auditory periphery

    PubMed Central

    Henry, Kenneth S.; Heinz, Michael G.

    2013-01-01

    People with sensorineural hearing loss have substantial difficulty understanding speech under degraded listening conditions. Behavioral studies suggest that this difficulty may be caused by changes in auditory processing of the rapidly-varying temporal fine structure (TFS) of acoustic signals. In this paper, we review the presently known effects of sensorineural hearing loss on processing of TFS and slower envelope modulations in the peripheral auditory system of mammals. Cochlear damage has relatively subtle effects on phase locking by auditory-nerve fibers to the temporal structure of narrowband signals under quiet conditions. In background noise, however, sensorineural loss does substantially reduce phase locking to the TFS of pure-tone stimuli. For auditory processing of broadband stimuli, sensorineural hearing loss has been shown to severely alter the neural representation of temporal information along the tonotopic axis of the cochlea. Notably, auditory-nerve fibers innervating the high-frequency part of the cochlea grow increasingly responsive to low-frequency TFS information and less responsive to temporal information near their characteristic frequency (CF). Cochlear damage also increases the correlation of the response to TFS across fibers of varying CF, decreases the traveling-wave delay between TFS responses of fibers with different CFs, and can increase the range of temporal modulation frequencies encoded in the periphery for broadband sounds. Weaker neural coding of temporal structure in background noise and degraded coding of broadband signals along the tonotopic axis of the cochlea are expected to contribute considerably to speech perception problems in people with sensorineural hearing loss. PMID:23376018
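
    Phase locking of auditory-nerve fibers to temporal fine structure, as discussed in the review above, is commonly quantified with vector strength: the length of the mean resultant of spike phases relative to the stimulus cycle. A tiny sketch with simulated spike times follows; the firing model is purely illustrative.

```python
# Hedged sketch: vector strength as a measure of phase locking to a tone of
# frequency f0. Spike times are simulated, not data from the review above.
import numpy as np

def vector_strength(spike_times_s, f0_hz):
    phases = 2.0 * np.pi * np.mod(spike_times_s * f0_hz, 1.0)
    return np.abs(np.mean(np.exp(1j * phases)))

rng = np.random.default_rng(5)
f0 = 500.0
# Spikes clustered near a preferred phase of each cycle -> high vector strength.
cycles = rng.integers(0, 500, size=400)
spikes = (cycles + 0.1 + 0.02 * rng.standard_normal(400)) / f0
print("vector strength:", vector_strength(spikes, f0))
```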

  12. Cortical activation patterns correlate with speech understanding after cochlear implantation

    PubMed Central

    Olds, Cristen; Pollonini, Luca; Abaya, Homer; Larky, Jannine; Loy, Megan; Bortfeld, Heather; Beauchamp, Michael S.; Oghalai, John S.

    2015-01-01

    Objectives: Cochlear implants are a standard therapy for deafness, yet the ability of implanted patients to understand speech varies widely. To better understand this variability in outcomes, we used functional near-infrared spectroscopy (fNIRS) to image activity within regions of the auditory cortex and compare the results to behavioral measures of speech perception. Design: We studied 32 deaf adults hearing through cochlear implants and 35 normal-hearing controls. We used fNIRS to measure responses within the lateral temporal lobe and the superior temporal gyrus to speech stimuli of varying intelligibility. The speech stimuli included normal speech, channelized speech (vocoded into 20 frequency bands), and scrambled speech (the 20 frequency bands were shuffled in random order). We also used environmental sounds as a control stimulus. Behavioral measures consisted of the Speech Reception Threshold, CNC words, and AzBio Sentence tests measured in quiet. Results: Both control and implanted participants with good speech perception exhibited greater cortical activations to natural speech than to unintelligible speech. In contrast, implanted participants with poor speech perception had large, indistinguishable cortical activations to all stimuli. The ratio of cortical activation to normal speech to that of scrambled speech directly correlated with the CNC Words and AzBio Sentences scores. This pattern of cortical activation was not correlated with auditory threshold, age, side of implantation, or time after implantation. Turning off the implant reduced cortical activations in all implanted participants. Conclusions: Together, these data indicate that the responses we measured within the lateral temporal lobe and the superior temporal gyrus correlate with behavioral measures of speech perception, demonstrating a neural basis for the variability in speech understanding outcomes after cochlear implantation. PMID:26709749

  13. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

    PubMed Central

    Liu, Xunying; Zhang, Chao; Woodland, Phil; Fonteneau, Elisabeth

    2017-01-01

    There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental ‘machine states’, generated as the ASR analysis progresses over time, to the incremental ‘brain states’, measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain. PMID:28945744
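
    The comparison of incremental "machine states" with "brain states" described above relies on multivariate pattern-similarity techniques. The sketch below shows a generic representational-similarity-style comparison between two sets of patterns for the same speech items; it is not the authors' exact incremental, spatiotemporally resolved procedure, and both pattern matrices are simulated.

```python
# Hedged sketch: representational-similarity-style comparison between neural
# response patterns and ASR-derived feature patterns for the same speech items.
# Not the authors' exact method; both pattern matrices are simulated.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_items = 40
brain_patterns = rng.standard_normal((n_items, 120))    # e.g., sensor/source features
machine_patterns = rng.standard_normal((n_items, 30))   # e.g., ASR phonetic-model features

brain_rdm = pdist(brain_patterns, metric="correlation")      # condensed dissimilarity vectors
machine_rdm = pdist(machine_patterns, metric="correlation")
rho, p = spearmanr(brain_rdm, machine_rdm)
print(f"model-brain RDM correlation: rho = {rho:.3f}, p = {p:.3g}")
```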

  14. Speech and motor disturbances in Rett syndrome.

    PubMed

    Bashina, V M; Simashkova, N V; Grachev, V V; Gorbachevskaya, N L

    2002-01-01

    Rett syndrome is a severe, genetically determined disease of early childhood which produces a defined clinical phenotype in girls. The main clinical manifestations include lesions affecting speech functions, involving both expressive and receptive speech, as well as motor functions, producing apraxia of the arms and profound abnormalities of gait in the form of ataxia-apraxia. Most investigators note that patients have variability in the severity of derangement to large motor acts and in the damage to fine hand movements and speech functions. The aims of the present work were to study disturbances of speech and motor functions over 2-5 years in 50 girls aged 12 months to 14 years with Rett syndrome and to analyze the correlations between these disturbances. The results of comparing clinical data and EEG traces supported the stepwise involvement of frontal and parietal-temporal cortical structures in the pathological process. The ability to organize speech and motor activity is affected first, with subsequent development of lesions to gnostic functions, which are in turn followed by derangement of subcortical structures and the cerebellum and later by damage to structures in the spinal cord. A clear correlation was found between the severity of lesions to motor and speech functions and neurophysiological data: the higher the level of preservation of elements of speech and motor functions, the smaller were the contributions of theta activity and the greater the contributions of alpha and beta activities to the EEG. The possible pathogenetic mechanisms underlying the motor and speech disturbances in Rett syndrome are discussed.

  15. Auditory Temporal Structure Processing in Dyslexia: Processing of Prosodic Phrase Boundaries Is Not Impaired in Children with Dyslexia

    ERIC Educational Resources Information Center

    Geiser, Eveline; Kjelgaard, Margaret; Christodoulou, Joanna A.; Cyr, Abigail; Gabrieli, John D. E.

    2014-01-01

    Reading disability in children with dyslexia has been proposed to reflect impairment in auditory timing perception. We investigated one aspect of timing perception--"temporal grouping"--as present in prosodic phrase boundaries of natural speech, in age-matched groups of children, ages 6-8 years, with and without dyslexia. Prosodic phrase…

  16. Perception of temporally modified speech in auditory neuropathy.

    PubMed

    Hassan, Dalia Mohamed

    2011-01-01

    Disrupted auditory nerve activity in auditory neuropathy (AN) significantly impairs the sequential processing of auditory information, resulting in poor speech perception. This study investigated the ability of AN subjects to perceive temporally modified consonant-vowel (CV) pairs and shed light on their phonological awareness skills. Four Arabic CV pairs were selected: /ki/-/gi/, /to/-/do/, /si/-/sti/ and /so/-/zo/. The formant transitions in consonants and the pauses between CV pairs were prolonged. Rhyming, segmentation and blending skills were tested using words at a natural rate of speech and with prolongation of the speech stream. Fourteen adult AN subjects were compared to a matched group of cochlear-impaired patients in their perception of acoustically processed speech. The AN group distinguished the CV pairs at a low speech rate, in particular with modification of the consonant duration. Phonological awareness skills deteriorated in adult AN subjects but improved with prolongation of the speech inter-syllabic time interval. A rehabilitation program for AN should consider temporal modification of speech, training for auditory temporal processing and the use of devices with innovative signal processing schemes. Verbal modifications as well as visual imaging appear to be promising compensatory strategies for remediating the affected phonological processing skills.

  17. High-gamma band fronto-temporal coherence as a measure of functional connectivity in speech motor control.

    PubMed

    Kingyon, J; Behroozmand, R; Kelley, R; Oya, H; Kawasaki, H; Narayanan, N S; Greenlee, J D W

    2015-10-01

    The neural basis of human speech is unclear. Intracranial electrophysiological recordings have revealed that high-gamma band oscillations (70-150 Hz) are observed in the frontal lobe during speech production and in the temporal lobe during speech perception. Here, we tested the hypothesis that the frontal and temporal brain regions had high-gamma coherence during speech. We recorded electrocorticography (ECoG) from the frontal and temporal cortices of five humans who underwent surgery for medically intractable epilepsy, and studied coherence between the frontal and temporal cortex during vocalization and playback of vocalization. We report two novel results. First, we observed high-gamma band as well as theta (4-8 Hz) coherence between frontal and temporal lobes. Second, both high-gamma and theta coherence were stronger when subjects were actively vocalizing as compared to playback of the same vocalizations. These findings provide evidence that coupling between sensory-motor networks measured by high-gamma coherence plays a key role in feedback-based monitoring and control of vocal output for human vocalization. Copyright © 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
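
    Fronto-temporal coupling of the kind reported above is commonly quantified with magnitude-squared coherence between channel pairs, averaged within a frequency band. The sketch below uses simulated signals; the sampling rate, segment length, and band edges are illustrative choices, not the study's analysis parameters.

```python
# Hedged sketch: band-averaged magnitude-squared coherence between a frontal and a
# temporal ECoG channel. Signals, sampling rate, and band limits are illustrative.
import numpy as np
from scipy.signal import coherence

fs = 1000.0
rng = np.random.default_rng(4)
shared = rng.standard_normal(30 * int(fs))                 # common drive
frontal = shared + 0.5 * rng.standard_normal(shared.size)
temporal = shared + 0.5 * rng.standard_normal(shared.size)

f, cxy = coherence(frontal, temporal, fs=fs, nperseg=1024)

def band_mean(f, cxy, lo, hi):
    return cxy[(f >= lo) & (f <= hi)].mean()

print("theta (4-8 Hz) coherence:        ", band_mean(f, cxy, 4, 8))
print("high-gamma (70-150 Hz) coherence:", band_mean(f, cxy, 70, 150))
```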

  18. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order.

    PubMed

    Kim, Yoon-Chul; Narayanan, Shrikanth S; Nayak, Krishna S

    2011-05-01

    In speech production research using real-time magnetic resonance imaging (MRI), the analysis of articulatory dynamics is performed retrospectively. A flexible selection of temporal resolution is highly desirable because of natural variations in speech rate and variations in the speed of different articulators. The purpose of the study is to demonstrate a first application of golden-ratio spiral temporal view order to real-time speech MRI and investigate its performance by comparison with conventional bit-reversed temporal view order. Golden-ratio view order proved to be more effective at capturing the dynamics of rapid tongue tip motion. A method for automated blockwise selection of temporal resolution is presented that enables the synthesis of a single video from multiple temporal resolution videos and potentially facilitates subsequent vocal tract shape analysis. Copyright © 2010 Wiley-Liss, Inc.
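
    The key property of a golden-ratio view order is that successive interleaves are rotated by an irrational fraction of the circle, so any retrospectively chosen contiguous block of views covers k-space nearly uniformly, which is what lets temporal resolution be selected after acquisition. The sketch below shows the angle schedule and a simple retrospective binning step; the angle convention, view duration, and frame durations are assumptions, not the study's parameters.

```python
# Hedged sketch: golden-ratio azimuthal view ordering and retrospective binning of
# views into frames of a chosen temporal resolution. Angle convention and timing
# values are illustrative, not taken from the record above.
import numpy as np

GOLDEN_FRACTION = 2.0 / (1.0 + np.sqrt(5.0))   # ~0.618; increment of ~222.5 degrees

def golden_ratio_angles(n_views):
    return np.mod(np.arange(n_views) * 360.0 * GOLDEN_FRACTION, 360.0)

def bin_views(n_views, tr_per_view_ms, frame_ms):
    """Group consecutive views into frames of the requested duration."""
    views_per_frame = max(1, int(round(frame_ms / tr_per_view_ms)))
    return [np.arange(i, min(i + views_per_frame, n_views))
            for i in range(0, n_views, views_per_frame)]

angles = golden_ratio_angles(1000)
coarse_frames = bin_views(1000, tr_per_view_ms=6.0, frame_ms=78.0)  # lower temporal resolution
fine_frames = bin_views(1000, tr_per_view_ms=6.0, frame_ms=36.0)    # higher temporal resolution
```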

  19. Assessment of spectral and temporal resolution in cochlear implant users using psychoacoustic discrimination and speech cue categorization

    PubMed Central

    Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon

    2016-01-01

    Objectives: This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design: Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results: CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects’ categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions: When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language. PMID:27438871

  20. S-Ketamine-Induced NMDA Receptor Blockade during Natural Speech Production and Its Implications for Formal Thought Disorder in Schizophrenia: A Pharmaco-fMRI Study.

    PubMed

    Nagels, Arne; Cabanis, Maurice; Oppel, Andrea; Kirner-Veselinovic, Andre; Schales, Christian; Kircher, Tilo

    2018-05-01

    Structural and functional changes in the lateral temporal language areas have been related to formal thought disorder (FTD) in schizophrenia. Continuous, natural speech production activates the right lateral temporal lobe in schizophrenia, as opposed to the left in healthy subjects. Positive and negative FTD can be elicited in healthy subjects by glutamatergic NMDA blockade with ketamine. It is unclear whether the glutamate system is related to the reversed hemispheric lateralization during speaking in patients. In a double-blind, crossover, placebo-controlled study, 15 healthy, male, right-handed volunteers overtly described 7 pictures for 3 min each while BOLD signal changes were acquired with fMRI. As a measure of linguistic demand, the number of words within 20 s epochs was correlated with BOLD responses. Participants developed S-ketamine-induced psychotic symptoms, particularly positive FTD. Ketamine vs placebo was associated with enhanced neural responses in the right middle and inferior temporal gyri. Similar to a previous fMRI study in schizophrenia patients vs healthy controls applying the same design, S-ketamine reversed functional lateralization during speech production in healthy subjects. Results demonstrate an association between glutamatergic imbalance, dysactivations in lateral temporal brain areas, and FTD symptom formation.

  1. Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy

    PubMed Central

    Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun

    2014-01-01

    The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants’ audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life. PMID:24586651

  2. Correlation of vocals and lyrics with left temporal musicogenic epilepsy.

    PubMed

    Tseng, Wei-En J; Lim, Siew-Na; Chen, Lu-An; Jou, Shuo-Bin; Hsieh, Hsiang-Yao; Cheng, Mei-Yun; Chang, Chun-Wei; Li, Han-Tao; Chiang, Hsing-I; Wu, Tony

    2018-03-15

    Whether the cognitive processing of music and speech relies on shared or distinct neuronal mechanisms remains unclear. Music and language processing in the brain are right and left temporal functions, respectively. We studied patients with musicogenic epilepsy (ME) that was specifically triggered by popular songs to analyze brain hyperexcitability triggered by specific stimuli. The study included two men and one woman (all right-handed, aged 35-55 years). The patients had sound-triggered left temporal ME in response to popular songs with vocals, but not to instrumental, classical, or nonvocal piano solo versions of the same song. Sentimental lyrics, high-pitched singing, specificity/familiarity, and singing in the native language were the most significant triggering factors. We found that recognition of the human voice and analysis of lyrics are important causal factors in left temporal ME and provide observational evidence that sounds with speech structure are predominantly processed in the left temporal lobe. A literature review indicated that language-associated stimuli triggered ME in the left temporal epileptogenic zone at a nearly twofold higher rate compared with the right temporal region. Further research on ME may enhance understanding of the cognitive neuroscience of music. © 2018 New York Academy of Sciences.

  3. Primitive Auditory Memory Is Correlated with Spatial Unmasking That Is Based on Direct-Reflection Integration

    PubMed Central

    Li, Huahui; Kong, Lingzhi; Wu, Xihong; Li, Liang

    2013-01-01

    In reverberant rooms with multiple people talking, spatial separation between speech sources improves recognition of attended speech, even though both the head-shadowing and interaural-interaction unmasking cues are limited by numerous reflections. It is the perceptual integration between the direct wave and its reflections that bridges the direct-reflection temporal gaps and results in the spatial unmasking under reverberant conditions. This study further investigated (1) the temporal dynamic of the direct-reflection-integration-based spatial unmasking as a function of the reflection delay, and (2) whether this temporal dynamic is correlated with the listeners’ auditory ability to temporally retain raw acoustic signals (i.e., the fast-decaying primitive auditory memory, PAM). The results showed that recognition of the target speech against the speech-masker background is a descending exponential function of the delay of the simulated target reflection. In addition, the temporal extent of PAM is frequency dependent and markedly longer than that for perceptual fusion. More importantly, the temporal dynamic of the speech-recognition function is significantly correlated with the temporal extent of the PAM of low-frequency raw signals. Thus, we propose that a chain process, which links the earlier-stage PAM with the later-stage correlation computation, perceptual integration, and attention facilitation, plays a role in spatially unmasking target speech under reverberant conditions. PMID:23658664

  4. Visual Benefits in Apparent Motion Displays: Automatically Driven Spatial and Temporal Anticipation Are Partially Dissociated

    PubMed Central

    Ahrens, Merle-Marie; Veniero, Domenica; Gross, Joachim; Harvey, Monika; Thut, Gregor

    2015-01-01

    Many behaviourally relevant sensory events such as motion stimuli and speech have an intrinsic spatio-temporal structure. This will engage intentional and most likely unintentional (automatic) prediction mechanisms enhancing the perception of upcoming stimuli in the event stream. Here we sought to probe the anticipatory processes that are automatically driven by rhythmic input streams in terms of their spatial and temporal components. To this end, we employed an apparent visual motion paradigm testing the effects of pre-target motion on lateralized visual target discrimination. The motion stimuli either moved towards or away from peripheral target positions (valid vs. invalid spatial motion cueing) at a rhythmic or arrhythmic pace (valid vs. invalid temporal motion cueing). Crucially, we emphasized automatic motion-induced anticipatory processes by rendering the motion stimuli non-predictive of upcoming target position (by design) and task-irrelevant (by instruction), and by creating instead endogenous (orthogonal) expectations using symbolic cueing. Our data revealed that the apparent motion cues automatically engaged both spatial and temporal anticipatory processes, but that these processes were dissociated. We further found evidence for lateralisation of anticipatory temporal but not spatial processes. This indicates that distinct mechanisms may drive automatic spatial and temporal extrapolation of upcoming events from rhythmic event streams. This contrasts with previous findings that instead suggest an interaction between spatial and temporal attention processes when endogenously driven. Our results further highlight the need for isolating intentional from unintentional processes for better understanding the various anticipatory mechanisms engaged in processing behaviourally relevant stimuli with predictable spatio-temporal structure such as motion and speech. PMID:26623650

  5. Temporal Sensitivity Measured Shortly After Cochlear Implantation Predicts 6-Month Speech Recognition Outcome.

    PubMed

    Erb, Julia; Ludwig, Alexandra Annemarie; Kunke, Dunja; Fuchs, Michael; Obleser, Jonas

    2018-04-24

    Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech recognition. However, rapid tests of temporal sensitivity are currently lacking. Here, we propose a simple amplitude modulation rate discrimination (AMRD) paradigm that is validated by predicting future speech recognition in adult cochlear implant (CI) patients. In 34 newly implanted patients, we used an adaptive AMRD paradigm, where broadband noise was modulated at the speech-relevant rate of ~4 Hz. In a longitudinal study, speech recognition in quiet was assessed using the closed-set Freiburger number test shortly after cochlear implantation (t0) as well as the open-set Freiburger monosyllabic word test 6 months later (t6). Both AMRD thresholds at t0 (r = -0.51) and speech recognition scores at t0 (r = 0.56) predicted speech recognition scores at t6. However, AMRD and speech recognition at t0 were uncorrelated, suggesting that those measures capture partially distinct perceptual abilities. A multiple regression model predicting 6-month speech recognition outcome with deafness duration and speech recognition at t0 improved from adjusted R² = 0.30 to adjusted R² = 0.44 when AMRD threshold was added as a predictor. These findings identify AMRD thresholds as a reliable, nonredundant predictor above and beyond established speech tests for CI outcome. This AMRD test could potentially be developed into a rapid clinical temporal-resolution test to be integrated into the postoperative test battery to improve the reliability of speech outcome prognosis.
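
    A rough illustration of the model comparison reported above (baseline predictors vs. baseline plus AMRD threshold, compared via adjusted R²). The column names and all values are synthetic placeholders, not the study's data; the point is only the structure of the comparison.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n = 34  # same group size as in the abstract, data entirely synthetic
      df = pd.DataFrame({
          "deaf_dur":  rng.uniform(1, 30, n),     # years of deafness (placeholder)
          "speech_t0": rng.uniform(0, 100, n),    # % correct shortly after implantation (placeholder)
          "amrd_t0":   rng.uniform(0.5, 4.0, n),  # AM rate discrimination threshold (placeholder)
      })
      # Outcome loosely tied to the predictors plus noise, just to make the example run.
      df["speech_t6"] = (0.4 * df["speech_t0"] - 8 * df["amrd_t0"]
                         - 0.3 * df["deaf_dur"] + rng.normal(0, 10, n))

      base = smf.ols("speech_t6 ~ deaf_dur + speech_t0", data=df).fit()
      full = smf.ols("speech_t6 ~ deaf_dur + speech_t0 + amrd_t0", data=df).fit()

      # The abstract's question: does adding the AMRD threshold raise adjusted R-squared?
      print(f"adjusted R2, baseline model:   {base.rsquared_adj:.2f}")
      print(f"adjusted R2, + AMRD threshold: {full.rsquared_adj:.2f}")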

  6. Making sense of progressive non-fluent aphasia: an analysis of conversational speech

    PubMed Central

    Woollams, Anna M.; Hodges, John R.; Patterson, Karalyn

    2009-01-01

    The speech of patients with progressive non-fluent aphasia (PNFA) has often been described clinically, but these descriptions lack support from quantitative data. The clinical classification of the progressive aphasic syndromes is also debated. This study selected 15 patients with progressive aphasia on broad criteria, excluding only those with clear semantic dementia. It aimed to provide a detailed quantitative description of their conversational speech, along with cognitive testing and visual rating of structural brain imaging, and to examine which, if any, features were consistently present throughout the group, as well as looking for sub-syndromic associations between these features. A consistent increase in grammatical and speech sound errors and a simplification of spoken syntax relative to age-matched controls were observed, though telegraphic speech was rare; slow speech was common but not universal. Almost all patients showed impairments in picture naming, syntactic comprehension and executive function. The degree to which speech was affected was independent of the severity of the other cognitive deficits. A partial dissociation was also observed between slow speech with simplified grammar on the one hand, and grammatical and speech sound errors on the other. Overlap between these sets of impairments was, however, the rule rather than the exception, producing continuous variation within a single consistent syndrome. The distribution of atrophy was remarkably variable, with frontal, temporal and medial temporal areas affected, either symmetrically or asymmetrically. The study suggests that PNFA is a coherent, well-defined syndrome and that varieties such as logopaenic progressive aphasia and progressive apraxia of speech may be seen as points in a space of continuous variation within progressive non-fluent aphasia. PMID:19696033

  7. Expertise with artificial non-speech sounds recruits speech-sensitive cortical regions

    PubMed Central

    Leech, Robert; Holt, Lori L.; Devlin, Joseph T.; Dick, Frederic

    2009-01-01

    Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial non-linguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants’ ability to explicitly categorize the non-speech sounds predicted the change in pre- to post-training activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space. PMID:19386919

  8. Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    PubMed Central

    Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

    2017-01-01

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. DOI: http://dx.doi.org/10.7554/eLife.24763.001 PMID:28590903

  9. Cortical Integration of Audio-Visual Information

    PubMed Central

    Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

    2013-01-01

    We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442

  10. Preferred Compression Speed for Speech and Music and Its Relationship to Sensitivity to Temporal Fine Structure.

    PubMed

    Moore, Brian C J; Sęk, Aleksander

    2016-09-07

    Multichannel amplitude compression is widely used in hearing aids. The preferred compression speed varies across individuals. Moore (2008) suggested that reduced sensitivity to temporal fine structure (TFS) may be associated with preference for slow compression. This idea was tested using a simulated hearing aid. It was also assessed whether preferences for compression speed depend on the type of stimulus: speech or music. Twenty-two hearing-impaired subjects were tested, and the simulated hearing aid was fitted individually using the CAM2A method. On each trial, a given segment of speech or music was presented twice. One segment was processed with fast compression and the other with slow compression, and the order was balanced across trials. The subject indicated which segment was preferred and by how much. On average, slow compression was preferred over fast compression, more so for music, but there were distinct individual differences, which were highly correlated for speech and music. Sensitivity to TFS was assessed using the difference limen for frequency at 2000 Hz and by two measures of sensitivity to interaural phase at low frequencies. The results for the difference limens for frequency, but not the measures of sensitivity to interaural phase, supported the suggestion that preference for compression speed is affected by sensitivity to TFS. © The Author(s) 2016.

  11. Temporal processing of speech in a time-feature space

    NASA Astrophysics Data System (ADS)

    Avendano, Carlos

    1997-09-01

    The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
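
    A minimal sketch of the modulation-spectrum computation described above: take the short-time spectral envelope of a signal, then take the spectrum of each band's time trajectory. The frame length, hop size, and the random placeholder signal are assumptions for illustration; for real running speech the result typically peaks near 4 Hz, the average syllabic rate.

      import numpy as np
      from scipy.signal import stft

      def modulation_spectrum(signal, fs, frame_len=0.025, hop=0.010):
          # Spectrum of the time trajectories of the short-time spectral envelope.
          nperseg = int(frame_len * fs)
          hop_samples = int(hop * fs)
          _, _, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=nperseg - hop_samples)
          envelope = np.abs(Z)                         # short-time spectral envelope, shape (bands, frames)
          frame_rate = fs / hop_samples                # envelope sampling rate, e.g. 100 Hz
          traj = envelope - envelope.mean(axis=1, keepdims=True)   # remove each band's DC component
          mod_spec = np.abs(np.fft.rfft(traj, axis=1)).mean(axis=0)
          mod_freqs = np.fft.rfftfreq(traj.shape[1], d=1.0 / frame_rate)
          return mod_freqs, mod_spec

      fs = 16000
      speech = np.random.randn(fs * 3)                 # placeholder for 3 s of speech samples
      freqs, spec = modulation_spectrum(speech, fs)
      mask = freqs > 0.5                               # ignore the near-DC bins
      print(f"dominant modulation frequency: {freqs[mask][np.argmax(spec[mask])]:.1f} Hz")

    Temporal processing in the sense used here then amounts to filtering these band trajectories in the modulation-frequency domain before resynthesis or feature extraction.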

  12. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  13. Status report on speech research. A report on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1984-08-01

    This report (1 January-30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Sources of variability in early speech development; Invariance: Functional or descriptive?; Brief comments on invariance in phonetic perception; Phonetic category boundaries are flexible; On categorizing aphasic speech errors; Universal and language particular aspects of vowel-to-vowel coarticulation; Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures; Formant integration and the perception of nasal vowel height; Relative power of cues: F0 shifts vs. voice timing; Laryngeal management at utterance-internal word boundary in American English; Closure duration and release burst amplitude cues to stop consonant manner and place of articulation; Effects of temporal stimulus properties on perception of the (sl)-(spl) distinction; The physics of controlled conditions: A reverie about locomotion; On the perception of intonation from sinusoidal sentences; Speech Perception; Speech Articulation; Motor Control; Speech Development.

  14. The functional neuroanatomy of language

    NASA Astrophysics Data System (ADS)

    Hickok, Gregory

    2009-09-01

    There has been substantial progress over the last several years in understanding aspects of the functional neuroanatomy of language. Some of these advances are summarized in this review. It will be argued that recognizing speech sounds is carried out in the superior temporal lobe bilaterally, that the superior temporal sulcus bilaterally is involved in phonological-level aspects of this process, that the frontal/motor system is not central to speech recognition although it may modulate auditory perception of speech, that conceptual access mechanisms are likely located in the lateral posterior temporal lobe (middle and inferior temporal gyri), that speech production involves sensory-related systems in the posterior superior temporal lobe in the left hemisphere, that the interface between perceptual and motor systems is supported by a sensory-motor circuit for vocal tract actions (not dedicated to speech) that is very similar to sensory-motor circuits found in primate parietal lobe, and that verbal short-term memory can be understood as an emergent property of this sensory-motor circuit. These observations are considered within the context of a dual stream model of speech processing in which one pathway supports speech comprehension and the other supports sensory-motor integration. Additional topics of discussion include the functional organization of the planum temporale for spatial hearing and speech-related sensory-motor processes, the anatomical and functional basis of a form of acquired language disorder, conduction aphasia, the neural basis of vocabulary development, and sentence-level/grammatical processing.

  15. Abnormal Brain Dynamics Underlie Speech Production in Children with Autism Spectrum Disorder.

    PubMed

    Pang, Elizabeth W; Valica, Tatiana; MacDonald, Matt J; Taylor, Margot J; Brian, Jessica; Lerch, Jason P; Anagnostou, Evdokia

    2016-02-01

    A large proportion of children with autism spectrum disorder (ASD) have speech and/or language difficulties. While a number of structural and functional neuroimaging methods have been used to explore the brain differences in ASD with regards to speech and language comprehension and production, the neurobiology of basic speech function in ASD has not been examined. Magnetoencephalography (MEG) is a neuroimaging modality with high spatial and temporal resolution that can be applied to the examination of brain dynamics underlying speech as it can capture the fast responses fundamental to this function. We acquired MEG from 21 children with high-functioning autism (mean age: 11.43 years) and 21 age- and sex-matched controls as they performed a simple oromotor task, a phoneme production task and a phonemic sequencing task. Results showed significant differences in activation magnitude and peak latencies in primary motor cortex (Brodmann Area 4), motor planning areas (BA 6), temporal sequencing and sensorimotor integration areas (BA 22/13) and executive control areas (BA 9). Our findings of significant functional brain differences between these two groups on these simple oromotor and phonemic tasks suggest that these deficits may be foundational and could underlie the language deficits seen in ASD. © 2015 The Authors Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research.

  16. The Neurobiological Grounding of Persistent Stuttering: from Structure to Function.

    PubMed

    Neef, Nicole E; Anwander, Alfred; Friederici, Angela D

    2015-09-01

    Neuroimaging and transcranial magnetic stimulation provide insights into the neuronal mechanisms underlying speech disfluencies in chronic persistent stuttering. In the present paper, the goal is not to provide an exhaustive review of existing literature, but rather to highlight robust findings. We, therefore, conducted a meta-analysis of diffusion tensor imaging studies which have recently implicated disrupted white matter connectivity in stuttering. A reduction of fractional anisotropy in persistent stuttering has been reported at several different loci. Our meta-analysis revealed consistent deficits in the left dorsal stream and in the interhemispheric connections between the sensorimotor cortices. In addition, recent fMRI meta-analyses link stuttering to reduced left fronto-parieto-temporal activation while greater fluency is associated with boosted co-activations of right fronto-parieto-temporal areas. However, the physiological foundation of these irregularities is not accessible with MRI. Complementarily, transcranial magnetic stimulation (TMS) reveals local excitatory and inhibitory regulation of cortical dynamics. Applied to a speech motor area, TMS revealed reduced speech-planning-related neuronal dynamics at the level of the primary motor cortex in stuttering. Together, this review provides a focused view of the neurobiology of stuttering to date and may guide the rational design of future research, which will need to account for the perpetual dynamic interactions between auditory, somatosensory, and speech motor circuits that shape fluent speech.

  17. Do not throw out the baby with the bath water: choosing an effective baseline for a functional localizer of speech processing.

    PubMed

    Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal

    2013-05-01

    Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed up by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.

  18. Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, January 1-March 31, 1981.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    Research reports on the nature of speech, instrumentation for the investigation of speech, and practical application of research are included in this status report for January 1-March 31, 1981. The reports deal with the following topics: (1) distinguishing temporal information for speaking rate from temporal information for intervocalic stop…

  19. Premotor neural correlates of predictive motor timing for speech production and hand movement: evidence for a temporal predictive code in the motor system.

    PubMed

    Johari, Karim; Behroozmand, Roozbeh

    2017-05-01

    The predictive coding model suggests that neural processing of sensory information is facilitated for temporally-predictable stimuli. This study investigated how temporal processing of visually-presented sensory cues modulates movement reaction time and neural activities in speech and hand motor systems. Event-related potentials (ERPs) were recorded in 13 subjects while they were visually-cued to prepare to produce a steady vocalization of a vowel sound or press a button in a randomized order, and to initiate the cued movement following the onset of a go signal on the screen. The experiment was conducted in two counterbalanced blocks in which the time interval between visual cue and go signal was temporally-predictable (fixed delay at 1000 ms) or unpredictable (variable between 1000 and 2000 ms). Results of the behavioral response analysis indicated that movement reaction time was significantly decreased for temporally-predictable stimuli in both speech and hand modalities. We identified premotor ERP activities with a left-lateralized parietal distribution for hand and a frontocentral distribution for speech that were significantly suppressed in response to temporally-predictable compared with unpredictable stimuli. The premotor ERPs were elicited approximately 100 ms before movement onset and were significantly correlated with speech and hand motor reaction times only in response to temporally-predictable stimuli. These findings suggest that the motor system establishes a predictive code to facilitate movement in response to temporally-predictable sensory stimuli. Our data suggest that the premotor ERP activities are robust neurophysiological biomarkers of such predictive coding mechanisms. These findings provide novel insights into the temporal processing mechanisms of speech and hand motor systems.

  20. Integrating speech in time depends on temporal expectancies and attention.

    PubMed

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    NASA Astrophysics Data System (ADS)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  2. Situational influences on rhythmicity in speech, music, and their interaction

    PubMed Central

    Hawkins, Sarah

    2014-01-01

    Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. PMID:25385776

  3. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

    PubMed Central

    ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110

  4. Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.

    PubMed

    Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V

    2012-01-01

    Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. Primary goal of this research was to explore temporal processing and speech perception Skills in individuals who are exposed to occupational noise of more than 80 dBA and not yet incurred clinically significant threshold shifts. Contribution of temporal processing skills to speech perception in adverse listening situation was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age range of 30-40 (n= 13), 41 50 ( = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups including the train drivers had hearing sensitivity within 25 dB HL in the octave frequencies between 250 and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in presence multi-talker babble at -5dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poor in noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.
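
    A minimal sketch of the kind of group comparison reported here: an independent-samples t-test on a temporal-processing measure between noise-exposed and control listeners. The gap-detection thresholds below are synthetic placeholders, not the study's data; only the group sizes echo the abstract.

      import numpy as np
      from scipy.stats import ttest_ind

      rng = np.random.default_rng(0)
      # Placeholder gap-detection thresholds (ms); higher values mean poorer temporal resolution.
      noise_exposed = rng.normal(loc=5.5, scale=1.2, size=28)   # pooled train drivers (placeholder)
      controls = rng.normal(loc=4.5, scale=1.0, size=90)        # pooled controls (placeholder)

      t_stat, p_value = ttest_ind(noise_exposed, controls, equal_var=False)  # Welch's t-test
      print(f"t = {t_stat:.2f}, p = {p_value:.4f}")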

  5. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    PubMed

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  6. Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha

    PubMed Central

    Kayser, Stephanie J.; Ince, Robin A.A.; Gross, Joachim

    2015-01-01

    The entrainment of slow rhythmic auditory cortical activity to the temporal regularities in speech is considered to be a central mechanism underlying auditory perception. Previous work has shown that entrainment is reduced when the quality of the acoustic input is degraded, but has also linked rhythmic activity at similar time scales to the encoding of temporal expectations. To understand these bottom-up and top-down contributions to rhythmic entrainment, we manipulated the temporal predictive structure of speech by parametrically altering the distribution of pauses between syllables or words, thereby rendering the local speech rate irregular while preserving intelligibility and the envelope fluctuations of the acoustic signal. Recording EEG activity in human participants, we found that this manipulation did not alter neural processes reflecting the encoding of individual sound transients, such as evoked potentials. However, the manipulation significantly reduced the fidelity of auditory delta (but not theta) band entrainment to the speech envelope. It also reduced left frontal alpha power and this alpha reduction was predictive of the reduced delta entrainment across participants. Our results show that rhythmic auditory entrainment in delta and theta bands reflect functionally distinct processes. Furthermore, they reveal that delta entrainment is under top-down control and likely reflects prefrontal processes that are sensitive to acoustical regularities rather than the bottom-up encoding of acoustic features. SIGNIFICANCE STATEMENT The entrainment of rhythmic auditory cortical activity to the speech envelope is considered to be critical for hearing. Previous work has proposed divergent views in which entrainment reflects either early evoked responses related to sound encoding or high-level processes related to expectation or cognitive selection. Using a manipulation of speech rate, we dissociated auditory entrainment at different time scales. Specifically, our results suggest that delta entrainment is controlled by frontal alpha mechanisms and thus support the notion that rhythmic auditory cortical entrainment is shaped by top-down mechanisms. PMID:26538641
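
    One common way to quantify the fidelity of band-limited entrainment to the speech envelope is spectral coherence between the EEG and the envelope, averaged within the delta (1-4 Hz) or theta (4-8 Hz) band. The sketch below illustrates that general measure with placeholder signals and an assumed sampling rate; it is not the authors' exact analysis.

      import numpy as np
      from scipy.signal import coherence, hilbert

      fs = 250                                      # assumed EEG sampling rate
      n = fs * 60                                   # one minute of placeholder data
      eeg = np.random.randn(n)                      # one EEG channel (placeholder)
      speech_envelope = np.abs(hilbert(np.random.randn(n)))   # placeholder speech amplitude envelope

      def band_mean_coherence(x, y, fs, lo, hi):
          # Mean magnitude-squared coherence between x and y within [lo, hi] Hz.
          f, cxy = coherence(x, y, fs=fs, nperseg=fs * 4)      # 4-s windows -> 0.25 Hz resolution
          band = (f >= lo) & (f <= hi)
          return cxy[band].mean()

      delta = band_mean_coherence(eeg, speech_envelope, fs, 1, 4)
      theta = band_mean_coherence(eeg, speech_envelope, fs, 4, 8)
      print(f"delta-band coherence: {delta:.3f}, theta-band coherence: {theta:.3f}")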

  7. Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study.

    PubMed

    Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P

    2017-01-01

    Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age.

  8. Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study

    PubMed Central

    Stefanidou, Chrysi; McCleery, Joseph P.

    2017-01-01

    Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age. PMID:28738063

  9. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    PubMed

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.

  10. Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

    PubMed Central

    Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

    2015-01-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
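
    A minimal sketch of the general logic of such a classification analysis: correlate, frame by frame, the random visibility of the visual speech (set by the transparency mask) with the binary /apa/ response. Trial counts, frame counts, and all data below are placeholders, and the real analysis produced spatiotemporal maps rather than the purely temporal one shown here.

      import numpy as np

      rng = np.random.default_rng(1)
      n_trials, n_frames = 600, 45                  # placeholder counts (e.g., 45 video frames per trial)

      # visibility[i, f]: how visible the mouth region was on frame f of trial i (0 = fully masked).
      visibility = rng.uniform(0.0, 1.0, size=(n_trials, n_frames))
      # response[i]: 1 if /apa/ was reported on trial i (placeholder responses).
      response = rng.integers(0, 2, size=n_trials)

      # Temporal classification map: per-frame point-biserial correlation (Pearson r with a
      # binary variable) between mask visibility and the response. Frames whose visual
      # information drives perception show correlations that deviate from zero.
      classification_map = np.array(
          [np.corrcoef(response, visibility[:, f])[0, 1] for f in range(n_frames)]
      )
      print(np.round(classification_map, 2))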

  11. Distributed Processing and Cortical Specialization for Speech and Environmental Sounds in Human Temporal Cortex

    ERIC Educational Resources Information Center

    Leech, Robert; Saygin, Ayse Pinar

    2011-01-01

    Using functional MRI, we investigated whether auditory processing of both speech and meaningful non-linguistic environmental sounds in superior and middle temporal cortex relies on a complex and spatially distributed neural system. We found that evidence for spatially distributed processing of speech and environmental sounds in a substantial…

  12. Effect of Age on Silent Gap Discrimination in Synthetic Speech Stimuli.

    ERIC Educational Resources Information Center

    Lister, Jennifer; Tarver, Kenton

    2004-01-01

    The difficulty that older listeners experience understanding conversational speech may be related to their limited ability to use information present in the silent intervals (i.e., temporal gaps) between dynamic speech sounds. When temporal gaps are present between nonspeech stimuli that are spectrally invariant (e.g., noise bands or sinusoids),…

  13. Conveying Movement in Music and Prosody

    PubMed Central

    Hedger, Stephen C.; Nusbaum, Howard C.; Hoeckner, Berthold

    2013-01-01

    We investigated whether acoustic variation of musical properties can analogically convey descriptive information about an object. Specifically, we tested whether information from the temporal structure in music interacts with perception of a visual image to form an analog perceptual representation as a natural part of music perception. In Experiment 1, listeners heard music with an accelerating or decelerating temporal pattern, and then saw a picture of a still or moving object and decided whether it was animate or inanimate – a task unrelated to the patterning of the music. Object classification was faster when musical motion matched visually depicted motion. In Experiment 2, participants heard spoken sentences that were accompanied by accelerating or decelerating music, and then were presented with a picture of a still or moving object. When motion information in the music matched motion information in the picture, participants were similarly faster to respond. Fast and slow temporal patterns without acceleration and deceleration, however, did not make participants faster when they saw a picture depicting congruent motion information (Experiment 3), suggesting that understanding temporal structure information in music may depend on specific metaphors about motion in music. Taken together, these results suggest that visuo-spatial referential information can be analogically conveyed and represented by music and can be integrated with speech or influence the understanding of speech. PMID:24146920

  14. Exploring the roles of spectral detail and intonation contour in speech intelligibility: an FMRI study.

    PubMed

    Kyong, Jeong S; Scott, Sophie K; Rosen, Stuart; Howe, Timothy B; Agnew, Zarinah K; McGettigan, Carolyn

    2014-08-01

    The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information [Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155-163, 2000; Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000]. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al. [Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000], where greater sentence intelligibility was predominately associated with increased activity in the left STS, and the greatest response to normal sentence melody was found in right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps surprisingly, there was little evidence for an interaction between the two factors; we discuss the implications for the processing of sound and speech in the dorsolateral temporal lobes.

  15. Identification of a pathway for intelligible speech in the left temporal lobe

    PubMed Central

    Scott, Sophie K.; Blank, C. Catrin; Rosen, Stuart; Wise, Richard J. S.

    2017-01-01

    Summary It has been proposed that the identification of sounds, including species-specific vocalizations, by primates depends on anterior projections from the primary auditory cortex, an auditory pathway analogous to the ventral route proposed for the visual identification of objects. We have identified a similar route in the human for understanding intelligible speech. Using PET imaging to identify separable neural subsystems within the human auditory cortex, we used a variety of speech and speech-like stimuli with equivalent acoustic complexity but varying intelligibility. We have demonstrated that the left superior temporal sulcus responds to the presence of phonetic information, but its anterior part only responds if the stimulus is also intelligible. This novel observation demonstrates a left anterior temporal pathway for speech comprehension. PMID:11099443

  16. Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: Effect of interruption at the syllabic-rate and periodic-rate of speech

    PubMed Central

    Fogerty, Daniel

    2011-01-01

    Listeners often only have fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in high-frequencies. PMID:21786914
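
    A hedged sketch of the weighting analysis described above: correlate, trial by trial, each acoustic property's availability (here controlled by the added noise level) with recognition performance, and read the correlations as perceptual weights. The cue labels and all data below are illustrative assumptions, not the study's materials.

      import numpy as np

      rng = np.random.default_rng(2)
      n_trials = 400
      cues = ["env_low", "env_mid", "env_high", "tfs_low", "tfs_mid", "tfs_high"]

      # availability[i, j]: relative availability (0-1) of cue j on trial i, set by the added noise level.
      availability = rng.uniform(0.0, 1.0, size=(n_trials, len(cues)))
      # correct[i]: whether the keywords were recognized on trial i (placeholder outcome, loosely cue-driven).
      drive = availability @ np.array([0.5, 0.8, 1.6, 0.3, 0.4, 0.5])
      correct = (drive + rng.normal(0, 0.5, n_trials) > drive.mean()).astype(int)

      # Perceptual weight of each cue: trial-by-trial correlation between its availability and performance.
      weights = np.array([np.corrcoef(availability[:, j], correct)[0, 1] for j in range(len(cues))])
      for name, w in zip(cues, weights):
          print(f"{name}: {w:+.2f}")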

  17. Spatio-temporal Dynamics of Audiovisual Speech Processing

    PubMed Central

    Bernstein, Lynne E.; Auer, Edward T.; Wagner, Michael; Ponton, Curtis W.

    2007-01-01

    The cortical processing of auditory-alone, visual-alone, and audiovisual speech information is temporally and spatially distributed, and functional magnetic resonance imaging (fMRI) cannot adequately resolve its temporal dynamics. In order to investigate a hypothesized spatio-temporal organization for audiovisual speech processing circuits, event-related potentials (ERPs) were recorded using electroencephalography (EEG). Stimuli were congruent audiovisual /bα/, incongruent auditory /bα/ synchronized with visual /gα/, auditory-only /bα/, and visual-only /bα/ and /gα/. Current density reconstructions (CDRs) of the ERP data were computed across the latency interval of 50-250 milliseconds. The CDRs demonstrated complex spatio-temporal activation patterns that differed across stimulus conditions. The hypothesized circuit that was investigated here comprised initial integration of audiovisual speech by the middle superior temporal sulcus (STS), followed by recruitment of the intraparietal sulcus (IPS), followed by activation of Broca's area (Miller and d'Esposito, 2005). The importance of spatio-temporally sensitive measures in evaluating processing pathways was demonstrated. Results showed, strikingly, early (< 100 msec) and simultaneous activations in areas of the supramarginal and angular gyrus (SMG/AG), the IPS, the inferior frontal gyrus, and the dorsolateral prefrontal cortex. Also, emergent left hemisphere SMG/AG activation, not predicted based on the unisensory stimulus conditions was observed at approximately 160 to 220 msec. The STS was neither the earliest nor most prominent activation site, although it is frequently considered the sine qua non of audiovisual speech integration. As discussed here, the relatively late activity of the SMG/AG solely under audiovisual conditions is a possible candidate audiovisual speech integration response. PMID:17920933

  18. Universal and language-specific sublexical cues in speech perception: a novel electroencephalography-lesion approach.

    PubMed

    Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja

    2016-06-01

    See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, governing phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. Therefore we asked 70 participants, half of whom suffered from a chronic left hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli comply with Slovak phonotactics and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. By contrast, the language-specific phonotactic contrast selectively modulates a medium latency component of the event-related potentials around 400 ms. Using a novel event-related potential-lesion approach, we furthermore provide initial evidence that implicit processing of these different phonotactic levels relies on partially separable brain areas in the left hemisphere: contrasting forward with reversed speech, the approach delineated an area comprising the supramarginal and angular gyri. Conversely, the contrast between legal and illegal phonotactics consistently projected to anterior and middle portions of the middle temporal and superior temporal gyri. Our data support the notion that phonological structure acts on different stages of phonologically and lexically driven steps of speech comprehension. In the context of previous work, we propose context-dependent sensitivity to different levels of phonotactic well-formedness.

  19. Hearing and seeing meaning in speech and gesture: insights from brain and behaviour

    PubMed Central

    Özyürek, Aslı

    2014-01-01

    As we speak, we use not only the arbitrary form–meaning mappings of the speech channel but also motivated form–meaning correspondences, i.e. iconic gestures that accompany speech (e.g. inverted V-shaped hand wiggling across gesture space to demonstrate walking). This article reviews what we know about processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: i.e. modulation of the electrophysiological recording component N400, which is sensitive to the ease of semantic integration of a word to previous context, and recruitment of the left-lateralized frontal–posterior temporal network (left inferior frontal gyrus (IFG), medial temporal gyrus (MTG) and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are special to gestures or are shared with actions or other visual accompaniments to speech (e.g. lips) or other visual symbols such as pictures are discussed, as well as the implications for a multimodal view of language. PMID:25092664

  20. Hearing and seeing meaning in speech and gesture: insights from brain and behaviour.

    PubMed

    Özyürek, Aslı

    2014-09-19

    As we speak, we use not only the arbitrary form-meaning mappings of the speech channel but also motivated form-meaning correspondences, i.e. iconic gestures that accompany speech (e.g. inverted V-shaped hand wiggling across gesture space to demonstrate walking). This article reviews what we know about processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: i.e. modulation of the electrophysiological recording component N400, which is sensitive to the ease of semantic integration of a word to previous context, and recruitment of the left-lateralized frontal-posterior temporal network (left inferior frontal gyrus (IFG), medial temporal gyrus (MTG) and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are special to gestures or are shared with actions or other visual accompaniments to speech (e.g. lips) or other visual symbols such as pictures are discussed, as well as the implications for a multimodal view of language. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  1. The right hemisphere supports but does not replace left hemisphere auditory function in patients with persisting aphasia.

    PubMed

    Teki, Sundeep; Barnes, Gareth R; Penny, William D; Iverson, Paul; Woodhead, Zoe V J; Griffiths, Timothy D; Leff, Alexander P

    2013-06-01

    In this study, we used magnetoencephalography and a mismatch paradigm to investigate speech processing in stroke patients with auditory comprehension deficits and age-matched control subjects. We probed connectivity within and between the two temporal lobes in response to phonemic (different word) and acoustic (same word) oddballs using dynamic causal modelling. We found stronger modulation of self-connections as a function of phonemic differences for control subjects versus aphasics in left primary auditory cortex and bilateral superior temporal gyrus. The patients showed stronger modulation of connections from right primary auditory cortex to right superior temporal gyrus (feed-forward) and from left primary auditory cortex to right primary auditory cortex (interhemispheric). This differential connectivity can be explained on the basis of a predictive coding theory which suggests increased prediction error and decreased sensitivity to phonemic boundaries in the aphasics' speech network in both hemispheres. Within the aphasics, we also found behavioural correlates with connection strengths: a negative correlation between phonemic perception and an inter-hemispheric connection (left superior temporal gyrus to right superior temporal gyrus), and positive correlation between semantic performance and a feedback connection (right superior temporal gyrus to right primary auditory cortex). Our results suggest that aphasics with impaired speech comprehension have less veridical speech representations in both temporal lobes, and rely more on the right hemisphere auditory regions, particularly right superior temporal gyrus, for processing speech. Despite this presumed compensatory shift in network connectivity, the patients remain significantly impaired.

  2. The right hemisphere supports but does not replace left hemisphere auditory function in patients with persisting aphasia

    PubMed Central

    Barnes, Gareth R.; Penny, William D.; Iverson, Paul; Woodhead, Zoe V. J.; Griffiths, Timothy D.; Leff, Alexander P.

    2013-01-01

    In this study, we used magnetoencephalography and a mismatch paradigm to investigate speech processing in stroke patients with auditory comprehension deficits and age-matched control subjects. We probed connectivity within and between the two temporal lobes in response to phonemic (different word) and acoustic (same word) oddballs using dynamic causal modelling. We found stronger modulation of self-connections as a function of phonemic differences for control subjects versus aphasics in left primary auditory cortex and bilateral superior temporal gyrus. The patients showed stronger modulation of connections from right primary auditory cortex to right superior temporal gyrus (feed-forward) and from left primary auditory cortex to right primary auditory cortex (interhemispheric). This differential connectivity can be explained on the basis of a predictive coding theory which suggests increased prediction error and decreased sensitivity to phonemic boundaries in the aphasics’ speech network in both hemispheres. Within the aphasics, we also found behavioural correlates with connection strengths: a negative correlation between phonemic perception and an inter-hemispheric connection (left superior temporal gyrus to right superior temporal gyrus), and positive correlation between semantic performance and a feedback connection (right superior temporal gyrus to right primary auditory cortex). Our results suggest that aphasics with impaired speech comprehension have less veridical speech representations in both temporal lobes, and rely more on the right hemisphere auditory regions, particularly right superior temporal gyrus, for processing speech. Despite this presumed compensatory shift in network connectivity, the patients remain significantly impaired. PMID:23715097

  3. Speech Perception in Tones and Noise via Cochlear Implants Reveals Influence of Spectral Resolution on Temporal Processing

    PubMed Central

    Kreft, Heather A.

    2014-01-01

    Under normal conditions, human speech is remarkably robust to degradation by noise and other distortions. However, people with hearing loss, including those with cochlear implants, often experience great difficulty in understanding speech in noisy environments. Recent work with normal-hearing listeners has shown that the amplitude fluctuations inherent in noise contribute strongly to the masking of speech. In contrast, this study shows that speech perception via a cochlear implant is unaffected by the inherent temporal fluctuations of noise. This qualitative difference between acoustic and electric auditory perception does not seem to be due to differences in underlying temporal acuity but can instead be explained by the poorer spectral resolution of cochlear implants, relative to the normally functioning ear, which leads to an effective smoothing of the inherent temporal-envelope fluctuations of noise. The outcome suggests an unexpected trade-off between the detrimental effects of poorer spectral resolution and the beneficial effects of a smoother noise temporal envelope. This trade-off provides an explanation for the long-standing puzzle of why strong correlations between speech understanding and spectral resolution have remained elusive. The results also provide a potential explanation for why cochlear-implant users and hearing-impaired listeners exhibit reduced or absent masking release when large and relatively slow temporal fluctuations are introduced in noise maskers. The multitone maskers used here may provide an effective new diagnostic tool for assessing functional hearing loss and reduced spectral resolution. PMID:25315376

  4. The role of phase synchronisation between low frequency amplitude modulations in child phonology and morphology speech tasks.

    PubMed

    Flanagan, Sheila; Goswami, Usha

    2018-03-01

    Recent models of the neural encoding of speech suggest a core role for amplitude modulation (AM) structure, particularly regarding AM phase alignment. Accordingly, speech tasks that measure linguistic development in children may exhibit systematic properties regarding AM structure. Here, the acoustic structure of spoken items in child phonological and morphological tasks, phoneme deletion and plural elicitation, was investigated. The phase synchronisation index (PSI), reflecting the degree of phase alignment between pairs of AMs, was computed for 3 AM bands (delta, theta, beta/low gamma; 0.9-2.5 Hz, 2.5-12 Hz, 12-40 Hz, respectively), for five spectral bands covering 100-7250 Hz. For phoneme deletion, data from 94 child participants with and without dyslexia was used to relate AM structure to behavioural performance. Results revealed that a significant change in magnitude of the phase synchronisation index (ΔPSI) of slower AMs (delta-theta) systematically accompanied both phoneme deletion and plural elicitation. Further, children with dyslexia made more linguistic errors as the delta-theta ΔPSI increased. Accordingly, ΔPSI between slower temporal modulations in the speech signal systematically distinguished test items from accurate responses and predicted task performance. This may suggest that sensitivity to slower AM information in speech is a core aspect of phonological and morphological development.
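
    The phase synchronisation index lends itself to a compact illustration. The sketch below is a rough reconstruction under assumptions (Hilbert-derived phases, Butterworth band-pass filters, and a 2:1 locking ratio for roughly 2 Hz delta versus roughly 4 Hz theta AM centres); the paper's actual filterbank, spectral-band handling, and synchronisation ratios may differ.

```python
# Rough sketch (not the authors' code) of a delta-theta phase synchronisation
# index (PSI) for a speech amplitude envelope:
# PSI = |mean(exp(j*(k1*phi_band1 - k2*phi_band2)))| for an assumed k1:k2 ratio.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def psi(envelope, fs, band1=(0.9, 2.5), band2=(2.5, 12.0), k1=2, k2=1):
    """k1:k2 phase synchronisation between two AM bands of the envelope
    (the 2:1 default suits ~2 Hz delta versus ~4 Hz theta centres)."""
    phi1 = np.angle(hilbert(bandpass(envelope, *band1, fs)))
    phi2 = np.angle(hilbert(bandpass(envelope, *band2, fs)))
    return np.abs(np.mean(np.exp(1j * (k1 * phi1 - k2 * phi2))))

# Toy usage: a wideband amplitude envelope sampled at fs Hz.
fs = 100                      # envelope sampling rate (assumed)
t = np.arange(0, 10, 1 / fs)
envelope = 1 + 0.5 * np.cos(2 * np.pi * 2 * t) + 0.3 * np.cos(2 * np.pi * 4 * t)
print(f"delta-theta PSI: {psi(envelope, fs):.2f}")
```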

  5. Musician effect on perception of spectro-temporally degraded speech, vocal emotion, and music in young adolescents.

    PubMed

    Başkent, Deniz; Fuller, Christina D; Galvin, John J; Schepel, Like; Gaudrain, Etienne; Free, Rolien H

    2018-05-01

    In adult normal-hearing musicians, perception of music, vocal emotion, and speech in noise has previously been shown to be better than in non-musicians, sometimes even with spectro-temporally degraded stimuli. In this study, melodic contour identification, vocal emotion identification, and speech understanding in noise were measured in young adolescent normal-hearing musicians and non-musicians listening to unprocessed or degraded signals. Different from adults, there was no musician effect for vocal emotion identification or speech in noise. Melodic contour identification with degraded signals was significantly better in musicians, suggesting potential benefits from music training for young cochlear-implant users, who experience similar spectro-temporal signal degradations.

  6. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception

    NASA Astrophysics Data System (ADS)

    Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.

    2004-02-01

    Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.

  7. A General Audiovisual Temporal Processing Deficit in Adult Readers With Dyslexia.

    PubMed

    Francisco, Ana A; Jesse, Alexandra; Groen, Margriet A; McQueen, James M

    2017-01-01

    Because reading is an audiovisual process, reading impairment may reflect an audiovisual processing deficit. The aim of the present study was to test the existence and scope of such a deficit in adult readers with dyslexia. We tested 39 typical readers and 51 adult readers with dyslexia on their sensitivity to the simultaneity of audiovisual speech and nonspeech stimuli, their time window of audiovisual integration for speech (using incongruent /aCa/ syllables), and their audiovisual perception of phonetic categories. Adult readers with dyslexia showed less sensitivity to audiovisual simultaneity than typical readers for both speech and nonspeech events. We found no differences between readers with dyslexia and typical readers in the temporal window of integration for audiovisual speech or in the audiovisual perception of phonetic categories. The results suggest an audiovisual temporal deficit in dyslexia that is not specific to speech-related events. But the differences found for audiovisual temporal sensitivity did not translate into a deficit in audiovisual speech perception. Hence, there seems to be a hiatus between simultaneity judgment and perception, suggesting a multisensory system that uses different mechanisms across tasks. Alternatively, it is possible that the audiovisual deficit in dyslexia is only observable when explicit judgments about audiovisual simultaneity are required.

  8. Mapping a lateralization gradient within the ventral stream for auditory speech perception.

    PubMed

    Specht, Karsten

    2013-01-01

    Recent models on speech perception propose a dual-stream processing network, with a dorsal stream, extending from the posterior temporal lobe of the left hemisphere through inferior parietal areas into the left inferior frontal gyrus, and a ventral stream that is assumed to originate in the primary auditory cortex in the upper posterior part of the temporal lobe and to extend toward the anterior part of the temporal lobe, where it may connect to the ventral part of the inferior frontal gyrus. This article describes and reviews the results from a series of complementary functional magnetic resonance imaging studies that aimed to trace the hierarchical processing network for speech comprehension within the left and right hemisphere with a particular focus on the temporal lobe and the ventral stream. As hypothesized, the results demonstrate a bilateral involvement of the temporal lobes in the processing of speech signals. However, an increasing leftward asymmetry was detected from auditory-phonetic to lexico-semantic processing and along the posterior-anterior axis, thus forming a "lateralization" gradient. This increasing leftward lateralization was particularly evident for the left superior temporal sulcus and more anterior parts of the temporal lobe.

  9. Mapping a lateralization gradient within the ventral stream for auditory speech perception

    PubMed Central

    Specht, Karsten

    2013-01-01

    Recent models on speech perception propose a dual-stream processing network, with a dorsal stream, extending from the posterior temporal lobe of the left hemisphere through inferior parietal areas into the left inferior frontal gyrus, and a ventral stream that is assumed to originate in the primary auditory cortex in the upper posterior part of the temporal lobe and to extend toward the anterior part of the temporal lobe, where it may connect to the ventral part of the inferior frontal gyrus. This article describes and reviews the results from a series of complementary functional magnetic resonance imaging studies that aimed to trace the hierarchical processing network for speech comprehension within the left and right hemisphere with a particular focus on the temporal lobe and the ventral stream. As hypothesized, the results demonstrate a bilateral involvement of the temporal lobes in the processing of speech signals. However, an increasing leftward asymmetry was detected from auditory–phonetic to lexico-semantic processing and along the posterior–anterior axis, thus forming a “lateralization” gradient. This increasing leftward lateralization was particularly evident for the left superior temporal sulcus and more anterior parts of the temporal lobe. PMID:24106470

  10. A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system.

    PubMed

    Shao, Yu; Chang, Chip-Hong

    2007-08-01

    We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
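
    To make the adaptive-subtraction idea concrete, the following sketch swaps the paper's perceptual wavelet packet transform for a plain STFT and its Bark/temporal spreading model for a crude SNR-based masking proxy. It is a greatly simplified stand-in that only illustrates the principle of relaxing the subtraction where residual noise would be masked; all parameter values are assumptions.

```python
# Simplified illustration (not the paper's method): STFT spectral subtraction
# whose oversubtraction factor is reduced in time-frequency bins where a crude
# masking proxy suggests residual noise would be inaudible anyway.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs, noise_frames=10, alpha_max=4.0, alpha_min=1.0, floor=0.01):
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    power = np.abs(X) ** 2
    noise_psd = power[:, :noise_frames].mean(axis=1, keepdims=True)  # initial noise estimate

    # Masking proxy: bins well above the noise estimate mask residual noise,
    # so less aggressive subtraction is applied there.
    snr = 10 * np.log10(power / (noise_psd + 1e-12) + 1e-12)
    alpha = np.clip(alpha_max - (alpha_max - alpha_min) * (snr / 20.0),
                    alpha_min, alpha_max)

    clean_power = np.maximum(power - alpha * noise_psd, floor * noise_psd)
    X_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(X))   # keep noisy phase
    _, y = istft(X_hat, fs=fs, nperseg=512)
    return y
```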

  11. An ALE meta-analysis on the audiovisual integration of speech signals.

    PubMed

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.
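
    A minimal sketch of the general ALE idea follows. It is not the GingerALE implementation used in such studies: the grid size, the Gaussian width, the peak-height of the blobs, and the probabilistic-union combination across experiments are all simplifications chosen for readability.

```python
# Minimal sketch of the activation likelihood estimation (ALE) idea:
# each reported focus is blurred with a Gaussian, each experiment contributes
# a modelled activation map, and experiments are combined per voxel as a
# probabilistic union.
import numpy as np

def gaussian_blob(shape, center, sigma_vox):
    zz, yy, xx = np.indices(shape)
    d2 = (zz - center[0])**2 + (yy - center[1])**2 + (xx - center[2])**2
    return np.exp(-d2 / (2 * sigma_vox**2))

def ale_map(experiments, shape=(40, 48, 40), sigma_vox=2.0):
    """experiments: list of arrays of voxel-space foci, one array per experiment."""
    prod_not_active = np.ones(shape)
    for foci in experiments:
        # Modelled activation map for this experiment: max over its foci blobs.
        ma = np.zeros(shape)
        for focus in foci:
            ma = np.maximum(ma, gaussian_blob(shape, focus, sigma_vox))
        prod_not_active *= (1.0 - ma)
    return 1.0 - prod_not_active   # voxelwise union across experiments

# Toy usage with two hypothetical experiments reporting nearby foci.
exp1 = np.array([[20, 24, 20], [22, 30, 18]])
exp2 = np.array([[21, 25, 19]])
print(ale_map([exp1, exp2]).max())
```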

  12. Inferior frontal oscillations reveal visuo-motor matching for actions and speech: evidence from human intracranial recordings.

    PubMed

    Halje, Pär; Seeck, Margitta; Blanke, Olaf; Ionta, Silvio

    2015-12-01

    The neural correspondence between the systems responsible for the execution and recognition of actions has been suggested both in humans and non-human primates. Apart from being a key region of this visuo-motor observation-execution matching (OEM) system, the human inferior frontal gyrus (IFG) is also important for speech production. The functional overlap of visuo-motor OEM and speech, together with the phylogenetic history of the IFG as a motor area, has led to the idea that speech function has evolved from pre-existing motor systems and to the hypothesis that an OEM system may exist also for speech. However, visuo-motor OEM and speech OEM have never been compared directly. We used electrocorticography to analyze oscillations recorded from intracranial electrodes in human fronto-parieto-temporal cortex during visuo-motor (executing or visually observing an action) and speech OEM tasks (verbally describing an action using the first or third person pronoun). The results show that neural activity related to visuo-motor OEM is widespread in the frontal, parietal, and temporal regions. Speech OEM also elicited widespread responses partly overlapping with visuo-motor OEM sites (bilaterally), including frontal, parietal, and temporal regions. Interestingly a more focal region, the inferior frontal gyrus (bilaterally), showed both visuo-motor OEM and speech OEM properties independent of orolingual speech-unrelated movements. Building on the methodological advantages in human invasive electrocorticography, the present findings provide highly precise spatial and temporal information to support the existence of a modality-independent action representation system in the human brain that is shared between systems for performing, interpreting and describing actions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Superior temporal sulcus--It's my area: or is it?

    PubMed

    Hein, Grit; Knight, Robert T

    2008-12-01

    The superior temporal sulcus (STS) is the chameleon of the human brain. Several research areas claim the STS as the host brain region for their particular behavior of interest. Some see it as one of the core structures for theory of mind. For others, it is the main region for audiovisual integration. It plays an important role in biological motion perception, but is also claimed to be essential for speech processing and processing of faces. We review the foci of activations in the STS from multiple functional magnetic resonance imaging studies, focusing on theory of mind, audiovisual integration, motion processing, speech processing, and face processing. The results indicate a differentiation of the STS region in an anterior portion, mainly involved in speech processing, and a posterior portion recruited by cognitive demands of all these different research areas. The latter finding argues against a strict functional subdivision of the STS. In line with anatomical evidence from tracer studies, we propose that the function of the STS varies depending on the nature of network coactivations with different regions in the frontal cortex and medial-temporal lobe. This view is more in keeping with the notion that the same brain region can support different cognitive operations depending on task-dependent network connections, emphasizing the role of network connectivity analysis in neuroimaging.

  14. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex.

    PubMed

    Wang, Yuanye; Zhang, Jianfeng; Zou, Jiajie; Luo, Huan; Ding, Nai

    2018-05-18

    Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that an effect in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cue.
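
    One simple way to quantify the "neural tracking of the speech envelope" mentioned above is a lagged correlation between the stimulus envelope and a neural channel. The sketch below is an illustrative assumption, not the authors' pipeline, which would typically use source-localised MEG data and more elaborate encoding or decoding models; sampling rate, lag range, and the toy data are made up.

```python
# Illustrative sketch: quantify how strongly a neural channel tracks the
# envelope of each of two speech streams via the peak lagged correlation.
import numpy as np

def envelope_tracking(neural, envelope, fs, max_lag_s=0.5):
    """Peak Pearson correlation between envelope and neural signal over
    neural lags of 0..max_lag_s (the response is assumed to follow the stimulus)."""
    max_lag = int(max_lag_s * fs)
    best = -np.inf
    for lag in range(max_lag + 1):
        x = envelope[: len(envelope) - lag]
        y = neural[lag: lag + len(x)]
        best = max(best, np.corrcoef(x, y)[0, 1])
    return best

# Toy data: the neural response follows the target envelope at ~100 ms.
fs = 100
rng = np.random.default_rng(1)
env_target = rng.random(2000)
env_other = rng.random(2000)
neural = np.roll(env_target, 10) + 0.5 * rng.standard_normal(2000)

print("target tracking:", round(envelope_tracking(neural, env_target, fs), 2))
print("non-target tracking:", round(envelope_tracking(neural, env_other, fs), 2))
```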

  15. The Pathways for Intelligible Speech: Multivariate and Univariate Perspectives

    PubMed Central

    Evans, S.; Kyong, J.S.; Rosen, S.; Golestani, N.; Warren, J.E.; McGettigan, C.; Mourão-Miranda, J.; Wise, R.J.S.; Scott, S.K.

    2014-01-01

    An anterior pathway, concerned with extracting meaning from sound, has been identified in nonhuman primates. An analogous pathway has been suggested in humans, but controversy exists concerning the degree of lateralization and the precise location where responses to intelligible speech emerge. We have demonstrated that the left anterior superior temporal sulcus (STS) responds preferentially to intelligible speech (Scott SK, Blank CC, Rosen S, Wise RJS. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123:2400–2406.). A functional magnetic resonance imaging study in Cerebral Cortex used equivalent stimuli and univariate and multivariate analyses to argue for the greater importance of bilateral posterior when compared with the left anterior STS in responding to intelligible speech (Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex. 20:2486–2495.). Here, we also replicate our original study, demonstrating that the left anterior STS exhibits the strongest univariate response and, in decoding using the bilateral temporal cortex, contains the most informative voxels showing an increased response to intelligible speech. In contrast, in classifications using local "searchlights" and a whole brain analysis, we find greater classification accuracy in posterior rather than anterior temporal regions. Thus, we show that the precise nature of the multivariate analysis used will emphasize different response profiles associated with complex sound to speech processing. PMID:23585519

  16. Situational influences on rhythmicity in speech, music, and their interaction.

    PubMed

    Hawkins, Sarah

    2014-12-19

    Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  17. Temporal Variability and Stability in Infant-Directed Sung Speech: Evidence for Language-Specific Patterns

    ERIC Educational Resources Information Center

    Falk, Simone

    2011-01-01

    In this paper, sung speech is used as a methodological tool to explore temporal variability in the timing of word-internal consonants and vowels. It is hypothesized that temporal variability/stability becomes clearer under the varying rhythmical conditions induced by song. This is explored cross-linguistically in German--a language that exhibits a…

  18. Cross-language differences in the brain network subserving intelligible speech.

    PubMed

    Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P; Gao, Jia-Hong

    2015-03-10

    How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca's and Wernicke's areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension.

  19. Cross-language differences in the brain network subserving intelligible speech

    PubMed Central

    Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P.; Gao, Jia-Hong

    2015-01-01

    How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca’s and Wernicke’s areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension. PMID:25713366

  20. Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production.

    PubMed

    Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S

    2010-08-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.

  1. The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

    PubMed

    Buchan, Julie N; Munhall, Kevin G

    2012-01-01

    Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and that this effect is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth, and more time was spent looking at the eyes when a concurrent cognitive load task was added to the speech task.

  2. Alpha and Beta Oscillations Index Semantic Congruency between Speech and Gestures in Clear and Degraded Speech.

    PubMed

    Drijvers, Linda; Özyürek, Asli; Jensen, Ole

    2018-06-19

    Previous work revealed that visual semantic information conveyed by gestures can enhance degraded speech comprehension, but the mechanisms underlying these integration processes under adverse listening conditions remain poorly understood. We used MEG to investigate how oscillatory dynamics support speech-gesture integration when integration load is manipulated by auditory (e.g., speech degradation) and visual semantic (e.g., gesture congruency) factors. Participants were presented with videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching (mixing gesture + "mixing") or mismatching (drinking gesture + "walking") gesture. In clear speech, alpha/beta power was more suppressed in the left inferior frontal gyrus and motor and visual cortices when integration load increased in response to mismatching versus matching gestures. In degraded speech, beta power was less suppressed over posterior STS and medial temporal lobe for mismatching compared with matching gestures, showing that integration load was lowest when speech was degraded and mismatching gestures could not be integrated and disambiguate the degraded signal. Our results thus provide novel insights on how low-frequency oscillatory modulations in different parts of the cortex support the semantic audiovisual integration of gestures in clear and degraded speech: When speech is clear, the left inferior frontal gyrus and motor and visual cortices engage because higher-level semantic information increases semantic integration load. When speech is degraded, posterior STS/middle temporal gyrus and medial temporal lobe are less engaged because integration load is lowest when visual semantic information does not aid lexical retrieval and speech and gestures cannot be integrated.

  3. Redistribution of neural phase coherence reflects establishment of feedforward map in speech motor adaptation

    PubMed Central

    Sengupta, Ranit

    2015-01-01

    Despite recent progress in our understanding of sensorimotor integration in speech learning, a comprehensive framework to investigate its neural basis is lacking at behaviorally relevant timescales. Structural and functional imaging studies in humans have helped us identify brain networks that support speech but fail to capture the precise spatiotemporal coordination within these networks that takes place during speech learning. Here we use neuronal oscillations to investigate interactions within speech motor networks in a paradigm of speech motor adaptation under altered feedback, with continuous recording of EEG, in which subjects adapted to real-time auditory perturbation of a target vowel sound. As subjects adapted to the task, concurrent changes were observed in the theta-gamma phase coherence during speech planning at several distinct scalp regions, consistent with the establishment of a feedforward map. In particular, there was an increase in coherence over the central region and a decrease over the fronto-temporal regions, revealing a redistribution of coherence over an interacting network of brain regions that may be a general feature of error-based motor learning. Our findings have implications for understanding the neural basis of speech motor learning and could elucidate how transient breakdown of neuronal communication within speech networks relates to speech disorders. PMID:25632078
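
    The theta-gamma phase coherence measure invoked above can be operationalised in several ways. The sketch below shows one common choice, phase locking between the theta-band phase and the theta-rate phase of the gamma amplitude envelope, as an assumption for illustration; the paper's exact metric, frequency bands, and channel handling may differ.

```python
# Hedged illustration of one common theta-gamma coupling measure: phase locking
# between the theta-band phase and the theta-rate phase of the gamma-band
# amplitude envelope in a single EEG channel.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def band(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def theta_gamma_coherence(eeg, fs, theta=(4, 8), gamma=(30, 80)):
    theta_phase = np.angle(hilbert(band(eeg, *theta, fs)))
    gamma_env = np.abs(hilbert(band(eeg, *gamma, fs)))                # gamma amplitude envelope
    gamma_env_phase = np.angle(hilbert(band(gamma_env, *theta, fs)))  # its theta-rate phase
    return np.abs(np.mean(np.exp(1j * (theta_phase - gamma_env_phase))))

# Toy signal: a 6 Hz theta rhythm whose cycles modulate 50 Hz gamma amplitude.
fs = 500
t = np.arange(0, 4, 1 / fs)
theta_osc = np.sin(2 * np.pi * 6 * t)
eeg = theta_osc + (1 + theta_osc) * 0.2 * np.sin(2 * np.pi * 50 * t) \
      + 0.1 * np.random.randn(len(t))
print(f"theta-gamma coupling: {theta_gamma_coherence(eeg, fs):.2f}")
```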

  4. Behavioural and neuroanatomical correlates of auditory speech analysis in primary progressive aphasias.

    PubMed

    Hardy, Chris J D; Agustus, Jennifer L; Marshall, Charles R; Clark, Camilla N; Russell, Lucy L; Bond, Rebecca L; Brotherhood, Emilie V; Thomas, David L; Crutch, Sebastian J; Rohrer, Jonathan D; Warren, Jason D

    2017-07-27

    Non-verbal auditory impairment is increasingly recognised in the primary progressive aphasias (PPAs) but its relationship to speech processing and brain substrates has not been defined. Here we addressed these issues in patients representing the non-fluent variant (nfvPPA) and semantic variant (svPPA) syndromes of PPA. We studied 19 patients with PPA in relation to 19 healthy older individuals. We manipulated three key auditory parameters in sequences of spoken syllables: temporal regularity, phonemic spectral structure, and prosodic predictability (an index of fundamental information content, or entropy). The ability of participants to process these parameters was assessed using two-alternative, forced-choice tasks, and neuroanatomical associations of task performance were assessed using voxel-based morphometry of patients' brain magnetic resonance images. Relative to healthy controls, both the nfvPPA and svPPA groups had impaired processing of phonemic spectral structure and signal predictability, while the nfvPPA group additionally had impaired processing of temporal regularity in speech signals. Task performance correlated with standard disease severity and neurolinguistic measures. Across the patient cohort, performance on the temporal regularity task was associated with grey matter in the left supplementary motor area and right caudate, performance on the phoneme processing task was associated with grey matter in the left supramarginal gyrus, and performance on the prosodic predictability task was associated with grey matter in the right putamen. Our findings suggest that PPA syndromes may be underpinned by more generic deficits of auditory signal analysis, with a distributed cortico-subcortical neuroanatomical substrate extending beyond the canonical language network. This has implications for syndrome classification and biomarker development.

  5. Presence of 1/f noise in the temporal structure of psychoacoustic parameters of natural and urban sounds.

    PubMed

    Yang, Ming; De Coensel, Bert; Kang, Jian

    2015-08-01

    1/f noise or pink noise, which has been shown to be universal in nature, has also been observed in the temporal envelope of music, speech, and environmental sound. Moreover, the slope of the spectral density of the temporal envelope of music has been shown to correlate well with its pleasing, dull, or chaotic character. In this paper, the temporal structure of a number of instantaneous psychoacoustic parameters of environmental sound is examined in order to investigate whether a 1/f temporal structure appears in various types of sound that are generally preferred by people in everyday life. The results show, to some extent, that different categories of environmental sounds have different temporal structure characteristics. Only some of the urban sounds considered, together with birdsong, generally exhibit 1/f behavior in instantaneous loudness and sharpness on short to medium time scales (0.1 s to 10 s), whereas birdsong shows more chaotic variation at longer time scales (10 s to 200 s). The other sound categories considered exhibit random or monotonic variations over the different time scales. In general, this study shows that a 1/f temporal structure is not necessarily present in environmental sounds that are commonly perceived as pleasant.
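
    Testing for 1/f structure typically amounts to estimating the slope of the power spectral density of a parameter time series on log-log axes. The sketch below illustrates that step under simplifying assumptions (Welch PSD, least-squares fit, frequency limits corresponding to the 0.1 s to 200 s time scales); extracting instantaneous loudness or sharpness from audio is not shown and would require a loudness model.

```python
# Sketch: fit a line to log-power vs log-frequency of an instantaneous-parameter
# time series (e.g., loudness sampled every 0.1 s). A slope near -1 indicates
# pink-noise-like (1/f) behaviour, near 0 white noise, near -2 brown noise.
import numpy as np
from scipy.signal import welch

def spectral_slope(series, fs, fmin=0.005, fmax=10.0):
    f, pxx = welch(series, fs=fs, nperseg=min(len(series), 4096))
    keep = (f >= fmin) & (f <= fmax) & (pxx > 0)
    slope, _ = np.polyfit(np.log10(f[keep]), np.log10(pxx[keep]), 1)
    return slope

# Toy check on synthetic series (a real analysis would use a loudness model's output).
fs = 10.0                      # one parameter value every 0.1 s
white = np.random.randn(20000)
print("white:", round(spectral_slope(white, fs), 2))          # ~0
print("brown:", round(spectral_slope(np.cumsum(white), fs), 2))  # ~-2
```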

  6. Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition

    PubMed Central

    Rigoulot, Simon; Wassiliwizky, Eugen; Pell, Marc D.

    2013-01-01

    Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech. PMID:23805115
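
    Scoring a gating experiment of this kind reduces to finding, per stimulus, the earliest gate from which the listener's response is correct and remains correct. The sketch below uses that conventional rule as an assumption; the authors' exact criterion and gate durations may differ.

```python
# Hypothetical sketch of scoring a gating trial: the identification point is
# taken as the duration of the earliest gate from which the response is the
# target emotion and stays the target through all longer gates.
import numpy as np

def identification_point(responses, target, gate_durations_ms):
    """responses: list of emotion labels, one per gate (shortest to longest)."""
    for i in range(len(responses)):
        if all(r == target for r in responses[i:]):
            return gate_durations_ms[i]
    return np.nan   # never reliably identified

gates_ms = [200, 400, 600, 800, 1000, 1200]            # assumed syllable-based gates
trial = ["neutral", "sadness", "fear", "fear", "fear", "fear"]
print(identification_point(trial, "fear", gates_ms))    # -> 600
```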

  7. Acoustic richness modulates the neural networks supporting intelligible speech processing.

    PubMed

    Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E

    2016-03-01

    The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high. Copyright © 2015 Elsevier B.V. All rights reserved.
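
    The noise-vocoding manipulation mentioned above follows a standard recipe: split the speech into frequency bands, extract each band's envelope, and use it to modulate band-limited noise. The sketch below is my own simplified version rather than the study's exact processing; channel count matches the 24 channels described, but corner frequencies, filter orders, and the envelope cutoff are assumptions.

```python
# Schematic noise vocoder: band-split speech, extract per-band envelopes,
# modulate band-limited noise carriers, and sum.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=24, f_lo=80.0, f_hi=7000.0, env_cut=30.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(2, env_cut, btype="low", fs=fs, output="sos")
    noise = np.random.randn(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))   # smoothed band envelope
        carrier = sosfiltfilt(sos, noise)                    # band-limited noise
        out += np.clip(env, 0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Usage (assumed sampling rate): vocoded = noise_vocode(signal, fs=16000)
```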

  8. Model-Based Speech Signal Coding Using Optimized Temporal Decomposition for Storage and Broadcasting Applications

    NASA Astrophysics Data System (ADS)

    Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret

    2003-12-01

    A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
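
    The dynamic-programming optimisation of event locations can be sketched in miniature. The code below is a loose illustration, not the SBEL or MELP implementation: the real temporal decomposition fits overlapping event functions to spectral parameters, whereas this sketch simply picks event frames so that piecewise-linear interpolation of the parameter trajectory between events minimises squared error.

```python
# Loose sketch of DP-based event localisation: choose k "event" frames so that
# linearly interpolating the spectral-parameter trajectory between consecutive
# events minimises total squared error, solved exactly by dynamic programming.
import numpy as np

def segment_cost(Y, i, j):
    """Squared error of linearly interpolating frames i..j from endpoints i and j."""
    if j <= i + 1:
        return 0.0
    t = np.linspace(0.0, 1.0, j - i + 1)[:, None]
    interp = (1 - t) * Y[i] + t * Y[j]
    return float(((Y[i:j + 1] - interp) ** 2).sum())

def locate_events(Y, k):
    """Return k event frame indices (first and last frames always included)."""
    n = len(Y)
    cost = {(0, 0): 0.0}
    back = {}
    for m in range(1, k):                       # place the m-th event at frame j
        for j in range(m, n):
            best, arg = np.inf, None
            for i in range(m - 1, j):
                c = cost.get((m - 1, i), np.inf) + segment_cost(Y, i, j)
                if c < best:
                    best, arg = c, i
            cost[(m, j)] = best
            back[(m, j)] = arg
    events = [n - 1]
    for m in range(k - 1, 0, -1):
        events.append(back[(m, events[-1])])
    return sorted(events)

# Toy 1-D "spectral" trajectory with a flat-ramp-flat shape.
Y = np.concatenate([np.zeros(10), np.linspace(0, 1, 10), np.ones(10)])[:, None]
print(locate_events(Y, 4))
```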

  9. Temporally selective attention supports speech processing in 3- to 5-year-old children.

    PubMed

    Astheimer, Lori B; Sanders, Lisa D

    2012-01-01

    Recent event-related potential (ERP) evidence demonstrates that adults employ temporally selective attention to preferentially process the initial portions of words in continuous speech. Doing so is an effective listening strategy since word-initial segments are highly informative. Although the development of this process remains unexplored, directing attention to word onsets may be important for speech processing in young children who would otherwise be overwhelmed by the rapidly changing acoustic signals that constitute speech. We examined the use of temporally selective attention in 3- to 5-year-old children listening to stories by comparing ERPs elicited by attention probes presented at four acoustically matched times relative to word onsets: concurrently with a word onset, 100 ms before, 100 ms after, and at random control times. By 80 ms, probes presented at and after word onsets elicited a larger negativity than probes presented before word onsets or at control times. The latency and distribution of this effect is similar to temporally and spatially selective attention effects measured in adults and, despite differences in polarity, spatially selective attention effects measured in children. These results indicate that, like adults, preschool aged children modulate temporally selective attention to preferentially process the initial portions of words in continuous speech. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Masking Period Patterns and Forward Masking for Speech-Shaped Noise: Age-Related Effects.

    PubMed

    Grose, John H; Menezes, Denise C; Porter, Heather L; Griz, Silvana

    2016-01-01

    The purpose of this study was to assess age-related changes in temporal resolution in listeners with relatively normal audiograms. The hypothesis was that increased susceptibility to nonsimultaneous masking contributes to the hearing difficulties experienced by older listeners in complex fluctuating backgrounds. Participants included younger (n = 11), middle-age (n = 12), and older (n = 11) listeners with relatively normal audiograms. The first phase of the study measured masking period patterns for speech-shaped noise maskers and signals. From these data, temporal window shapes were derived. The second phase measured forward-masking functions and assessed how well the temporal window fits accounted for these data. The masking period patterns demonstrated increased susceptibility to backward masking in the older listeners, compatible with a more symmetric temporal window in this group. The forward-masking functions exhibited an age-related decline in recovery to baseline thresholds, and there was also an increase in the variability of the temporal window fits to these data. This study demonstrated an age-related increase in susceptibility to nonsimultaneous masking, supporting the hypothesis that exacerbated nonsimultaneous masking contributes to age-related difficulties understanding speech in fluctuating noise. Further support for this hypothesis comes from limited speech-in-noise data, suggesting an association between susceptibility to forward masking and speech understanding in modulated noise.
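
    As a rough illustration of summarising forward-masking recovery functions, the sketch below fits an exponential decay toward baseline to hypothetical thresholds. This is not the study's temporal-window model (which integrates masker energy through a fitted window); the function form, starting values, and data are all assumptions.

```python
# Simplified summary fit: describe a forward-masking recovery function by an
# exponential decay toward the unmasked baseline threshold.
import numpy as np
from scipy.optimize import curve_fit

def recovery(gap_ms, span_db, tau_ms, baseline_db):
    return baseline_db + span_db * np.exp(-gap_ms / tau_ms)

gaps = np.array([5, 10, 20, 40, 80, 160], dtype=float)            # signal delay (ms)
thresholds = np.array([62, 55, 47, 40, 35, 33], dtype=float)      # hypothetical dB SPL
(p_span, p_tau, p_base), _ = curve_fit(recovery, gaps, thresholds,
                                       p0=(30.0, 30.0, 30.0))
print(f"time constant ~{p_tau:.0f} ms, baseline ~{p_base:.0f} dB")
```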

  11. Temporal order processing of syllables in the left parietal lobe.

    PubMed

    Moser, Dana; Baker, Julie M; Sanchez, Carmen E; Rorden, Chris; Fridriksson, Julius

    2009-10-07

    Speech processing requires the temporal parsing of syllable order. Individuals suffering from posterior left hemisphere brain injury often exhibit temporal processing deficits as well as language deficits. Although the right posterior inferior parietal lobe has been implicated in temporal order judgments (TOJs) of visual information, there is limited evidence to support the role of the left inferior parietal lobe (IPL) in processing syllable order. The purpose of this study was to examine whether the left inferior parietal lobe is recruited during temporal order judgments of speech stimuli. Functional magnetic resonance imaging data were collected on 14 normal participants while they completed the following forced-choice tasks: (1) syllable order of multisyllabic pseudowords, (2) syllable identification of single syllables, and (3) gender identification of both multisyllabic and monosyllabic speech stimuli. Results revealed increased neural recruitment in the left inferior parietal lobe when participants made judgments about syllable order compared with both syllable identification and gender identification. These findings suggest that the left inferior parietal lobe plays an important role in processing syllable order and support the hypothesized role of this region as an interface between auditory speech and the articulatory code. Furthermore, a breakdown in this interface may explain some components of the speech deficits observed after posterior damage to the left hemisphere.

  12. Temporal Order Processing of Syllables in the Left Parietal Lobe

    PubMed Central

    Moser, Dana; Baker, Julie M.; Sanchez, Carmen E.; Rorden, Chris; Fridriksson, Julius

    2009-01-01

    Speech processing requires the temporal parsing of syllable order. Individuals suffering from posterior left hemisphere brain injury often exhibit temporal processing deficits as well as language deficits. Although the right posterior inferior parietal lobe has been implicated in temporal order judgments (TOJs) of visual information, there is limited evidence to support the role of the left inferior parietal lobe (IPL) in processing syllable order. The purpose of this study was to examine whether the left inferior parietal lobe is recruited during temporal order judgments of speech stimuli. Functional magnetic resonance imaging data were collected on 14 normal participants while they completed the following forced-choice tasks: (1) syllable order of multisyllabic pseudowords, (2) syllable identification of single syllables, and (3) gender identification of both multisyllabic and monosyllabic speech stimuli. Results revealed increased neural recruitment in the left inferior parietal lobe when participants made judgments about syllable order compared with both syllable identification and gender identification. These findings suggest that the left inferior parietal lobe plays an important role in processing syllable order and support the hypothesized role of this region as an interface between auditory speech and the articulatory code. Furthermore, a breakdown in this interface may explain some components of the speech deficits observed after posterior damage to the left hemisphere. PMID:19812331

  13. A versatile pitch tracking algorithm: from human speech to killer whale vocalizations.

    PubMed

    Shapiro, Ari Daniel; Wang, Chao

    2009-07-01

    In this article, a pitch tracking algorithm [named discrete logarithmic Fourier transformation-pitch detection algorithm (DLFT-PDA)], originally designed for human telephone speech, was modified for killer whale vocalizations. The multiple frequency components of some of these vocalizations demand a spectral (rather than temporal) approach to pitch tracking. The DLFT-PDA algorithm derives reliable estimations of pitch and the temporal change of pitch from the harmonic structure of the vocal signal. Scores from both estimations are combined in a dynamic programming search to find a smooth pitch track. The algorithm is capable of tracking killer whale calls that contain simultaneous low and high frequency components and compares favorably across most signal-to-noise ratio ranges to the peak-picking and sidewinder algorithms that have been used for tracking killer whale vocalizations previously.
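
    The algorithm above combines per-frame pitch evidence with a dynamic-programming search for a smooth track. The sketch below illustrates that general pattern with a Viterbi-style search over per-frame pitch candidates; the log-frequency transition penalty and the `jump_penalty` parameter are illustrative assumptions and do not reproduce the published DLFT-PDA scoring.

```python
import numpy as np

def smooth_pitch_track(candidates, scores, jump_penalty=2.0):
    """Select one pitch candidate per frame so that summed per-frame scores
    are high while large frame-to-frame jumps in log-frequency are penalized,
    using a Viterbi-style dynamic-programming search."""
    candidates = np.asarray(candidates, dtype=float)  # (T, K) candidate pitches in Hz
    scores = np.asarray(scores, dtype=float)          # (T, K) per-frame candidate scores
    T, K = scores.shape
    total = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    total[0] = scores[0]
    for t in range(1, T):
        # transition cost grows with the log-frequency distance between candidates
        dist = np.abs(np.log2(candidates[t][None, :] / candidates[t - 1][:, None]))
        trans = total[t - 1][:, None] - jump_penalty * dist
        back[t] = np.argmax(trans, axis=0)
        total[t] = scores[t] + trans[back[t], np.arange(K)]
    # backtrack the best path through the candidate lattice
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(total[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[np.arange(T), path]
```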

  14. Perceptual consequences of normal and abnormal peripheral compression: Potential links between psychoacoustics and speech perception

    NASA Astrophysics Data System (ADS)

    Oxenham, Andrew J.; Rosengard, Peninah S.; Braida, Louis D.

    2004-05-01

    Cochlear damage can lead to a reduction in the overall amount of peripheral auditory compression, presumably due to outer hair cell (OHC) loss or dysfunction. The perceptual consequences of functional OHC loss include loudness recruitment and reduced dynamic range, poorer frequency selectivity, and poorer effective temporal resolution. These in turn may lead to a reduced ability to make use of spectral and temporal fluctuations in background noise when listening to a target sound, such as speech. We tested the effect of OHC function on speech reception in hearing-impaired listeners by comparing psychoacoustic measures of cochlear compression and sentence recognition in a variety of noise backgrounds. In line with earlier studies, we found weak (nonsignificant) correlations between the psychoacoustic tasks and speech reception thresholds in quiet or in steady-state noise. However, when spectral and temporal fluctuations were introduced in the masker, speech reception improved to an extent that was well predicted by the psychoacoustic measures. Thus, our initial results suggest a strong relationship between measures of cochlear compression and the ability of listeners to take advantage of spectral and temporal masker fluctuations in recognizing speech. [Work supported by NIH Grants Nos. R01DC03909, T32DC00038, and R01DC00117.]

  15. Listening to an Audio Drama Activates Two Processing Networks, One for All Sounds, Another Exclusively for Speech

    PubMed Central

    Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

    2013-01-01

    Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs, two of which (covering non-overlapping areas of the auditory cortex) were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs that were defined as intrinsic fluctuated similarly to the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds. PMID:23734202

  16. Listening to an audio drama activates two processing networks, one for all sounds, another exclusively for speech.

    PubMed

    Boldt, Robert; Malinen, Sanna; Seppä, Mika; Tikka, Pia; Savolainen, Petri; Hari, Riitta; Carlson, Synnöve

    2013-01-01

    Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs, two of which (covering non-overlapping areas of the auditory cortex) were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs that were defined as intrinsic fluctuated similarly to the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.
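
    An intersubject-correlation map of the kind described above can be computed per voxel as in the sketch below; the leave-one-out estimator used here is a common choice and is an assumption about the pipeline, not the study's exact analysis.

```python
import numpy as np

def isc_map(data):
    """Leave-one-out intersubject correlation: for each voxel, correlate a
    subject's time course with the mean time course of all other subjects,
    then average the resulting Pearson r values over subjects.
    data: array of shape (n_subjects, n_timepoints, n_voxels)."""
    data = np.asarray(data, dtype=float)
    n_subj = data.shape[0]
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    r_values = []
    for s in range(n_subj):
        others = (z.sum(axis=0) - z[s]) / (n_subj - 1)      # mean of the other subjects
        others = (others - others.mean(axis=0)) / others.std(axis=0)
        r_values.append((z[s] * others).mean(axis=0))       # per-voxel Pearson r
    return np.mean(r_values, axis=0)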

  17. Reversal of age-related neural timing delays with training

    PubMed Central

    Anderson, Samira; White-Schwoch, Travis; Parbery-Clark, Alexandra; Kraus, Nina

    2013-01-01

    Neural slowing is commonly noted in older adults, with consequences for sensory, motor, and cognitive domains. One of the deleterious effects of neural slowing is impairment of temporal resolution; older adults, therefore, have reduced ability to process the rapid events that characterize speech, especially in noisy environments. Although hearing aids provide increased audibility, they cannot compensate for deficits in auditory temporal processing. Auditory training may provide a strategy to address these deficits. To that end, we evaluated the effects of auditory-based cognitive training on the temporal precision of subcortical processing of speech in noise. After training, older adults exhibited faster neural timing and experienced gains in memory, speed of processing, and speech-in-noise perception, whereas a matched control group showed no changes. Training was also associated with decreased variability of brainstem response peaks, suggesting a decrease in temporal jitter in response to a speech signal. These results demonstrate that auditory-based cognitive training can partially restore age-related deficits in temporal processing in the brain; this plasticity in turn promotes better cognitive and perceptual skills. PMID:23401541

  18. Frontal and temporal contributions to understanding the iconic co-speech gestures that accompany speech.

    PubMed

    Dick, Anthony Steven; Mok, Eva H; Raja Beharelle, Anjali; Goldin-Meadow, Susan; Small, Steven L

    2014-03-01

    In everyday conversation, listeners often rely on a speaker's gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers' iconic gestures. We focused on iconic gestures that contribute information not found in the speaker's talk, compared with those that convey information redundant with the speaker's talk. We found that three regions, the left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions and the left posterior middle temporal gyrus (MTGp), responded more strongly when gestures added information to nonspecific language, compared with when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the nonspecific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech. Copyright © 2012 Wiley Periodicals, Inc.

  19. Compressed Speech: Potential Application for Air Force Technical Training. Final Report, August 73-November 73.

    ERIC Educational Resources Information Center

    Dailey, K. Anne

    Time-compressed speech (also called compressed speech, speeded speech, or accelerated speech) is an extension of the normal recording procedure for reproducing the spoken word. Compressed speech can be used to achieve dramatic reductions in listening time without significant loss in comprehension. The implications of such temporal reductions in…

  20. Speech perception in individuals with auditory dys-synchrony.

    PubMed

    Kumar, U A; Jayaram, M

    2011-03-01

    This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.

  1. Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech

    PubMed Central

    Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto

    2018-01-01

    The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838

  2. Right anterior superior temporal activation predicts auditory sentence comprehension following aphasic stroke.

    PubMed

    Crinion, Jenny; Price, Cathy J

    2005-12-01

    Previous studies have suggested that recovery of speech comprehension after left hemisphere infarction may depend on a mechanism in the right hemisphere. However, the role that distinct right hemisphere regions play in speech comprehension following left hemisphere stroke has not been established. Here, we used functional magnetic resonance imaging (fMRI) to investigate narrative speech activation in 18 neurologically normal subjects and 17 patients with left hemisphere stroke and a history of aphasia. Activation for listening to meaningful stories relative to meaningless reversed speech was identified in the normal subjects and in each patient. Second level analyses were then used to investigate how story activation changed with the patients' auditory sentence comprehension skills and surprise story recognition memory tests post-scanning. Irrespective of lesion site, performance on tests of auditory sentence comprehension was positively correlated with activation in the right lateral superior temporal region, anterior to primary auditory cortex. In addition, when the stroke spared the left temporal cortex, good performance on tests of auditory sentence comprehension was also correlated with the left posterior superior temporal cortex (Wernicke's area). In distinct contrast to this, good story recognition memory predicted left inferior frontal and right cerebellar activation. The implication of this double dissociation in the effects of auditory sentence comprehension and story recognition memory is that left frontal and left temporal activations are dissociable. Our findings strongly support the role of the right temporal lobe in processing narrative speech and, in particular, auditory sentence comprehension following left hemisphere aphasic stroke. In addition, they highlight the importance of the right anterior superior temporal cortex where the response was dissociated from that in the left posterior temporal lobe.

  3. Spectral and temporal changes to speech produced in the presence of energetic and informational maskers.

    PubMed

    Cooke, Martin; Lu, Youyi

    2010-10-01

    Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.

  4. Bridging music and speech rhythm: rhythmic priming and audio-motor training affect speech perception.

    PubMed

    Cason, Nia; Astésano, Corine; Schön, Daniele

    2015-02-01

    Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data was collected from two groups: one who received audio-motor training, and one who did not. We hypothesised that 1) phonological processing would be enhanced in matching conditions, and 2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Central Presbycusis: A Review and Evaluation of the Evidence

    PubMed Central

    Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur

    2018-01-01

    Background The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose This report summarizes the review process and presents its findings. Data Collection and Analysis The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738

  6. Exploring the Roles of Spectral Detail and Intonation Contour in Speech Intelligibility: An fMRI Study

    PubMed Central

    Kyong, Jeong S.; Scott, Sophie K.; Rosen, Stuart; Howe, Timothy B.; Agnew, Zarinah K.; McGettigan, Carolyn

    2014-01-01

    The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al.: whereas greater sentence intelligibility was predominantly associated with increased activity in the left STS, the greatest response to normal sentence melody was found in the right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps surprisingly, there was no evidence for an interaction between the two factors; we discuss the implications for the processing of sound and speech in the dorsolateral temporal lobes. PMID:24568205

  7. Spectral and temporal resolutions of information-bearing acoustic changes for understanding vocoded sentences

    PubMed Central

    Stilp, Christian E.; Goupell, Matthew J.

    2015-01-01

    Short-time spectral changes in the speech signal are important for understanding noise-vocoded sentences. These information-bearing acoustic changes, measured using cochlea-scaled entropy in cochlear implant simulations [CSECI; Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136–EL141; Stilp (2014). J. Acoust. Soc. Am. 135(3), 1518–1529], may offer better understanding of speech perception by cochlear implant (CI) users. However, perceptual importance of CSECI for normal-hearing listeners was tested at only one spectral resolution and one temporal resolution, limiting generalizability of results to CI users. Here, experiments investigated the importance of these informational changes for understanding noise-vocoded sentences at different spectral resolutions (4–24 spectral channels; Experiment 1), temporal resolutions (4–64 Hz cutoff for low-pass filters that extracted amplitude envelopes; Experiment 2), or when both parameters varied (6–12 channels, 8–32 Hz; Experiment 3). Sentence intelligibility was reduced more by replacing high-CSECI intervals with noise than replacing low-CSECI intervals, but only when sentences had sufficient spectral and/or temporal resolution. High-CSECI intervals were more important for speech understanding as spectral resolution worsened and temporal resolution improved. Trade-offs between CSECI and intermediate spectral and temporal resolutions were minimal. These results suggest that signal processing strategies that emphasize information-bearing acoustic changes in speech may improve speech perception for CI users. PMID:25698018
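
    The noise-vocoded sentences referred to above can be approximated with a basic noise vocoder, sketched here for illustration. The band edges, filter design, and envelope cutoff below are assumptions, and the cochlea-scaled entropy (CSECI) metric itself is not computed.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, env_cutoff=16.0):
    """Basic noise vocoder: split the input into log-spaced frequency bands,
    extract each band's amplitude envelope (rectify + low-pass), and use the
    envelope to modulate noise filtered into the same band."""
    x = np.asarray(x, dtype=float)
    edges = np.logspace(np.log10(80.0), np.log10(0.45 * fs), n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(x))
    sos_env = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos_band = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
        band = sosfiltfilt(sos_band, x)
        env = np.clip(sosfiltfilt(sos_env, np.abs(band)), 0.0, None)
        carrier = sosfiltfilt(sos_band, noise)
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)
```

    Varying `n_channels` and `env_cutoff` corresponds loosely to the spectral and temporal resolution manipulations described in the abstract.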

  8. The Processing and Interpretation of Verb Phrase Ellipsis Constructions by Children at Normal and Slowed Speech Rates

    PubMed Central

    Callahan, Sarah M.; Walenski, Matthew; Love, Tracy

    2013-01-01

    Purpose To examine children’s comprehension of verb phrase (VP) ellipsis constructions in light of their automatic, online structural processing abilities and conscious, metalinguistic reflective skill. Method Forty-two children ages 5 through 12 years listened to VP ellipsis constructions involving the strict/sloppy ambiguity (e.g., “The janitor untangled himself from the rope and the fireman in the elementary school did too after the accident.”) in which the ellipsis phrase (“did too”) had 2 interpretations: (a) strict (“untangled the janitor”) and (b) sloppy (“untangled the fireman”). We examined these sentences at a normal speech rate with an online cross-modal picture priming task (n = 14) and an offline sentence–picture matching task (n = 11). Both tasks were also given with slowed speech input (n = 17). Results Children showed priming for both the strict and sloppy interpretations at a normal speech rate but only for the strict interpretation with slowed input. Offline, children displayed an adultlike preference for the sloppy interpretation with normal-rate input but a divergent pattern with slowed speech. Conclusions Our results suggest that children and adults rely on a hybrid syntax-discourse model for the online comprehension and offline interpretation of VP ellipsis constructions. This model incorporates a temporally sensitive syntactic process of VP reconstruction (disrupted with slow input) and a temporally protracted discourse effect attributed to parallelism (preserved with slow input). PMID:22223886

  9. [Dynamics of functional MRI and speech function in patients after resection of frontal and temporal lobe tumors].

    PubMed

    Buklina, S B; Batalov, A I; Smirnov, A S; Poddubskaya, A A; Pitskhelauri, D I; Kobyakov, G L; Zhukov, V Yu; Goryaynov, S A; Kulikov, A S; Ogurtsova, A A; Golanov, A V; Varyukhina, M D; Pronin, I N

    There are no studies on the application of functional MRI (fMRI) for long-term monitoring of the condition of patients after resection of frontal and temporal lobe tumors. The study purpose was to correlate, using fMRI, reorganization of the speech system and the dynamics of speech disorders in patients with left hemisphere gliomas before surgery and in the early and late postoperative periods. A total of 20 patients with left hemisphere gliomas were dynamically monitored using fMRI and comprehensive neuropsychological testing. The tumor was located in the frontal lobe in 12 patients and in the temporal lobe in 8 patients. Fifteen patients underwent primary surgery; 5 patients had repeated surgery. Sixteen patients had WHO Grade II and Grade III gliomas; the others had WHO Grade IV gliomas. Nineteen patients were examined preoperatively; 20 patients were examined at different times after surgery. Speech functions were assessed using Luria's test; the dominant hand was determined using the Annett questionnaire; a family history of left-handedness was investigated. Functional MRI was performed on an HDtx 3.0 T scanner, with BrainWavePA 2.0 software used for fMRI data processing (Z > 7, p < 0.001 for all calculations). In patients with extensive tumors and recurrent tumors, activation of right-sided homologues of the speech areas could be detected even before surgery, but in most patients the activation was detected 3 months or more after surgery. Therefore, reorganization of the speech system took time. Activation of right-sided homologues of the speech areas remained in all patients for up to a year. Simultaneous activation of right-sided homologues of both speech areas, Broca's and Wernicke's, was detected more often in patients with frontal lobe tumors than in those with temporal lobe tumors. No additional activation foci in the left hemisphere were found at the thresholds used to process the fMRI data. Recovery of speech function occurred to some degree in all patients, but no clear correlation with the fMRI data was found. Complex fMRI and neuropsychological studies in 20 patients after resection of frontal and temporal lobe tumors revealed individual features of speech system reorganization within one year of follow-up. Activation of right-sided homologues of the speech areas in the presence of left hemisphere tumors probably depends not only on the severity of the speech disorder but also reflects individual involvement of the right hemisphere in enabling speech function. This is confirmed by right-sided activation, according to the fMRI data, in right-sided patients without aphasia and, conversely, by the lack of activation of right-sided homologues of the speech areas in several patients with severe postoperative speech disorders during the entire follow-up period.

  10. Suppression of competing speech through entrainment of cortical oscillations

    PubMed Central

    D'Zmura, Michael; Srinivasan, Ramesh

    2013-01-01

    People are highly skilled at attending to one speaker in the presence of competitors, but the neural mechanisms supporting this remain unclear. Recent studies have argued that the auditory system enhances the gain of a speech stream relative to competitors by entraining (or “phase-locking”) to the rhythmic structure in its acoustic envelope, thus ensuring that syllables arrive during periods of high neuronal excitability. We hypothesized that such a mechanism could also suppress a competing speech stream by ensuring that syllables arrive during periods of low neuronal excitability. To test this, we analyzed high-density EEG recorded from human adults while they attended to one of two competing, naturalistic speech streams. By calculating the cross-correlation between the EEG channels and the speech envelopes, we found evidence of entrainment to the attended speech's acoustic envelope as well as weaker yet significant entrainment to the unattended speech's envelope. An independent component analysis (ICA) decomposition of the data revealed sources in the posterior temporal cortices that displayed robust correlations to both the attended and unattended envelopes. Critically, in these components the signs of the correlations when attended were opposite those when unattended, consistent with the hypothesized entrainment-based suppressive mechanism. PMID:23515789
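
    The envelope cross-correlation analysis described above can be sketched as follows. The envelope extraction (Hilbert magnitude followed by low-pass filtering) and the one-sided lag range are illustrative assumptions; both signals are assumed to be resampled to a common rate and length.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def speech_envelope(speech, fs, cutoff=8.0):
    """Broadband amplitude envelope: Hilbert magnitude, low-pass filtered."""
    env = np.abs(hilbert(np.asarray(speech, dtype=float)))
    sos = butter(2, cutoff, btype='low', fs=fs, output='sos')
    return sosfiltfilt(sos, env)

def eeg_envelope_xcorr(eeg, env, fs, max_lag_s=0.5):
    """Normalized cross-correlation between one EEG channel and the speech
    envelope, for lags at which the EEG lags the stimulus (0 to max_lag_s).
    Both inputs must share the same length and sampling rate fs."""
    eeg = np.asarray(eeg, dtype=float)
    env = np.asarray(env, dtype=float)
    eeg = (eeg - eeg.mean()) / eeg.std()
    env = (env - env.mean()) / env.std()
    max_lag = int(max_lag_s * fs)
    lags = np.arange(max_lag + 1)
    r = np.array([np.mean(env[:len(env) - lag] * eeg[lag:]) for lag in lags])
    return lags / fs, r
```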

  11. Relative contributions of acoustic temporal fine structure and envelope cues for lexical tone perception in noise

    PubMed Central

    Qi, Beier; Mao, Yitao; Liu, Jiaxing; Liu, Bo; Xu, Li

    2017-01-01

    Previous studies have shown that lexical tone perception in quiet relies on the acoustic temporal fine structure (TFS) but not on the envelope (E) cues. The contributions of TFS to speech recognition in noise are under debate. In the present study, Mandarin tone tokens were mixed with speech-shaped noise (SSN) or two-talker babble (TTB) at five signal-to-noise ratios (SNRs; −18 to +6 dB). The TFS and E were then extracted from each of the 30 bands using the Hilbert transform. Twenty-five combinations of TFS and E from the sound mixtures of the same tone tokens at various SNRs were created. Twenty normal-hearing, native-Mandarin-speaking listeners participated in the tone-recognition test. Results showed that tone-recognition performance improved as the SNRs in either TFS or E increased. The masking effects on tone perception for the TTB were weaker than those for the SSN. For both types of masker, the perceptual weights of TFS and E in tone perception in noise were nearly equivalent, with E playing a slightly greater role than TFS. Thus, the relative contributions of TFS and E cues to lexical tone perception in noise or in competing-talker maskers differ from those in quiet and from their contributions to speech perception of non-tonal languages. PMID:28599529
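
    The band-wise Hilbert decomposition into temporal fine structure (TFS) and envelope (E) described above can be sketched for a single analysis band as follows; the filter design is an assumption, and the study applied the decomposition to 30 bands.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tfs_and_envelope(x, fs, band):
    """Decompose one analysis band (band = [low_hz, high_hz]) into its
    temporal fine structure (cosine of the Hilbert phase) and its envelope
    (Hilbert magnitude)."""
    sos = butter(4, band, btype='band', fs=fs, output='sos')
    analytic = hilbert(sosfiltfilt(sos, np.asarray(x, dtype=float)))
    return np.cos(np.angle(analytic)), np.abs(analytic)

# A TFS/E "chimera" is then built by pairing the envelope from one sound
# mixture with the fine structure from another within each band and summing
# the band signals.
```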

  12. Masking Period Patterns & Forward Masking for Speech-Shaped Noise: Age-related effects

    PubMed Central

    Grose, John H.; Menezes, Denise C.; Porter, Heather L.; Griz, Silvana

    2015-01-01

    Objective The purpose of this study was to assess age-related changes in temporal resolution in listeners with relatively normal audiograms. The hypothesis was that increased susceptibility to non-simultaneous masking contributes to the hearing difficulties experienced by older listeners in complex fluctuating backgrounds. Design Participants included younger (n = 11), middle-aged (n = 12), and older (n = 11) listeners with relatively normal audiograms. The first phase of the study measured masking period patterns for speech-shaped noise maskers and signals. From these data, temporal window shapes were derived. The second phase measured forward-masking functions, and assessed how well the temporal window fits accounted for these data. Results The masking period patterns demonstrated increased susceptibility to backward masking in the older listeners, compatible with a more symmetric temporal window in this group. The forward-masking functions exhibited an age-related decline in recovery to baseline thresholds, and there was also an increase in the variability of the temporal window fits to these data. Conclusions This study demonstrated an age-related increase in susceptibility to non-simultaneous masking, supporting the hypothesis that exacerbated non-simultaneous masking contributes to age-related difficulties understanding speech in fluctuating noise. Further support for this hypothesis comes from limited speech-in-noise data suggesting an association between susceptibility to forward masking and speech understanding in modulated noise. PMID:26230495

  13. Seeing the Song: Left Auditory Structures May Track Auditory-Visual Dynamic Alignment

    PubMed Central

    Mossbridge, Julia A.; Grabowecky, Marcia; Suzuki, Satoru

    2013-01-01

    Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results appear to suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment. PMID:24194873

  14. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    PubMed Central

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  15. Neurophysiological aspects of brainstem processing of speech stimuli in audiometric-normal geriatric population.

    PubMed

    Ansari, M S; Rangasayee, R; Ansari, M A H

    2017-03-01

    Poor auditory speech perception in geriatrics is attributable to neural de-synchronisation due to structural and degenerative changes of ageing auditory pathways. The speech-evoked auditory brainstem response may be useful for detecting alterations that cause loss of speech discrimination. Therefore, this study aimed to compare the speech-evoked auditory brainstem response in adult and geriatric populations with normal hearing. The auditory brainstem responses to click sounds and to a 40 ms speech sound (the Hindi phoneme |da|) were compared in 25 young adults and 25 geriatric people with normal hearing. The latencies and amplitudes of transient peaks representing neural responses to the onset, offset and sustained portions of the speech stimulus in quiet and noisy conditions were recorded. The older group had significantly smaller amplitudes and longer latencies for the onset and offset responses to |da| in noisy conditions. Stimulus-to-response times were longer and the spectral amplitude of the sustained portion of the stimulus was reduced. The overall stimulus level caused significant shifts in latency across the entire speech-evoked auditory brainstem response in the older group. The reduction in neural speech processing in older adults suggests diminished subcortical responsiveness to acoustically dynamic spectral cues. However, further investigation of how temporal cues are encoded at the brainstem level, and of their relationship to speech perception, is needed before a routine tool for clinical decision-making can be developed.

  16. Asymmetric Dynamic Attunement of Speech and Gestures in the Construction of Children's Understanding.

    PubMed

    De Jonge-Hoekstra, Lisette; Van der Steen, Steffie; Van Geert, Paul; Cox, Ralf F A

    2016-01-01

    As children learn, they use their speech to express words and their hands to gesture. This study investigates the interplay between real-time gestures and speech as children construct cognitive understanding during a hands-on science task. Twelve children (M = 6, F = 6) from kindergarten (n = 5) and first grade (n = 7) participated in this study. Each verbal utterance and gesture during the task was coded on a complexity scale derived from dynamic skill theory. To explore the interplay between speech and gestures, we applied a cross recurrence quantification analysis (CRQA) to the two coupled time series of the skill levels of verbalizations and gestures. The analysis focused on (1) the temporal relation between gestures and speech, (2) the relative strength and direction of the interaction between gestures and speech, (3) the relative strength and direction between gestures and speech for different levels of understanding, and (4) relations between CRQA measures and other child characteristics. The results show that older and younger children differ in the (temporal) asymmetry in the gestures-speech interaction. For younger children, the balance leans more toward gestures leading speech in time, while the balance leans more toward speech leading gestures for older children. Secondly, at the group level, speech attracts gestures in a more dynamically stable fashion than vice versa, and this asymmetry in gestures and speech extends to lower and higher understanding levels. Yet, for older children, the mutual coupling between gestures and speech is more dynamically stable at the higher understanding levels. Gestures and speech are more synchronized in time in older children. A higher score on schools' language tests is related to speech attracting gestures more rigidly and to more asymmetry between gestures and speech, but only for the less difficult understanding levels. A higher score on math or past science tasks is related to less asymmetry between gestures and speech. The picture that emerges from our analyses suggests that the relation between gestures, speech and cognition is more complex than previously thought. We suggest that temporal differences and asymmetry in influence between gestures and speech arise from the simultaneous coordination of synergies.
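
    A minimal cross-recurrence sketch for two coded time series of the kind analyzed above is shown below. The radius-based recurrence criterion and the diagonal (lead/lag) profile are standard CRQA ingredients, but they are assumptions for illustration rather than a reproduction of the measures reported in the study.

```python
import numpy as np

def cross_recurrence(g, s, radius=0):
    """Cross-recurrence matrix for two coded time series (e.g., gesture and
    speech skill levels): R[i, j] = 1 when |g[i] - s[j]| <= radius."""
    g = np.asarray(g, dtype=float)
    s = np.asarray(s, dtype=float)
    return (np.abs(g[:, None] - s[None, :]) <= radius).astype(int)

def diagonal_profile(R, max_lag=10):
    """Recurrence rate on each diagonal of R. With R = cross_recurrence(g, s),
    a positive offset counts matches in which the gesture value precedes the
    matching speech value; asymmetry around lag 0 indicates which series tends
    to lead the other in time."""
    return {lag: float(np.mean(np.diagonal(R, offset=lag)))
            for lag in range(-max_lag, max_lag + 1)}
```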

  17. Top-down Processes in Simulated Electric-Acoustic Hearing: The Effect of Linguistic Context on Bimodal Benefit for Temporally Interrupted Speech

    PubMed Central

    Oh, Soo Hee; Donaldson, Gail S.; Kong, Ying-Yee

    2016-01-01

    Objectives Previous studies have documented the benefits of bimodal hearing as compared with a CI alone, but most have focused on the importance of bottom-up, low-frequency cues. The purpose of the present study was to evaluate the role of top-down processing in bimodal hearing by measuring the effect of sentence context on bimodal benefit for temporally interrupted sentences. It was hypothesized that low-frequency acoustic cues would facilitate the use of contextual information in the interrupted sentences, resulting in greater bimodal benefit for the higher context (CUNY) sentences than for the lower context (IEEE) sentences. Design Young normal-hearing listeners were tested in simulated bimodal listening conditions in which noise band vocoded sentences were presented to one ear with or without low-pass (LP) filtered speech or LP harmonic complexes (LPHCs) presented to the contralateral ear. Speech recognition scores were measured in three listening conditions: vocoder-alone, vocoder combined with LP speech, and vocoder combined with LPHCs. Temporally interrupted versions of the CUNY and IEEE sentences were used to assess listeners’ ability to fill in missing segments of speech by using top-down linguistic processing. Sentences were square-wave gated at a rate of 5 Hz with a 50 percent duty cycle. Three vocoder channel conditions were tested for each type of sentence (8, 12, and 16 channels for CUNY; 12, 16, and 32 channels for IEEE) and bimodal benefit was compared for similar amounts of spectral degradation (matched-channel comparisons) and similar ranges of baseline performance. Two gain measures, percentage-point gain and normalized gain, were examined. Results Significant effects of context on bimodal benefit were observed when LP speech was presented to the residual-hearing ear. For the matched-channel comparisons, CUNY sentences showed significantly higher normalized gains than IEEE sentences for both the 12-channel (20 points higher) and 16-channel (18 points higher) conditions. For the individual gain comparisons that used a similar range of baseline performance, CUNY sentences showed bimodal benefits that were significantly higher (7 percentage points, or 15 points normalized gain) than those for IEEE sentences. The bimodal benefits observed here for temporally interrupted speech were considerably smaller than those observed in an earlier study that used continuous speech (Kong et al., 2015). Further, unlike previous findings for continuous speech, no bimodal benefit was observed when LPHCs were presented to the LP ear. Conclusions Findings indicate that linguistic context has a significant influence on bimodal benefit for temporally interrupted speech and support the hypothesis that low-frequency acoustic information presented to the residual-hearing ear facilitates the use of top-down linguistic processing in bimodal hearing. However, bimodal benefit is reduced for temporally interrupted speech as compared to continuous speech, suggesting that listeners’ ability to restore missing speech information depends not only on top-down linguistic knowledge, but also on the quality of the bottom-up sensory input. PMID:27007220
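
    The temporal interruption used above (square-wave gating at 5 Hz with a 50% duty cycle) can be applied to a waveform as in this minimal sketch; gated-off samples are simply silenced.

```python
import numpy as np

def square_wave_gate(x, fs, rate_hz=5.0, duty=0.5):
    """Periodically interrupt a waveform with a square-wave gate (default
    5 Hz, 50% duty cycle); samples falling in the 'off' half of each cycle
    are set to zero."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    gate = ((t * rate_hz) % 1.0) < duty
    return x * gate
```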

  18. Perceptual weighting of individual and concurrent cues for sentence intelligibility: Frequency, envelope, and fine structure

    PubMed Central

    Fogerty, Daniel

    2011-01-01

    The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise. PMID:21361454
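
    The trial-by-trial weighting analysis described above can be approximated with a point-biserial correlation between response correctness and the availability of each acoustic property, as sketched below; the normalization to relative weights and the handling of negative correlations are assumptions, not the study's exact procedure.

```python
import numpy as np

def perceptual_weights(availability, correct):
    """Estimate relative perceptual weights by correlating trial-by-trial
    correctness (0/1) with the availability (e.g., SNR) of each acoustic
    property, then normalizing the correlations to sum to 1.
    availability: (n_trials, n_properties); correct: (n_trials,)."""
    availability = np.asarray(availability, dtype=float)
    correct = np.asarray(correct, dtype=float)
    r = np.array([np.corrcoef(availability[:, k], correct)[0, 1]
                  for k in range(availability.shape[1])])
    r = np.clip(r, 0.0, None)          # treat negative correlations as zero weight
    return r / r.sum() if r.sum() > 0 else r
```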

  19. Frontal and temporal contributions to understanding the iconic co-speech gestures that accompany speech

    PubMed Central

    Dick, Anthony Steven; Mok, Eva H.; Beharelle, Anjali Raja; Goldin-Meadow, Susan; Small, Steven L.

    2013-01-01

    In everyday conversation, listeners often rely on a speaker’s gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers’ iconic gestures. We focused on iconic gestures that contribute information not found in the speaker’s talk, compared to those that convey information redundant with the speaker’s talk. We found that three regions—left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions, and left posterior middle temporal gyrus (MTGp)—responded more strongly when gestures added information to non-specific language, compared to when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the non-specific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech. PMID:23238964

  20. Temporal factors affecting somatosensory–auditory interactions in speech processing

    PubMed Central

    Ito, Takayuki; Gracco, Vincent L.; Ostry, David J.

    2014-01-01

    Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009). In the present study, we further examined the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory–auditory interaction in speech perception. We examined the changes in event-related potentials (ERPs) in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation, compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation, the amplitude of the ERP was reliably different from the two unisensory potentials. More importantly, the magnitude of the ERP difference varied as a function of the relative timing of the somatosensory–auditory stimulation. Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory–auditory convergence and suggest that the contribution of somatosensory information to speech processing depends on the specific temporal order of sensory inputs in speech production. PMID:25452733

  1. Pronunciation difficulty, temporal regularity, and the speech-to-song illusion.

    PubMed

    Margulis, Elizabeth H; Simchy-Gross, Rhimmon; Black, Justin L

    2015-01-01

    The speech-to-song illusion (Deutsch et al., 2011) tracks the perceptual transformation from speech to song across repetitions of a brief spoken utterance. Because it involves no change in the stimulus itself, but a dramatic change in its perceived affiliation to speech or to music, it presents a unique opportunity to comparatively investigate the processing of language and music. In this study, native English-speaking participants were presented with brief spoken utterances that were subsequently repeated ten times. The utterances were drawn either from languages that are relatively difficult for a native English speaker to pronounce, or languages that are relatively easy for a native English speaker to pronounce. Moreover, the repetition could occur at regular or irregular temporal intervals. Participants rated the utterances before and after the repetitions on a 5-point Likert-like scale ranging from "sounds exactly like speech" to "sounds exactly like singing." The difference in ratings before and after was taken as a measure of the strength of the speech-to-song illusion in each case. The speech-to-song illusion occurred regardless of whether the repetitions were spaced at regular temporal intervals or not; however, it occurred more readily if the utterance was spoken in a language difficult for a native English speaker to pronounce. Speech circuitry seemed more liable to capture native and easy-to-pronounce languages, and more reluctant to relinquish them to perceived song across repetitions.

  2. Multichannel Compression, Temporal Cues, and Audibility.

    ERIC Educational Resources Information Center

    Souza, Pamela E.; Turner, Christopher W.

    1998-01-01

    The effect of the reduction of the temporal envelope produced by multichannel compression on recognition was examined in 16 listeners with hearing loss, with particular focus on audibility of the speech signal. Multichannel compression improved speech recognition when superior audibility was provided by a two-channel compression system over linear…

  3. Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene.

    PubMed

    Vander Ghinst, Marc; Bourguignon, Mathieu; Op de Beeck, Marc; Wens, Vincent; Marty, Brice; Hassid, Sergio; Choufani, Georges; Jousmäki, Veikko; Hari, Riitta; Van Bogaert, Patrick; Goldman, Serge; De Tiège, Xavier

    2016-02-03

    Using a continuous listening task, we evaluated the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene using magnetoencephalography and corticovocal coherence analysis. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams), one without any multitalker background (No noise) and four mixed with a "cocktail party" multitalker background noise at four signal-to-noise ratios (5, 0, -5, and -10 dB) to produce speech-in-noise mixtures, here referred to as Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4-8 Hz were coupled to the activity of the right supratemporal auditory cortex. In cocktail party conditions, with the multitalker background noise, the coupling at both frequencies was stronger for the attended speech stream than for the unattended Multitalker background. The coupling strengths decreased as the Multitalker background increased. During the cocktail party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with bilateral coupling without the multitalker background, whereas the 4-8 Hz coupling remained right-hemisphere lateralized in both conditions. The brain activity was not coupled to the multitalker background or to its individual talkers. The results highlight the key role of the listener's left superior temporal gyri in extracting the slow ∼0.5 Hz modulations, likely reflecting the attended speech stream within a multitalker auditory scene. When people listen to one person in a "cocktail party," their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole auditory scene and how increasing background noise corrupts this process is still debated. In this magnetoencephalography study, subjects had to attend a speech stream with or without multitalker background noise. Results argue for frequency-dependent cortical tracking mechanisms for the attended speech stream. The left superior temporal gyrus tracked the ∼0.5 Hz modulations of the attended speech stream only when the speech was embedded in multitalker background, whereas the right supratemporal auditory cortex tracked 4-8 Hz modulations during both noiseless and cocktail-party conditions. Copyright © 2016 the authors 0270-6474/16/361597-11$15.00/0.
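
    As a rough illustration of the coherence analysis described above, the sketch below computes magnitude-squared coherence between a speech temporal envelope and one neural channel in the two frequency bands of interest. It is a minimal single-sensor stand-in for the study's corticovocal coherence pipeline: the sampling rate, window length, band edges, and random stand-in signals are assumptions for illustration only.

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt, coherence

      fs = 1000                                    # assumed common sampling rate (Hz)
      speech = np.random.randn(120 * fs)           # stand-in for the attended speech waveform
      meg = np.random.randn(120 * fs)              # stand-in for one MEG sensor signal

      envelope = np.abs(hilbert(speech))           # wide-band temporal envelope
      b, a = butter(4, 40 / (fs / 2), btype='low')
      envelope = filtfilt(b, a, envelope)          # keep only slow amplitude modulations

      # long analysis windows are needed to resolve coherence near 0.5 Hz
      f, coh = coherence(envelope, meg, fs=fs, nperseg=8 * fs)
      slow_coupling = coh[(f >= 0.3) & (f <= 0.7)].mean()   # ~0.5 Hz band
      theta_coupling = coh[(f >= 4) & (f <= 8)].mean()      # 4-8 Hz band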

  4. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    ERIC Educational Resources Information Center

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  5. Patterns of Post-Stroke Brain Damage that Predict Speech Production Errors in Apraxia of Speech and Aphasia Dissociate

    PubMed Central

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-01-01

    Background and Purpose: Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions regarding whether AOS emerges from a unique pattern of brain damage or as a sub-element of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Methods: Forty-three patients with a history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The Apraxia of Speech Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with AOS and/or aphasia. Localized brain damage was identified using structural MRI, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS and/or aphasia, and brain damage. Results: The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS and/or aphasia were associated with damage to the temporal lobe and the inferior pre-central frontal regions. Conclusion: AOS likely occurs in conjunction with aphasia due to the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. PMID:25908457

  6. The effects of noise exposure and musical training on suprathreshold auditory processing and speech perception in noise.

    PubMed

    Yeend, Ingrid; Beach, Elizabeth Francis; Sharma, Mridula; Dillon, Harvey

    2017-09-01

    Recent animal research has shown that exposure to single episodes of intense noise causes cochlear synaptopathy without affecting hearing thresholds. It has been suggested that the same may occur in humans. If so, it is hypothesized that this would result in impaired encoding of sound and lead to difficulties hearing at suprathreshold levels, particularly in challenging listening environments. The primary aim of this study was to investigate the effect of noise exposure on auditory processing, including the perception of speech in noise, in adult humans. A secondary aim was to explore whether musical training might improve some aspects of auditory processing and thus counteract or ameliorate any negative impacts of noise exposure. In a sample of 122 participants (63 female) aged 30-57 years with normal or near-normal hearing thresholds, we conducted audiometric tests, including tympanometry, audiometry, acoustic reflexes, otoacoustic emissions and medial olivocochlear responses. We also assessed temporal and spectral processing by determining thresholds for detection of amplitude modulation and temporal fine structure. We assessed speech-in-noise perception, and conducted tests of attention, memory and sentence closure. We also calculated participants' accumulated lifetime noise exposure and administered questionnaires to assess self-reported listening difficulty and musical training. The results showed no clear link between participants' lifetime noise exposure and performance on any of the auditory processing or speech-in-noise tasks. Musical training was associated with better performance on the auditory processing tasks, but not on the speech-in-noise perception tasks. The results indicate that sentence closure skills, working memory, attention, extended high frequency hearing thresholds and medial olivocochlear suppression strength are important factors that are related to the ability to process speech in noise. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
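
    Amplitude-modulation detection thresholds of the kind assessed above are commonly measured with adaptive procedures. The sketch below generates a sinusoidally amplitude-modulated probe and runs a generic 2-down/1-up track on modulation depth; the carrier and modulation frequencies, step size, and the simulated psychometric function are illustrative placeholders rather than this study's actual protocol.

      import numpy as np

      fs = 44100

      def sam_tone(depth, f_c=1000.0, f_m=4.0, dur=0.5):
          """Sinusoidally amplitude-modulated tone; depth=0 gives the unmodulated standard."""
          t = np.arange(int(dur * fs)) / fs
          return (1.0 + depth * np.sin(2 * np.pi * f_m * t)) * np.sin(2 * np.pi * f_c * t)

      def simulated_listener(depth_db, threshold_db=-20.0, slope=0.5):
          """Stand-in psychometric function: P(correct) grows with modulation depth."""
          p_correct = 0.5 + 0.5 / (1.0 + np.exp(-slope * (depth_db - threshold_db)))
          return np.random.rand() < p_correct

      # generic 2-down/1-up adaptive track on modulation depth (dB re: 100% modulation),
      # converging near the 70.7%-correct point
      depth_db, n_correct, direction, reversals = -6.0, 0, 0, []
      while len(reversals) < 8:
          target, standard = sam_tone(10 ** (depth_db / 20)), sam_tone(0.0)  # 2AFC stimuli
          step = 0.0
          if simulated_listener(depth_db):
              n_correct += 1
              if n_correct == 2:
                  n_correct, step = 0, -2.0          # two correct in a row -> harder
          else:
              n_correct, step = 0, 2.0               # any miss -> easier
          if step:
              if direction and np.sign(step) != direction:
                  reversals.append(depth_db)         # a change of direction counts as a reversal
              direction = np.sign(step)
              depth_db = min(depth_db + step, 0.0)   # cap at full modulation

      threshold_estimate_db = float(np.mean(reversals[-6:]))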

  7. Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special

    ERIC Educational Resources Information Center

    Vroomen, Jean; Stekelenburg, Jeroen J.

    2011-01-01

    Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

  8. Changes in Speech Production Associated with Alphabet Supplementation

    ERIC Educational Resources Information Center

    Hustad, Katherine C.; Lee, Jimin

    2008-01-01

    Purpose: This study examined the effect of alphabet supplementation (AS) on temporal and spectral features of speech production in individuals with cerebral palsy and dysarthria. Method: Twelve speakers with dysarthria contributed speech samples using habitual speech and while using AS. One hundred twenty listeners orthographically transcribed…

  9. Automatic measurement of voice onset time using discriminative structured prediction.

    PubMed

    Sonderegger, Morgan; Keshet, Joseph

    2012-12-01

    A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.
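
    A minimal sketch of the inference step in a structured-prediction setup of this kind, under simplified assumptions: every candidate (burst onset, voicing onset) pair is scored by a linear function of a few hand-crafted feature functions, and the best-scoring pair yields the predicted VOT. The toy features, brute-force search, and fixed weight vector stand in for the paper's richer spectro-temporal feature set, efficient search, and large-margin training.

      import numpy as np

      def feature_map(x, fs, t_burst, t_voice):
          """Toy feature functions phi(x, t_burst, t_voice) for one candidate labelling."""
          frame = max(1, int(0.005 * fs))
          burst_energy = float(np.mean(x[t_burst:t_burst + frame] ** 2))
          voiced_energy = float(np.mean(x[t_voice:t_voice + frame] ** 2))
          seg = x[t_burst:t_voice]
          zcr = float(np.mean(np.diff(np.sign(seg)) != 0)) if seg.size > 1 else 0.0
          return np.array([burst_energy, voiced_energy, zcr, (t_voice - t_burst) / fs])

      def predict_vot(x, fs, w, step_s=0.002):
          """Score every candidate (burst, voicing-onset) pair with w . phi and return the
          VOT of the best pair (brute force; the real system searches efficiently and
          learns w with a large-margin objective)."""
          step = max(1, int(step_s * fs))
          best, best_score = None, -np.inf
          for t_b in range(0, len(x) - 2 * step, step):
              for t_v in range(t_b + step, len(x) - step, step):
                  score = float(w @ feature_map(x, fs, t_b, t_v))
                  if score > best_score:
                      best, best_score = (t_b, t_v), score
          return (best[1] - best[0]) / fs

      fs = 16000
      x = np.random.randn(int(0.12 * fs))        # stand-in for a voiceless-stop-initial segment
      w = np.array([1.0, 1.0, 0.5, -0.1])        # toy weights; learned from labeled data in the real system
      print(predict_vot(x, fs, w))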

  10. Relationships between music training, speech processing, and word learning: a network perspective.

    PubMed

    Elmer, Stefan; Jäncke, Lutz

    2018-03-15

    Numerous studies have documented the behavioral advantages conferred on professional musicians and children undergoing music training in processing speech sounds varying in the spectral and temporal dimensions. These beneficial effects have previously often been associated with local functional and structural changes in the auditory cortex (AC). However, this perspective is oversimplified, in that it does not take into account the intrinsic organization of the human brain, namely, neural networks and oscillatory dynamics. Therefore, we propose a new framework for extending these previous findings to a network perspective by integrating multimodal imaging, electrophysiology, and neural oscillations. In particular, we provide concrete examples of how functional and structural connectivity can be used to model simple neural circuits exerting a modulatory influence on AC activity. In addition, we describe how such a network approach can be used for better comprehending the beneficial effects of music training on more complex speech functions, such as word learning. © 2018 New York Academy of Sciences.

  11. Impaired auditory temporal selectivity in the inferior colliculus of aged Mongolian gerbils.

    PubMed

    Khouri, Leila; Lesica, Nicholas A; Grothe, Benedikt

    2011-07-06

    Aged humans show severe difficulties in temporal auditory processing tasks (e.g., speech recognition in noise, low-frequency sound localization, gap detection). A degradation of auditory function with age is also evident in experimental animals. To investigate age-related changes in temporal processing, we compared extracellular responses to temporally variable pulse trains and human speech in the inferior colliculus of young adult (3 month) and aged (3 years) Mongolian gerbils. We observed a significant decrease of selectivity to the pulse trains in neuronal responses from aged animals. This decrease in selectivity led, on the population level, to an increase in signal correlations and therefore a decrease in heterogeneity of temporal receptive fields and a decreased efficiency in encoding of speech signals. A decrease in selectivity to temporal modulations is consistent with a downregulation of the inhibitory transmitter system in aged animals. These alterations in temporal processing could underlie declines in the aging auditory system, which are unrelated to peripheral hearing loss. These declines cannot be compensated by traditional hearing aids (that rely on amplification of sound) but may rather require pharmacological treatment.
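
    The population-level signal correlations referred to above can be computed directly from trial-averaged responses: for every pair of neurons, correlate their mean responses across the same stimulus set. A bare-bones sketch with synthetic stand-in data; the array shapes and the way the "aged" population is constructed (a shared tuning component that raises pairwise correlations) are assumptions for illustration.

      import numpy as np

      def mean_signal_correlation(rates):
          """rates: (n_neurons, n_stimuli) trial-averaged responses to the same stimulus set.
          Mean pairwise Pearson correlation of tuning curves; higher values indicate a more
          homogeneous (less diverse) population code."""
          r = np.corrcoef(rates)
          return float(np.nanmean(r[np.triu_indices_from(r, k=1)]))

      # stand-in populations: 40 neurons x 12 pulse-train conditions
      rates_young = np.random.rand(40, 12)
      rates_aged = 0.5 * np.random.rand(40, 12) + 0.5 * np.random.rand(1, 12)  # shared component
      print(mean_signal_correlation(rates_young), mean_signal_correlation(rates_aged))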

  12. Speech processing: from peripheral to hemispheric asymmetry of the auditory system.

    PubMed

    Lazard, Diane S; Collette, Jean-Louis; Perrot, Xavier

    2012-01-01

    Language processing from the cochlea to auditory association cortices shows side-dependent specificities with an apparent left hemispheric dominance. The aim of this article was to propose to nonspeech specialists a didactic review of two complementary theories about hemispheric asymmetry in speech processing. Starting from anatomico-physiological and clinical observations of auditory asymmetry and interhemispheric connections, this review then presents behavioral (dichotic listening paradigm) as well as functional (functional magnetic resonance imaging and positron emission tomography) experiments that assessed hemispheric specialization for speech processing. Even though speech at an early phonological level is regarded as being processed bilaterally, a left-hemispheric dominance exists for higher-level processing. This asymmetry may arise from a segregation of the speech signal, broken apart within nonprimary auditory areas into two distinct temporal integration windows--a fast one on the left and a slower one on the right--modeled through the asymmetric sampling in time theory, or from a spectro-temporal trade-off, with a higher temporal resolution in the left hemisphere and a higher spectral resolution in the right hemisphere, modeled through the spectral/temporal resolution trade-off theory. Both theories deal with the concept that lower-order tuning principles for the acoustic signal might drive higher-order organization for speech processing. However, the precise nature, mechanisms, and origin of speech processing asymmetry are still being debated. Finally, an example of hemispheric asymmetry alteration, which has direct clinical implications, is given through the case of auditory aging that mixes peripheral disorder and modifications of central processing. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.

  13. Binaural sluggishness in the perception of tone sequences and speech in noise.

    PubMed

    Culling, J F; Colburn, H S

    2000-01-01

    The binaural system is well-known for its sluggish response to changes in the interaural parameters to which it is sensitive. Theories of binaural unmasking have suggested that detection of signals in noise is mediated by detection of differences in interaural correlation. If these theories are correct, improvements in the intelligibility of speech in favorable binaural conditions are most likely mediated by spectro-temporal variations in interaural correlation of the stimulus which mirror the spectro-temporal amplitude modulations of the speech. However, binaural sluggishness should limit the temporal resolution of the representation of speech recovered by this means. The present study tested this prediction in two ways. First, listeners' masked discrimination thresholds for ascending vs descending pure-tone arpeggios were measured as a function of rate of frequency change in the NoSo and NoSpi binaural configurations. Three-tone arpeggios were presented repeatedly and continuously for 1.6 s, masked by a 1.6-s burst of noise. In a two-interval task, listeners determined the interval in which the arpeggios were ascending. The results showed a binaural advantage of 12-14 dB for NoSpi at 3.3 arpeggios per s (arp/s), which reduced to 3-5 dB at 10.4 arp/s. This outcome confirmed that the discrimination of spectro-temporal patterns in noise is susceptible to the effects of binaural sluggishness. Second, listeners' masked speech-reception thresholds were measured in speech-shaped noise using speech which was 1, 1.5, and 2 times the original articulation rate. The articulation rate was increased using a phase-vocoder technique which increased all the modulation frequencies in the speech without altering its pitch. Speech-reception thresholds were, on average, 5.2 dB lower for the NoSpi than for the NoSo configuration, at the original articulation rate. This binaural masking release was reduced to 2.8 dB when the articulation rate was doubled, but the most notable effect was a 6-8 dB increase in thresholds with articulation rate for both configurations. These results suggest that higher modulation frequencies in masked signals cannot be temporally resolved by the binaural system, but that the useful modulation frequencies in speech are sufficiently low (<5 Hz) that they are invulnerable to the effects of binaural sluggishness, even at elevated articulation rates.

  14. Syllabic (~2-5 Hz) and fluctuation (~1-10 Hz) ranges in speech and auditory processing

    PubMed Central

    Edwards, Erik; Chang, Edward F.

    2013-01-01

    Given recent interest in syllabic rates (~2-5 Hz) for speech processing, we review the perception of “fluctuation” range (~1-10 Hz) modulations during listening to speech and technical auditory stimuli (AM and FM tones and noises, and ripple sounds). We find evidence that the temporal modulation transfer function (TMTF) of human auditory perception is not simply low-pass in nature, but rather exhibits a peak in sensitivity in the syllabic range (~2-5 Hz). We also address human and animal neurophysiological evidence, and argue that this bandpass tuning arises at the thalamocortical level and is more associated with non-primary regions than primary regions of cortex. The bandpass rather than low-pass TMTF has implications for modeling auditory central physiology and speech processing: this implicates temporal contrast rather than simple temporal integration, with contrast enhancement for dynamic stimuli in the fluctuation range. PMID:24035819

  15. Oral Articulatory Control in Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Grigos, Maria I.; Moss, Aviva; Lu, Ying

    2015-01-01

    Purpose: The purpose of this research was to examine spatial and temporal aspects of articulatory control in children with childhood apraxia of speech (CAS), children with speech delay characterized by an articulation/phonological impairment (SD), and controls with typical development (TD) during speech tasks that increased in word length. Method:…

  16. Stochastic Time Models of Syllable Structure

    PubMed Central

    Shaw, Jason A.; Gafos, Adamantios I.

    2015-01-01

    Drawing on phonology research within the generative linguistics tradition, stochastic methods, and notions from complex systems, we develop a modelling paradigm linking phonological structure, expressed in terms of syllables, to speech movement data acquired with 3D electromagnetic articulography and X-ray microbeam methods. The essential variable in the models is syllable structure. When mapped to discrete coordination topologies, syllabic organization imposes systematic patterns of variability on the temporal dynamics of speech articulation. We simulated these dynamics under different syllabic parses and evaluated simulations against experimental data from Arabic and English, two languages claimed to parse similar strings of segments into different syllabic structures. Model simulations replicated several key experimental results, including the fallibility of past phonetic heuristics for syllable structure, and exposed the range of conditions under which such heuristics remain valid. More importantly, the modelling approach consistently diagnosed syllable structure proving resilient to multiple sources of variability in experimental data including measurement variability, speaker variability, and contextual variability. Prospects for extensions of our modelling paradigm to acoustic data are also discussed. PMID:25996153

  17. The influence of the visual modality on language structure and conventionalization: insights from sign language and gesture.

    PubMed

    Perniss, Pamela; Özyürek, Asli; Morgan, Gary

    2015-01-01

    For humans, the ability to communicate and use language is instantiated not only in the vocal modality but also in the visual modality. The main examples of this are sign languages and (co-speech) gestures. Sign languages, the natural languages of Deaf communities, use systematic and conventionalized movements of the hands, face, and body for linguistic expression. Co-speech gestures, though non-linguistic, are produced in tight semantic and temporal integration with speech and constitute an integral part of language together with speech. The articles in this issue explore and document how gestures and sign languages are similar or different and how communicative expression in the visual modality can change from being gestural to grammatical in nature through processes of conventionalization. As such, this issue contributes to our understanding of how the visual modality shapes language and the emergence of linguistic structure in newly developing systems. Studying the relationship between signs and gestures provides a new window onto the human ability to recruit multiple levels of representation (e.g., categorical, gradient, iconic, abstract) in the service of using or creating conventionalized communicative systems. Copyright © 2015 Cognitive Science Society, Inc.

  18. Patterns of poststroke brain damage that predict speech production errors in apraxia of speech and aphasia dissociate.

    PubMed

    Basilakos, Alexandra; Rorden, Chris; Bonilha, Leonardo; Moser, Dana; Fridriksson, Julius

    2015-06-01

    Acquired apraxia of speech (AOS) is a motor speech disorder caused by brain damage. AOS often co-occurs with aphasia, a language disorder in which patients may also demonstrate speech production errors. The overlap of speech production deficits in both disorders has raised questions on whether AOS emerges from a unique pattern of brain damage or as a subelement of the aphasic syndrome. The purpose of this study was to determine whether speech production errors in AOS and aphasia are associated with distinctive patterns of brain injury. Forty-three patients with history of a single left-hemisphere stroke underwent comprehensive speech and language testing. The AOS Rating Scale was used to rate speech errors specific to AOS versus speech errors that can also be associated with both AOS and aphasia. Localized brain damage was identified using structural magnetic resonance imaging, and voxel-based lesion-impairment mapping was used to evaluate the relationship between speech errors specific to AOS, those that can occur in AOS or aphasia, and brain damage. The pattern of brain damage associated with AOS was most strongly associated with damage to cortical motor regions, with additional involvement of somatosensory areas. Speech production deficits that could be attributed to AOS or aphasia were associated with damage to the temporal lobe and the inferior precentral frontal regions. AOS likely occurs in conjunction with aphasia because of the proximity of the brain areas supporting speech and language, but the neurobiological substrate for each disorder differs. © 2015 American Heart Association, Inc.

  19. Perception of Sentence Stress in Speech Correlates With the Temporal Unpredictability of Prosodic Features.

    PubMed

    Kakouros, Sofoklis; Räsänen, Okko

    2016-09-01

    Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we introduce a novel perspective to the study of sentential stress and put forward the hypothesis that perceived sentence stress in speech is related to the unpredictability of prosodic features, thereby capturing the attention of the listener. As predictability is based on the statistical structure of the speech input, the hypothesis also suggests that stress perception is a result of general statistical learning mechanisms. To study this idea, computational simulations are performed where temporal prosodic trajectories are modeled with an n-gram model. Probabilities of the feature trajectories are subsequently evaluated on a set of novel utterances and compared to human perception of stress. The results show that the low-probability regions of F0 and energy trajectories are strongly correlated with stress perception, giving support to the idea that attention and unpredictability of sensory stimulus are mutually connected. Copyright © 2015 Cognitive Science Society, Inc.
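
    The central idea, that low-probability stretches of the prosodic trajectory attract attention, can be illustrated with a quantize-and-count n-gram model. The sketch below trains a bigram over discretized F0 values and returns per-frame surprisal on a novel trajectory; the bin count, add-alpha smoothing, use of F0 alone, and the synthetic trajectories are simplifications relative to the published simulations.

      import numpy as np
      from collections import Counter

      def make_edges(train_f0, n_bins=8):
          """Bin edges (quantiles) pooled over the training trajectories."""
          return np.quantile(np.concatenate(train_f0), np.linspace(0, 1, n_bins + 1)[1:-1])

      def train_bigram(train_f0, edges):
          pair, ctx = Counter(), Counter()
          for f0 in train_f0:
              q = np.digitize(f0, edges)
              for prev, cur in zip(q[:-1], q[1:]):
                  pair[(prev, cur)] += 1
                  ctx[prev] += 1
          return pair, ctx

      def surprisal(f0, edges, pair, ctx, alpha=1.0):
          """Per-frame negative log probability; peaks mark unpredictable (candidate stressed) regions."""
          n_bins = len(edges) + 1
          q = np.digitize(f0, edges)
          return np.array([-np.log((pair[(p, c)] + alpha) / (ctx[p] + alpha * n_bins))
                           for p, c in zip(q[:-1], q[1:])])

      # stand-in F0 trajectories (Hz), e.g. from a pitch tracker with unvoiced frames removed
      train = [120 + 20 * np.random.randn(200) for _ in range(50)]
      test = 120 + 20 * np.random.randn(200)
      edges = make_edges(train)
      s = surprisal(test, edges, *train_bigram(train, edges))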

  20. Prosodic Temporal Alignment of Co-Speech Gestures to Speech Facilitates Referent Resolution

    ERIC Educational Resources Information Center

    Jesse, Alexandra; Johnson, Elizabeth K.

    2012-01-01

    Using a referent detection paradigm, we examined whether listeners can determine the object speakers are referring to by using the temporal alignment between the motion speakers impose on objects and their labeling utterances. Stimuli were created by videotaping speakers labeling a novel creature. Without being explicitly instructed to do so,…

  1. Bilateral Capacity for Speech Sound Processing in Auditory Comprehension: Evidence from Wada Procedures

    ERIC Educational Resources Information Center

    Hickok, G.; Okada, K.; Barr, W.; Pa, J.; Rogalsky, C.; Donnelly, K.; Barde, L.; Grant, A.

    2008-01-01

    Data from lesion studies suggest that the ability to perceive speech sounds, as measured by auditory comprehension tasks, is supported by temporal lobe systems in both the left and right hemisphere. For example, patients with left temporal lobe damage and auditory comprehension deficits (i.e., Wernicke's aphasics), nonetheless comprehend isolated…

  2. Auditory and Speech Processing and Reading Development in Chinese School Children: Behavioural and ERP Evidence

    ERIC Educational Resources Information Center

    Meng, Xiangzhi; Sai, Xiaoguang; Wang, Cixin; Wang, Jue; Sha, Shuying; Zhou, Xiaolin

    2005-01-01

    By measuring behavioural performance and event-related potentials (ERPs) this study investigated the extent to which Chinese school children's reading development is influenced by their skills in auditory, speech, and temporal processing. In Experiment 1, 102 normal school children's performance in pure tone temporal order judgment, tone frequency…

  3. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    PubMed

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
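
    The linear forward-mapping from the speech envelope to EEG described above is commonly estimated as a temporal response function by regularized regression over a window of envelope lags. The sketch below is a bare-bones single-channel version; the sampling rate, lag range, ridge parameter, and stand-in data are assumptions rather than the authors' exact pipeline.

      import numpy as np

      def lagged_design(stim, n_lags):
          """Design matrix whose columns are the stimulus delayed by 0 .. n_lags-1 samples."""
          X = np.zeros((len(stim), n_lags))
          for k in range(n_lags):
              X[k:, k] = stim[:len(stim) - k]
          return X

      def fit_trf(stim, response, n_lags, lam=1e2):
          """Ridge-regression forward model (temporal response function) for one channel."""
          X = lagged_design(stim, n_lags)
          return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ response)

      fs = 128                                      # assumed EEG sampling rate (Hz)
      envelope = np.abs(np.random.randn(60 * fs))   # stand-in for the speech envelope
      eeg = np.random.randn(60 * fs)                # stand-in for one EEG channel
      trf = fit_trf(envelope, eeg, n_lags=64)       # 64 lags covers roughly 0-500 ms at 128 Hz
      predicted = lagged_design(envelope, 64) @ trf
      # the latency of the peak TRF weight can be compared between audiovisual and
      # audio-only conditions to index the >10 ms shift reported above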

  4. Dynamic Encoding of Speech Sequence Probability in Human Temporal Cortex

    PubMed Central

    Leonard, Matthew K.; Bouchard, Kristofer E.; Tang, Claire

    2015-01-01

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. PMID:25948269
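
    Segment-to-segment transition probabilities of the sort manipulated above can be estimated from a phonemically transcribed word list by simple bigram counting. A rough sketch with a toy lexicon and an unsmoothed maximum-likelihood estimate; real stimuli would be built from a large pronunciation dictionary, typically weighted by word frequency.

      from collections import Counter, defaultdict

      def transition_probs(lexicon):
          """lexicon: iterable of phoneme sequences, e.g. [('s','t','a','r'), ...].
          Returns P(next_phoneme | previous_phoneme) as nested dicts."""
          pair, ctx = Counter(), Counter()
          for word in lexicon:
              for prev, cur in zip(word[:-1], word[1:]):
                  pair[(prev, cur)] += 1
                  ctx[prev] += 1
          probs = defaultdict(dict)
          for (prev, cur), n in pair.items():
              probs[prev][cur] = n / ctx[prev]
          return probs

      # toy example
      lexicon = [('s', 't', 'a', 'r'), ('s', 't', 'o', 'p'), ('s', 'p', 'a', 'n'), ('t', 'a', 'p')]
      p = transition_probs(lexicon)
      # p['s']['t'] -> 2/3 and p['s']['p'] -> 1/3: 'st' is the higher-probability transition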

  5. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

    PubMed

    Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

    2017-01-01

    We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
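
    Noise vocoding of the kind used above splits the signal into frequency channels, extracts each channel's amplitude envelope, and re-imposes it on band-limited noise. The sketch below is a generic n-channel implementation; the corner frequencies, filter orders, envelope cutoff, and stand-in waveform are illustrative choices, not the study's parameters.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cut=30.0):
          """Generic n-channel noise vocoder: band-limited noise modulated by the
          per-channel amplitude envelope of the input speech."""
          edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
          b_env, a_env = butter(2, env_cut / (fs / 2), btype='low')
          noise = np.random.randn(len(x))
          out = np.zeros(len(x))
          for lo, hi in zip(edges[:-1], edges[1:]):
              b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype='band')
              band = filtfilt(b, a, x)
              env = filtfilt(b_env, a_env, np.abs(hilbert(band)))   # smoothed channel envelope
              out += np.clip(env, 0, None) * filtfilt(b, a, noise)  # re-impose on band noise
          return out / (np.max(np.abs(out)) + 1e-12)

      fs = 22050
      x = np.random.randn(2 * fs)   # stand-in for a recorded trisyllabic utterance
      vocoded_4ch = noise_vocode(x, fs, n_channels=4)
      vocoded_32ch = noise_vocode(x, fs, n_channels=32)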

  6. Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing

    PubMed Central

    Doelling, Keith; Arnal, Luc; Ghitza, Oded; Poeppel, David

    2013-01-01

    A growing body of research suggests that intrinsic neuronal slow (< 10 Hz) oscillations in auditory cortex appear to track incoming speech and other spectro-temporally complex auditory signals. Within this framework, several recent studies have identified critical-band temporal envelopes as the specific acoustic feature being reflected by the phase of these oscillations. However, how this alignment between speech acoustics and neural oscillations might underpin intelligibility is unclear. Here we test the hypothesis that the ‘sharpness’ of temporal fluctuations in the critical band envelope acts as a temporal cue to speech syllabic rate, driving delta-theta rhythms to track the stimulus and facilitate intelligibility. We interpret our findings as evidence that sharp events in the stimulus cause cortical rhythms to re-align and parse the stimulus into syllable-sized chunks for further decoding. Using magnetoencephalographic recordings, we show that by removing temporal fluctuations that occur at the syllabic rate, envelope-tracking activity is reduced. By artificially reinstating these temporal fluctuations, envelope-tracking activity is regained. These changes in tracking correlate with intelligibility of the stimulus. Together, the results suggest that the sharpness of fluctuations in the stimulus, as reflected in the cochlear output, drive oscillatory activity to track and entrain to the stimulus, at its syllabic rate. This process likely facilitates parsing of the stimulus into meaningful chunks appropriate for subsequent decoding, enhancing perception and intelligibility. PMID:23791839

  7. Congenital amusia: a cognitive disorder limited to resolved harmonics and with no peripheral basis.

    PubMed

    Cousineau, Marion; Oxenham, Andrew J; Peretz, Isabelle

    2015-01-01

    Pitch plays a fundamental role in audition, from speech and music perception to auditory scene analysis. Congenital amusia is a neurogenetic disorder that appears to affect primarily pitch and melody perception. Pitch is normally conveyed by the spectro-temporal fine structure of low harmonics, but some pitch information is available in the temporal envelope produced by the interactions of higher harmonics. Using 10 amusic subjects and 10 matched controls, we tested the hypothesis that amusics suffer exclusively from impaired processing of spectro-temporal fine structure. We also tested whether the inability of amusics to process acoustic temporal fine structure extends beyond pitch by measuring sensitivity to interaural time differences, which also rely on temporal fine structure. Further tests were carried out on basic intensity and spectral resolution. As expected, pitch perception based on spectro-temporal fine structure was impaired in amusics; however, no significant deficits were observed in amusics' ability to perceive the pitch conveyed via temporal-envelope cues. Sensitivity to interaural time differences was also not significantly different between the amusic and control groups, ruling out deficits in the peripheral coding of temporal fine structure. Finally, no significant differences in intensity or spectral resolution were found between the amusic and control groups. The results demonstrate a pitch-specific deficit in fine spectro-temporal information processing in amusia that seems unrelated to temporal or spectral coding in the auditory periphery. These results are consistent with the view that there are distinct mechanisms dedicated to processing resolved and unresolved harmonics in the general population, the former being altered in congenital amusia while the latter is spared. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Statistical Learning, Syllable Processing, and Speech Production in Healthy Hearing and Hearing-Impaired Preschool Children: A Mismatch Negativity Study.

    PubMed

    Studer-Eichenberger, Esther; Studer-Eichenberger, Felix; Koenig, Thomas

    2016-01-01

    The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late discriminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.

  9. Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration.

    PubMed

    Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning

    2018-02-21

    Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being separate entities, gesture and speech are part of an integrated multimodal language system, with inferior frontal gyrus and posterior middle temporal gyrus serving as critical nodes of the cortical network underpinning this system. Copyright © 2018 the authors 0270-6474/18/381891-10$15.00/0.

  10. The neural processing of hierarchical structure in music and speech at different timescales.

    PubMed

    Farbood, Morwaread M; Heeger, David J; Marcus, Gary; Hasson, Uri; Lerner, Yulia

    2015-01-01

    Music, like speech, is a complex auditory signal that contains structures at multiple timescales, and as such is a potentially powerful entry point into the question of how the brain integrates complex streams of information. Using an experimental design modeled after previous studies that used scrambled versions of a spoken story (Lerner et al., 2011) and a silent movie (Hasson et al., 2008), we investigate whether listeners perceive hierarchical structure in music beyond short (~6 s) time windows and whether there is cortical overlap between music and language processing at multiple timescales. Experienced pianists were presented with an extended musical excerpt scrambled at multiple timescales-by measure, phrase, and section-while measuring brain activity with functional magnetic resonance imaging (fMRI). The reliability of evoked activity, as quantified by inter-subject correlation of the fMRI responses, was measured. We found that response reliability depended systematically on musical structure coherence, revealing a topographically organized hierarchy of processing timescales. Early auditory areas (at the bottom of the hierarchy) responded reliably in all conditions. For brain areas at the top of the hierarchy, the original (unscrambled) excerpt evoked more reliable responses than any of the scrambled excerpts, indicating that these brain areas process long-timescale musical structures, on the order of minutes. The topography of processing timescales was analogous with that reported previously for speech, but the timescale gradients for music and speech overlapped with one another only partially, suggesting that temporally analogous structures-words/measures, sentences/musical phrases, paragraph/sections-are processed separately.
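
    Inter-subject correlation of the kind quantified above is computed, for a given region and condition, by correlating each subject's response time course with the average of the remaining subjects and then averaging over subjects. A bare-bones sketch with synthetic stand-in data of shape (subjects, time):

      import numpy as np

      def intersubject_correlation(data):
          """data: (n_subjects, n_timepoints) responses from one region and one condition.
          Mean leave-one-out correlation of each subject with the average of the others."""
          rs = []
          for i in range(data.shape[0]):
              others = np.delete(data, i, axis=0).mean(axis=0)
              rs.append(np.corrcoef(data[i], others)[0, 1])
          return float(np.mean(rs))

      # stand-in data: 10 'pianists', 300 fMRI volumes per condition
      intact = np.random.randn(10, 300)
      scrambled = np.random.randn(10, 300)
      # regions whose ISC is high for the intact excerpt but drops for scrambled versions
      # are taken to integrate structure over the corresponding timescale
      print(intersubject_correlation(intact), intersubject_correlation(scrambled))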

  11. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages. PMID:26641472
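
    The three modulation timescales identified above can be visualized by band-pass filtering the wide-band amplitude envelope at roughly the stress, syllable, and onset-rime rates. The sketch below is a crude single-envelope stand-in for the full S-AMPH spectral filterbank and PCA pipeline; the band edges, filter settings, and stand-in waveform are approximations.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def am_bands(x, fs, fs_env=200):
          """Decompose the wide-band amplitude envelope of speech x into coarse AM bands
          roughly matching stress (~2 Hz), syllable (~5 Hz) and onset-rime (~20 Hz) rates."""
          env = np.abs(hilbert(x))
          b, a = butter(2, 40 / (fs / 2), btype='low')
          env = filtfilt(b, a, env)[::fs // fs_env]          # smooth and downsample the envelope
          bands = {'stress_am': (0.9, 2.5), 'syllable_am': (2.5, 12.0), 'phoneme_am': (12.0, 40.0)}
          out = {}
          for name, (lo, hi) in bands.items():
              bb, ab = butter(2, [lo / (fs_env / 2), hi / (fs_env / 2)], btype='band')
              out[name] = filtfilt(bb, ab, env)
          return out

      fs = 16000
      x = np.random.randn(3 * fs)        # stand-in for a child-directed utterance
      components = am_bands(x, fs)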

  12. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages.

  13. Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders

    NASA Astrophysics Data System (ADS)

    Rußwurm, Marc; Körner, Marco

    2018-03-01

    Earth observation (EO) sensors deliver data with daily or weekly temporal resolution. Most land use and land cover (LULC) approaches, however, expect cloud-free and mono-temporal observations. The increasing temporal capabilities of today's sensors enable the use of temporal, along with spectral and spatial, features. Domains such as speech recognition or neural machine translation work with inherently temporal data and, today, achieve impressive results using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells which reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, our experiments achieved state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing compared to other classification approaches.
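
    An encoder of the kind described, convolutional recurrent layers over an image time series followed by per-pixel classification, can be sketched in a few lines of Keras. The input dimensions, layer widths, and class count below are placeholders; the published network, its decoder variants, and the training setup differ in detail.

      import tensorflow as tf

      def build_encoder(n_steps=26, height=48, width=48, n_bands=13, n_classes=17):
          """Sequence of Sentinel-2-like patches -> per-pixel class probabilities."""
          inputs = tf.keras.Input(shape=(n_steps, height, width, n_bands))
          x = tf.keras.layers.ConvLSTM2D(32, (3, 3), padding='same',
                                         return_sequences=True)(inputs)
          x = tf.keras.layers.ConvLSTM2D(64, (3, 3), padding='same',
                                         return_sequences=False)(x)   # summarizes the season
          outputs = tf.keras.layers.Conv2D(n_classes, (1, 1), activation='softmax')(x)
          return tf.keras.Model(inputs, outputs)

      model = build_encoder()
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')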

  14. Do Older Listeners With Hearing Loss Benefit From Dynamic Pitch for Speech Recognition in Noise?

    PubMed

    Shen, Jing; Souza, Pamela E

    2017-10-12

    Dynamic pitch, the variation in the fundamental frequency of speech, aids older listeners' speech perception in noise. It is unclear, however, whether some older listeners with hearing loss benefit from strengthened dynamic pitch cues for recognizing speech in certain noise scenarios and how this relative benefit may be associated with individual factors. We first examined older individuals' relative benefit between natural and strong dynamic pitches for better speech recognition in noise. Further, we reported the individual factors of the 2 groups of listeners who benefit differently from natural and strong dynamic pitches. Speech reception thresholds of 13 older listeners with mild-moderate hearing loss were measured using target speech with 3 levels of dynamic pitch strength. Individuals' ability to benefit from dynamic pitch was defined as the speech reception threshold difference between speeches with and without dynamic pitch cues. The relative benefit of natural versus strong dynamic pitch varied across individuals. However, this relative benefit remained consistent for the same individuals across those background noises with temporal modulation. Those listeners who benefited more from strong dynamic pitch reported better subjective speech perception abilities. Strong dynamic pitch may be more beneficial than natural dynamic pitch for some older listeners to recognize speech better in noise, particularly when the noise has temporal modulation.

  15. Word pair classification during imagined speech using direct brain recordings

    NASA Astrophysics Data System (ADS)

    Martin, Stephanie; Brunner, Peter; Iturrate, Iñaki; Millán, José Del R.; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

    2016-05-01

    People who cannot communicate due to neurological disorders would benefit from an internal speech decoder. Here, we showed the ability to classify individual words during imagined speech from electrocorticographic signals. In a word imagery task, we used high gamma (70-150 Hz) time features with a support vector machine model to classify individual words from a pair of words. To account for temporal irregularities during speech production, we introduced a non-linear time alignment into the SVM kernel. Classification accuracy reached 88% in a two-class classification framework (50% chance level), and average classification accuracy across fifteen word-pairs was significant across five subjects (mean = 58%; p < 0.05). We also compared classification accuracy between imagined speech, overt speech and listening. As predicted, higher classification accuracy was obtained in the listening and overt speech conditions (mean = 89% and 86%, respectively; p < 0.0001), where speech stimuli were directly presented. The results provide evidence for a neural representation for imagined words in the temporal lobe, frontal lobe and sensorimotor cortex, consistent with previous findings in speech perception and production. These data represent a proof of concept study for basic decoding of speech imagery, and delineate a number of key challenges to usage of speech imagery neural representations for clinical applications.
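
    The non-linear time alignment inside the SVM kernel can be approximated by turning a dynamic-time-warping distance into a similarity and feeding it to a precomputed-kernel SVM. A minimal sketch with synthetic stand-in trials; the exponentiated-DTW kernel is a common surrogate that is not guaranteed to be positive semi-definite, and the feature extraction (per-electrode high-gamma power over time) is assumed to have been done already.

      import numpy as np
      from sklearn.svm import SVC

      def dtw_distance(a, b):
          """Length-normalised DTW cost between two (time, channels) feature trajectories."""
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(a[i - 1] - b[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m] / (n + m)

      def dtw_kernel(X, Y, gamma=0.5):
          """Exponentiated-DTW similarity matrix (not guaranteed positive semi-definite)."""
          return np.exp(-gamma * np.array([[dtw_distance(x, y) for y in Y] for x in X]))

      # stand-in trials: variable-length high-gamma trajectories, two word classes
      rng = np.random.default_rng(0)
      trials = [rng.normal(loc=c, size=(rng.integers(40, 60), 8)) for c in (0, 1) for _ in range(10)]
      labels = np.array([0] * 10 + [1] * 10)
      train, test = list(range(0, 20, 2)), list(range(1, 20, 2))
      K_train = dtw_kernel([trials[i] for i in train], [trials[i] for i in train])
      clf = SVC(kernel='precomputed').fit(K_train, labels[train])
      K_test = dtw_kernel([trials[i] for i in test], [trials[i] for i in train])
      accuracy = (clf.predict(K_test) == labels[test]).mean()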

  16. Word pair classification during imagined speech using direct brain recordings

    PubMed Central

    Martin, Stephanie; Brunner, Peter; Iturrate, Iñaki; Millán, José del R.; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

    2016-01-01

    People that cannot communicate due to neurological disorders would benefit from an internal speech decoder. Here, we showed the ability to classify individual words during imagined speech from electrocorticographic signals. In a word imagery task, we used high gamma (70–150 Hz) time features with a support vector machine model to classify individual words from a pair of words. To account for temporal irregularities during speech production, we introduced a non-linear time alignment into the SVM kernel. Classification accuracy reached 88% in a two-class classification framework (50% chance level), and average classification accuracy across fifteen word-pairs was significant across five subjects (mean = 58%; p < 0.05). We also compared classification accuracy between imagined speech, overt speech and listening. As predicted, higher classification accuracy was obtained in the listening and overt speech conditions (mean = 89% and 86%, respectively; p < 0.0001), where speech stimuli were directly presented. The results provide evidence for a neural representation for imagined words in the temporal lobe, frontal lobe and sensorimotor cortex, consistent with previous findings in speech perception and production. These data represent a proof of concept study for basic decoding of speech imagery, and delineate a number of key challenges to usage of speech imagery neural representations for clinical applications. PMID:27165452

  17. Singing can improve speech function in aphasics associated with intact right basal ganglia and preserved right temporal glucose metabolism: Implications for singing therapy indication.

    PubMed

    Akanuma, Kyoko; Meguro, Kenichi; Satoh, Masayuki; Tashiro, Manabu; Itoh, Masatoshi

    2016-01-01

    Clinically, we know that some aphasic patients can sing well despite their speech disturbances. Herein, we report 10 patients with non-fluent aphasia, half of whom improved their speech function after singing training. We studied ten patients with non-fluent aphasia complaining of difficulty finding words. All had lesions in the left basal ganglia or temporal lobe. They selected melodies that they knew well but could not sing. We created new lyrics for a familiar melody using words they could not name. The singing training using these new lyrics was performed for 30 minutes once a week for 10 weeks. Before and after the training, their speech functions were assessed by language tests. At baseline, 6 of them received positron emission tomography to evaluate glucose metabolism. Five patients exhibited improvements after the intervention; all but one exhibited intact right basal ganglia and left temporal lobes, but all exhibited left basal ganglia lesions. Among them, three subjects exhibited preserved glucose metabolism in the right temporal lobe. We considered that intact right basal ganglia and left temporal lobes, together with preserved right hemispheric glucose metabolism, might indicate that a patient will benefit from singing therapy.

  18. Time-Warp–Invariant Neuronal Processing

    PubMed Central

    Gütig, Robert; Sompolinsky, Haim

    2009-01-01

    Fluctuations in the temporal durations of sensory signals constitute a major source of variability within natural stimulus ensembles. The neuronal mechanisms through which sensory systems can stabilize perception against such fluctuations are largely unknown. An intriguing instantiation of such robustness occurs in human speech perception, which relies critically on temporal acoustic cues that are embedded in signals with highly variable duration. Across different instances of natural speech, auditory cues can undergo temporal warping that ranges from 2-fold compression to 2-fold dilation without significant perceptual impairment. Here, we report that time-warp–invariant neuronal processing can be subserved by the shunting action of synaptic conductances that automatically rescales the effective integration time of postsynaptic neurons. We propose a novel spike-based learning rule for synaptic conductances that adjusts the degree of synaptic shunting to the temporal processing requirements of a given task. Applying this general biophysical mechanism to the example of speech processing, we propose a neuronal network model for time-warp–invariant word discrimination and demonstrate its excellent performance on a standard benchmark speech-recognition task. Our results demonstrate the important functional role of synaptic conductances in spike-based neuronal information processing and learning. The biophysics of temporal integration at neuronal membranes can endow sensory pathways with powerful time-warp–invariant computational capabilities. PMID:19582146

  19. Transient Auditory Storage of Acoustic Details Is Associated with Release of Speech from Informational Masking in Reverberant Conditions

    ERIC Educational Resources Information Center

    Huang, Ying; Huang, Qiang; Chen, Xun; Wu, Xihong; Li, Liang

    2009-01-01

    Perceptual integration of the sound directly emanating from the source with reflections needs both temporal storage and correlation computation of acoustic details. We examined whether the temporal storage is frequency dependent and associated with speech unmasking. In Experiment 1, a break in correlation (BIC) between interaurally correlated…

  20. Can rhythmic auditory cuing remediate language-related deficits in Parkinson's disease?

    PubMed

    Kotz, Sonja A; Gunter, Thomas C

    2015-03-01

    Neurodegenerative changes of the basal ganglia in idiopathic Parkinson's disease (IPD) lead to motor deficits as well as general cognitive decline. Given these impairments, the question arises as to whether motor and nonmotor deficits can be ameliorated similarly. We reason that a domain-general sensorimotor circuit involved in temporal processing may support the remediation of such deficits. Following findings that auditory cuing benefits gait kinematics, we explored whether reported language-processing deficits in IPD can also be remediated via auditory cuing. During continuous EEG measurement, an individual diagnosed with IPD heard one of two temporally predictable but metrically different auditory beat-based cues, a march (which metrically aligned with the speech accent structure) or a waltz (which did not), or no cue, before listening to naturally spoken sentences that were either grammatically well formed or contained a semantic or syntactic error. Results confirmed that only cuing with a march led to improved computation of syntactic and semantic information. We infer that a marching rhythm may lead to a stronger engagement of the cerebello-thalamo-cortical circuit that compensates for dysfunctional striato-cortical timing. Reinforcing temporal realignment, in turn, may lead to the timely processing of linguistic information embedded in the temporally variable speech signal. © 2014 New York Academy of Sciences.

  1. Are mirror neurons the basis of speech perception? Evidence from five cases with damage to the purported human mirror system

    PubMed Central

    Rogalsky, Corianne; Love, Tracy; Driscoll, David; Anderson, Steven W.; Hickok, Gregory

    2013-01-01

    The discovery of mirror neurons in macaque has led to a resurrection of motor theories of speech perception. Although the majority of lesion and functional imaging studies have associated perception with the temporal lobes, it has also been proposed that the ‘human mirror system’, which prominently includes Broca’s area, is the neurophysiological substrate of speech perception. Although numerous studies have demonstrated a tight link between sensory and motor speech processes, few have directly assessed the critical prediction of mirror neuron theories of speech perception, namely that damage to the human mirror system should cause severe deficits in speech perception. The present study measured speech perception abilities of patients with lesions involving motor regions in the left posterior frontal lobe and/or inferior parietal lobule (i.e., the proposed human ‘mirror system’). Performance was at or near ceiling in patients with fronto-parietal lesions. It is only when the lesion encroaches on auditory regions in the temporal lobe that perceptual deficits are evident. This suggests that ‘mirror system’ damage does not disrupt speech perception, but rather that auditory systems are the primary substrate for speech perception. PMID:21207313

  2. Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: an initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals.

    PubMed

    Auer, E T; Bernstein, L E; Coulter, D C

    1998-10-01

    Four experiments were performed to evaluate a new wearable vibrotactile speech perception aid that extracts fundamental frequency (F0) and displays the extracted F0 as a single-channel temporal or an eight-channel spatio-temporal stimulus. Specifically, we investigated the perception of intonation (i.e., question versus statement) and emphatic stress (i.e., stress on the first, second, or third word) under Visual-Alone (VA), Visual-Tactile (VT), and Tactile-Alone (TA) conditions and compared performance using the temporal and spatio-temporal vibrotactile display. Subjects were adults with normal hearing in experiments I-III and adults with severe to profound hearing impairments in experiment IV. Both versions of the vibrotactile speech perception aid successfully conveyed intonation. Vibrotactile stress information was successfully conveyed, but vibrotactile stress information did not enhance performance in VT conditions beyond performance in VA conditions. In experiment III, which involved only intonation identification, a reliable advantage for the spatio-temporal display was obtained. Differences between subject groups were obtained for intonation identification, with more accurate VT performance by those with normal hearing. Possible effects of long-term hearing status are discussed.

  3. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    PubMed

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is presumably engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual speech perception or during unimodal speech perception with irrelevant noise in the other modality. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework based on hierarchical clustering (single linkage distance) to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. Fifteen subjects received bimodal (audio-visual) speech cues or unimodal speech cues paired with irrelevant noise in the other modality (auditory white noise or visual gum-chewing). In terms of positive relationships, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter coupling of the left anterior temporal gyrus-anterior insula component and of the right premotor-visual component was observed than in the auditory-only and visual-only speech cue conditions, respectively. Interestingly, visual speech under auditory white noise was perceived through tight negative coupling among the left inferior frontal region, right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, reflecting efficient or effortful processing during natural audiovisual integration or lip-reading, respectively, in speech perception.
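
    A minimal sketch of a connected-component (0-dimensional) network filtration of the kind described above, using single-linkage hierarchical clustering on a correlation-derived distance matrix; the simulated time series and the 1 - correlation dissimilarity are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# hypothetical region-by-time fMRI data: 10 regions, 200 time points
rng = np.random.default_rng(1)
ts = rng.standard_normal((10, 200))

corr = np.corrcoef(ts)          # functional connectivity matrix
dist = 1.0 - corr               # dissimilarity used for the filtration
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="single")

# track connected components as the filtration threshold increases
for thr in np.linspace(0.0, 2.0, 9):
    labels = fcluster(Z, t=thr, criterion="distance")
    print(f"threshold {thr:.2f}: {labels.max()} connected components")
```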

  4. Paving the Way for Speech: Voice-Training-Induced Plasticity in Chronic Aphasia and Apraxia of Speech—Three Single Cases

    PubMed Central

    Jungblut, Monika; Huber, Walter; Mais, Christiane

    2014-01-01

    Difficulties with temporal coordination or sequencing of speech movements are frequently reported in aphasia patients with concomitant apraxia of speech (AOS). Our major objective was to investigate the effects of specific rhythmic-melodic voice training on brain activation in these patients. Three patients with severe chronic nonfluent aphasia and AOS were included in this study. Before and after therapy, patients underwent the same fMRI procedure as 30 healthy control subjects in our prestudy, which investigated the neural substrates of sung vowel changes in untrained rhythm sequences. A main finding was that post- minus pretreatment imaging data yielded significant perilesional activations in all patients, for example in the left superior temporal gyrus, whereas the reverse subtraction revealed either no significant activation or right hemisphere activation. Likewise, pre- and posttreatment assessments of patients' vocal rhythm production, language, and speech motor performance yielded significant improvements for all patients. Our results suggest that changes in brain activation due to the applied training might indicate specific processes of reorganization, for example improved temporal sequencing of sublexical speech components. In this context, a training regime that focuses on rhythmic singing with complexity levels that differ in their motor and cognitive demands seems to support paving the way for speech. PMID:24977055

  5. Phase-Locked Responses to Speech in Human Auditory Cortex are Enhanced During Comprehension

    PubMed Central

    Peelle, Jonathan E.; Gross, Joachim; Davis, Matthew H.

    2013-01-01

    A growing body of evidence shows that ongoing oscillations in auditory cortex modulate their phase to match the rhythm of temporally regular acoustic stimuli, increasing sensitivity to relevant environmental cues and improving detection accuracy. In the current study, we test the hypothesis that nonsensory information provided by linguistic content enhances phase-locked responses to intelligible speech in the human brain. Sixteen adults listened to meaningful sentences while we recorded neural activity using magnetoencephalography. Stimuli were processed using a noise-vocoding technique to vary intelligibility while keeping the temporal acoustic envelope consistent. We show that the acoustic envelopes of sentences contain most power between 4 and 7 Hz and that it is in this frequency band that phase locking between neural activity and envelopes is strongest. Bilateral oscillatory neural activity phase-locked to unintelligible speech, but this cerebro-acoustic phase locking was enhanced when speech was intelligible. This enhanced phase locking was left lateralized and localized to left temporal cortex. Together, our results demonstrate that entrainment to connected speech does not only depend on acoustic characteristics, but is also affected by listeners’ ability to extract linguistic information. This suggests a biological framework for speech comprehension in which acoustic and linguistic cues reciprocally aid in stimulus prediction. PMID:22610394

  6. Phase-locked responses to speech in human auditory cortex are enhanced during comprehension.

    PubMed

    Peelle, Jonathan E; Gross, Joachim; Davis, Matthew H

    2013-06-01

    A growing body of evidence shows that ongoing oscillations in auditory cortex modulate their phase to match the rhythm of temporally regular acoustic stimuli, increasing sensitivity to relevant environmental cues and improving detection accuracy. In the current study, we test the hypothesis that nonsensory information provided by linguistic content enhances phase-locked responses to intelligible speech in the human brain. Sixteen adults listened to meaningful sentences while we recorded neural activity using magnetoencephalography. Stimuli were processed using a noise-vocoding technique to vary intelligibility while keeping the temporal acoustic envelope consistent. We show that the acoustic envelopes of sentences contain most power between 4 and 7 Hz and that it is in this frequency band that phase locking between neural activity and envelopes is strongest. Bilateral oscillatory neural activity phase-locked to unintelligible speech, but this cerebro-acoustic phase locking was enhanced when speech was intelligible. This enhanced phase locking was left lateralized and localized to left temporal cortex. Together, our results demonstrate that entrainment to connected speech does not only depend on acoustic characteristics, but is also affected by listeners' ability to extract linguistic information. This suggests a biological framework for speech comprehension in which acoustic and linguistic cues reciprocally aid in stimulus prediction.
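
    The sketch below illustrates one simple way to quantify how closely a neural signal tracks the speech envelope in the 4-7 Hz band, using magnitude-squared coherence as a stand-in for the phase-locking analysis reported above; the simulated signals, sampling rate, and filter settings are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import hilbert, coherence, butter, filtfilt

fs = 200.0                                   # common sampling rate (assumed)
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(2)

# hypothetical broadband speech envelope and its 4-7 Hz component
envelope = np.abs(hilbert(rng.standard_normal(t.size)))
b, a = butter(4, [4, 7], btype="band", fs=fs)
theta = filtfilt(b, a, envelope)

# hypothetical neural signal that partially tracks the envelope
neural = 0.5 * theta + rng.standard_normal(t.size)

f, Cxy = coherence(envelope, neural, fs=fs, nperseg=int(4 * fs))
band = (f >= 4) & (f <= 7)
print("mean 4-7 Hz cerebro-acoustic coherence:", round(Cxy[band].mean(), 3))
```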

  7. Masking release due to linguistic and phonetic dissimilarity between the target and masker speech

    PubMed Central

    Calandruccio, Lauren; Brouwer, Susanne; Van Engen, Kristin J.; Dhar, Sumitrajit; Bradlow, Ann R.

    2013-01-01

    Purpose: To investigate masking release for speech maskers for linguistically and phonetically close (English and Dutch) and distant (English and Mandarin) language pairs. Method: Twenty monolingual speakers of English with normal audiometric thresholds participated. Data are reported for an English sentence recognition task in English, Dutch and Mandarin competing speech maskers (Experiment I) and noise maskers (Experiment II) that were matched either to the long-term average speech spectra or to the temporal modulations of the speech maskers from Experiment I. Results: Listener performance increased as the target-to-masker linguistic distance increased (English-in-English < English-in-Dutch < English-in-Mandarin). Conclusions: Spectral differences between maskers can account for some, but not all, of the variation in performance between maskers; however, temporal differences did not seem to play a significant role. PMID:23800811

  8. Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

    ERIC Educational Resources Information Center

    Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

    2007-01-01

    Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…

  9. Temporal Dependency and the Structure of Early Looking.

    PubMed

    Messinger, Daniel S; Mattson, Whitney I; Todd, James Torrence; Gangi, Devon N; Myers, Nicholas D; Bahrick, Lorraine E

    2017-01-01

    Although looking time is used to assess infant perceptual and cognitive processing, little is known about the temporal structure of infant looking. To shed light on this temporal structure, 127 three-month-olds were assessed in an infant-controlled habituation procedure and presented with a pre-recorded display of a woman addressing the infant using infant-directed speech. Previous individual look durations positively predicted subsequent look durations over a six look window, suggesting a temporal dependency between successive infant looks. The previous look duration continued to predict the subsequent look duration after accounting for habituation-linked declines in look duration, and when looks were separated by an inter-trial interval in which no stimulus was displayed. Individual differences in temporal dependency, the strength of associations between consecutive look durations, are distinct from individual differences in mean infant look duration. Nevertheless, infants with stronger temporal dependency had briefer mean look durations, a potential index of stimulus processing. Temporal dependency was evident not only between individual infant looks but between the durations of successive habituation trials (total looking within a trial). Finally, temporal dependency was evident in associations between the last look at the habituation stimulus and the first look at a novel test stimulus. Thus temporal dependency was evident across multiple timescales (individual looks and trials comprised of multiple individual looks) and persisted across conditions including brief periods of no stimulus presentation and changes from a familiar to novel stimulus. Associations between consecutive look durations over multiple timescales and stimuli suggest a temporal structure of infant attention that has been largely ignored in previous work on infant looking.

  10. Temporal Dependency and the Structure of Early Looking

    PubMed Central

    Messinger, Daniel S.; Mattson, Whitney I.; Todd, James Torrence; Gangi, Devon N.; Myers, Nicholas D.; Bahrick, Lorraine E.

    2017-01-01

    Although looking time is used to assess infant perceptual and cognitive processing, little is known about the temporal structure of infant looking. To shed light on this temporal structure, 127 three-month-olds were assessed in an infant-controlled habituation procedure and presented with a pre-recorded display of a woman addressing the infant using infant-directed speech. Previous individual look durations positively predicted subsequent look durations over a six look window, suggesting a temporal dependency between successive infant looks. The previous look duration continued to predict the subsequent look duration after accounting for habituation-linked declines in look duration, and when looks were separated by an inter-trial interval in which no stimulus was displayed. Individual differences in temporal dependency, the strength of associations between consecutive look durations, are distinct from individual differences in mean infant look duration. Nevertheless, infants with stronger temporal dependency had briefer mean look durations, a potential index of stimulus processing. Temporal dependency was evident not only between individual infant looks but between the durations of successive habituation trials (total looking within a trial). Finally, temporal dependency was evident in associations between the last look at the habituation stimulus and the first look at a novel test stimulus. Thus temporal dependency was evident across multiple timescales (individual looks and trials comprised of multiple individual looks) and persisted across conditions including brief periods of no stimulus presentation and changes from a familiar to novel stimulus. Associations between consecutive look durations over multiple timescales and stimuli suggest a temporal structure of infant attention that has been largely ignored in previous work on infant looking. PMID:28076362
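
    As a toy illustration of the lag-1 temporal dependency analysis described in this record, the sketch below regresses each look duration on the preceding look duration while controlling for look index (a rough stand-in for habituation-linked decline); the simulated durations and the simple least-squares model are assumptions, not the authors' statistical approach.

```python
import numpy as np

# hypothetical log look durations (s) for one infant, with built-in
# lag-1 dependency plus a habituation-like decline
rng = np.random.default_rng(3)
n = 12
looks = np.empty(n)
looks[0] = 1.5
for i in range(1, n):
    looks[i] = 1.0 + 0.4 * looks[i - 1] - 0.05 * i + rng.normal(0, 0.2)

# predictors: intercept, previous look duration, and look index
X = np.column_stack([np.ones(n - 1), looks[:-1], np.arange(1, n)])
y = looks[1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lag-1 coefficient (temporal dependency):", round(beta[1], 2))
```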

  11. Decreased Cerebellar-Orbitofrontal Connectivity Correlates with Stuttering Severity: Whole-Brain Functional and Structural Connectivity Associations with Persistent Developmental Stuttering.

    PubMed

    Sitek, Kevin R; Cai, Shanqing; Beal, Deryk S; Perkell, Joseph S; Guenther, Frank H; Ghosh, Satrajit S

    2016-01-01

    Persistent developmental stuttering is characterized by speech production disfluency and affects 1% of adults. The degree of impairment varies widely across individuals, and the neural mechanisms underlying both the disorder and this variability remain poorly understood. Here we elucidate compensatory mechanisms related to this variability in impairment using whole-brain functional and white matter connectivity analyses in persistent developmental stuttering. We found that people who stutter had stronger functional connectivity between cerebellum and thalamus than people with fluent speech, while stutterers with the least severe symptoms had greater functional connectivity between left cerebellum and left orbitofrontal cortex (OFC). Additionally, people who stutter had decreased functional and white matter connectivity among the perisylvian auditory, motor, and speech planning regions compared to typical speakers, but greater functional connectivity between the right basal ganglia and bilateral temporal auditory regions. Structurally, disfluency ratings were negatively correlated with white matter connections to left perisylvian regions and to the brain stem. Overall, we found increased connectivity among subcortical and reward network structures in people who stutter compared to controls. These connections were negatively correlated with stuttering severity, suggesting that the involvement of the cerebellum and OFC may underlie successful compensatory mechanisms by more fluent stutterers.

  12. The neural processing of hierarchical structure in music and speech at different timescales

    PubMed Central

    Farbood, Morwaread M.; Heeger, David J.; Marcus, Gary; Hasson, Uri; Lerner, Yulia

    2015-01-01

    Music, like speech, is a complex auditory signal that contains structures at multiple timescales, and as such is a potentially powerful entry point into the question of how the brain integrates complex streams of information. Using an experimental design modeled after previous studies that used scrambled versions of a spoken story (Lerner et al., 2011) and a silent movie (Hasson et al., 2008), we investigate whether listeners perceive hierarchical structure in music beyond short (~6 s) time windows and whether there is cortical overlap between music and language processing at multiple timescales. Experienced pianists were presented with an extended musical excerpt scrambled at multiple timescales—by measure, phrase, and section—while measuring brain activity with functional magnetic resonance imaging (fMRI). The reliability of evoked activity, as quantified by inter-subject correlation of the fMRI responses, was measured. We found that response reliability depended systematically on musical structure coherence, revealing a topographically organized hierarchy of processing timescales. Early auditory areas (at the bottom of the hierarchy) responded reliably in all conditions. For brain areas at the top of the hierarchy, the original (unscrambled) excerpt evoked more reliable responses than any of the scrambled excerpts, indicating that these brain areas process long-timescale musical structures, on the order of minutes. The topography of processing timescales was analogous with that reported previously for speech, but the timescale gradients for music and speech overlapped with one another only partially, suggesting that temporally analogous structures—words/measures, sentences/musical phrases, paragraph/sections—are processed separately. PMID:26029037
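
    Inter-subject correlation can be computed in several ways; the sketch below shows a common leave-one-out variant (correlating each subject's response time course with the mean of the remaining subjects) on simulated data, which may differ from the exact reliability measure used in the study.

```python
import numpy as np

# hypothetical response time courses for one region: 12 subjects x 300 TRs
rng = np.random.default_rng(5)
shared = rng.standard_normal(300)                       # stimulus-driven component
data = np.stack([0.6 * shared + rng.standard_normal(300) for _ in range(12)])

def intersubject_correlation(data):
    """Leave-one-out ISC: correlate each subject with the mean of the others."""
    iscs = []
    for s in range(data.shape[0]):
        others = np.delete(data, s, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(data[s], others)[0, 1])
    return float(np.mean(iscs))

print("mean leave-one-out ISC:", round(intersubject_correlation(data), 3))
```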

  13. Gesture in the developing brain

    PubMed Central

    Dick, Anthony Steven; Goldin-Meadow, Susan; Solodkin, Ana; Small, Steven L.

    2011-01-01

    Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture. PMID:22356173

  14. A common functional neural network for overt production of speech and gesture.

    PubMed

    Marstaller, L; Burianová, H

    2015-01-22

    The perception of co-speech gestures, i.e., hand movements that co-occur with speech, has been investigated by several studies. The results show that the perception of co-speech gestures engages a core set of frontal, temporal, and parietal areas. However, no study has yet investigated the neural processes underlying the production of co-speech gestures. Specifically, it remains an open question whether Broca's area is central to the coordination of speech and gestures as has been suggested previously. The objective of this study was to use functional magnetic resonance imaging to (i) investigate the regional activations underlying overt production of speech, gestures, and co-speech gestures, and (ii) examine functional connectivity with Broca's area. We hypothesized that co-speech gesture production would activate frontal, temporal, and parietal regions that are similar to areas previously found during co-speech gesture perception and that both speech and gesture as well as co-speech gesture production would engage a neural network connected to Broca's area. Whole-brain analysis confirmed our hypothesis and showed that co-speech gesturing did engage brain areas that form part of networks known to subserve language and gesture. Functional connectivity analysis further revealed a functional network connected to Broca's area that is common to speech, gesture, and co-speech gesture production. This network consists of brain areas that play essential roles in motor control, suggesting that the coordination of speech and gesture is mediated by a shared motor control network. Our findings thus lend support to the idea that speech can influence co-speech gesture production on a motoric level. Copyright © 2014 IBRO. Published by Elsevier Ltd. All rights reserved.

  15. Articulatory movements modulate auditory responses to speech

    PubMed Central

    Agnew, Z.K.; McGettigan, C.; Banks, B.; Scott, S.K.

    2013-01-01

    Production of actions is highly dependent on concurrent sensory information. In speech production, for example, movement of the articulators is guided by both auditory and somatosensory input. It has been demonstrated in non-human primates that self-produced vocalizations and those of others are differentially processed in the temporal cortex. The aim of the current study was to investigate how auditory and motor responses differ for self-produced and externally produced speech. Using functional neuroimaging, subjects were asked to produce sentences aloud, to silently mouth while listening to a different speaker producing the same sentence, to passively listen to sentences being read aloud, or to read sentences silently. We show that separate regions of the superior temporal cortex display distinct response profiles to speaking aloud, mouthing while listening, and passive listening. Responses in anterior superior temporal cortices in both hemispheres are greater for passive listening compared with both mouthing while listening and speaking aloud. This is the first demonstration that articulation, whether or not it has auditory consequences, modulates responses of the dorsolateral temporal cortex. In contrast, posterior regions of the superior temporal cortex are recruited during both articulation conditions. In dorsal regions of the posterior superior temporal gyrus, responses to mouthing and reading aloud were equivalent, and in more ventral posterior superior temporal sulcus, responses were greater for reading aloud compared with mouthing while listening. These data demonstrate an anterior–posterior division of superior temporal regions where anterior fields are suppressed during motor output, potentially for the purpose of enhanced detection of the speech of others. We suggest posterior fields are engaged in auditory processing for the guidance of articulation by auditory information. PMID:22982103

  16. Ultra-fast speech comprehension in blind subjects engages primary visual cortex, fusiform gyrus, and pulvinar – a functional magnetic resonance imaging (fMRI) study

    PubMed Central

    2013-01-01

    Background: Individuals suffering from vision loss of a peripheral origin may learn to understand spoken language at a rate of up to about 22 syllables (syl) per second - exceeding by far the maximum performance level of normal-sighted listeners (ca. 8 syl/s). To further elucidate the brain mechanisms underlying this extraordinary skill, functional magnetic resonance imaging (fMRI) was performed in blind subjects of varying ultra-fast speech comprehension capabilities and sighted individuals while listening to sentence utterances of a moderately fast (8 syl/s) or ultra-fast (16 syl/s) syllabic rate. Results: Besides left inferior frontal gyrus (IFG), bilateral posterior superior temporal sulcus (pSTS) and left supplementary motor area (SMA), blind people highly proficient in ultra-fast speech perception showed significant hemodynamic activation of right-hemispheric primary visual cortex (V1), contralateral fusiform gyrus (FG), and bilateral pulvinar (Pv). Conclusions: Presumably, FG supports the left-hemispheric perisylvian “language network”, i.e., IFG and superior temporal lobe, during the (segmental) sequencing of verbal utterances whereas the collaboration of bilateral pulvinar, right auditory cortex, and ipsilateral V1 implements a signal-driven timing mechanism related to syllabic (suprasegmental) modulation of the speech signal. These data structures, conveyed via left SMA to the perisylvian “language zones”, might facilitate – under time-critical conditions – the consolidation of linguistic information at the level of verbal working memory. PMID:23879896

  17. Psychotic-Like Speech in Frontotemporal Dementia.

    PubMed

    Mendez, Mario F; Carr, Andrew R; Paholpak, Pongsatorn

    2017-01-01

    Behavioral variant frontotemporal dementia (bvFTD) may result in psychotic-like speech without other psychotic features. The authors identified a bvFTD subgroup with pressure of speech, tangentiality, derailment, clanging/rhyming, and punning associated with the right anterior temporal atrophy and sparing of the left frontal lobe.

  18. The brain’s conversation with itself: neural substrates of dialogic inner speech

    PubMed Central

    Weis, Susanne; McCarthy-Jones, Simon; Moseley, Peter; Smailes, David; Fernyhough, Charles

    2016-01-01

    Inner speech has been implicated in important aspects of normal and atypical cognition, including the development of auditory hallucinations. Studies to date have focused on covert speech elicited by simple word or sentence repetition, while ignoring richer and arguably more psychologically significant varieties of inner speech. This study compared neural activation for inner speech involving conversations (‘dialogic inner speech’) with single-speaker scenarios (‘monologic inner speech’). Inner speech-related activation differences were then compared with activations relating to Theory-of-Mind (ToM) reasoning and visual perspective-taking in a conjunction design. Generation of dialogic (compared with monologic) scenarios was associated with a widespread bilateral network including left and right superior temporal gyri, precuneus, posterior cingulate and left inferior and medial frontal gyri. Activation associated with dialogic scenarios and ToM reasoning overlapped in areas of right posterior temporal cortex previously linked to mental state representation. Implications for understanding verbal cognition in typical and atypical populations are discussed. PMID:26197805

  19. Common and distinct neural substrates for the perception of speech rhythm and intonation.

    PubMed

    Zhang, Linjun; Shu, Hua; Zhou, Fengying; Wang, Xiaoyi; Li, Ping

    2010-07-01

    The present study examines the neural substrates for the perception of speech rhythm and intonation. Subjects listened passively to synthesized speech stimuli that contained no semantic and phonological information, in three conditions: (1) continuous speech stimuli with fixed syllable duration and fundamental frequency in the standard condition, (2) stimuli with varying vocalic durations of syllables in the speech rhythm condition, and (3) stimuli with varying fundamental frequency in the intonation condition. Compared to the standard condition, speech rhythm activated the right middle superior temporal gyrus (mSTG), whereas intonation activated the bilateral superior temporal gyrus and sulcus (STG/STS) and the right posterior STS. Conjunction analysis further revealed that rhythm and intonation activated a common area in the right mSTG but compared to speech rhythm, intonation elicited additional activations in the right anterior STS. Findings from the current study reveal that the right mSTG plays an important role in prosodic processing. Implications of our findings are discussed with respect to neurocognitive theories of auditory processing. (c) 2009 Wiley-Liss, Inc.

  20. Song and speech: brain regions involved with perception and covert production.

    PubMed

    Callan, Daniel E; Tsytsarev, Vassiliy; Hanakawa, Takashi; Callan, Akiko M; Katsuhara, Maya; Fukuyama, Hidenao; Turner, Robert

    2006-07-01

    This 3-T fMRI study investigates brain regions similarly and differentially involved with listening and covert production of singing relative to speech. Given the greater use of auditory-motor self-monitoring and imagery with respect to consonance in singing, brain regions involved with these processes are predicted to be differentially active for singing more than for speech. The stimuli consisted of six Japanese songs. A block design was employed in which the tasks for the subject were to listen passively to singing of the song lyrics, passively listen to speaking of the song lyrics, covertly sing the song lyrics visually presented, covertly speak the song lyrics visually presented, and to rest. The conjunction of passive listening and covert production tasks used in this study allows general neural processes underlying both perception and production to be discerned that are not exclusively a result of stimulus-induced auditory processing or of low-level articulatory motor control. Brain regions involved with both perception and production for singing as well as speech were found to include the left planum temporale/superior temporal parietal region, as well as left and right premotor cortex, lateral aspect of the VI lobule of posterior cerebellum, anterior superior temporal gyrus, and planum polare. Greater activity for the singing over the speech condition for both the listening and covert production tasks was found in the right planum temporale. Greater activity for singing over speech was also present in brain regions involved with consonance: the orbitofrontal cortex (listening task) and the subcallosal cingulate (covert production task). The results are consistent with the PT mediating representational transformation across auditory and motor domains in response to consonance for singing over that of speech. Hemispheric laterality was assessed by paired t tests between active voxels in the contrast of interest relative to the left-right flipped contrast of interest calculated from images normalized to the left-right reflected template. Consistent with some hypotheses regarding hemispheric specialization, a pattern of differential laterality for speech over singing (both covert production and listening tasks) occurs in the left temporal lobe, whereas differential laterality for singing over speech (listening task only) occurs in the right temporal lobe.

  1. The Interaction of Temporal and Spectral Acoustic Information with Word Predictability on Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Shahsavarani, Somayeh Bahar

    High-level, top-down information such as linguistic knowledge is a salient cortical resource that influences speech perception under most listening conditions. But are all listeners able to exploit these resources for speech facilitation to the same extent? It was found that children with cochlear implants showed different patterns of benefit from contextual information in speech perception compared with their normal-hearing peers. Previous studies have discussed the role of non-acoustic factors such as linguistic and cognitive capabilities to account for this discrepancy. Given that the amount of acoustic information encoded and processed by the auditory nerves of listeners with cochlear implants differs from that of normal-hearing listeners, and even varies across individuals with cochlear implants, it is important to study the interaction of specific acoustic properties of the speech signal with contextual cues. This relationship has been mostly neglected in previous research. In this dissertation, we aimed to explore how different acoustic dimensions interact to affect listeners' abilities to combine top-down information with bottom-up information in speech perception, beyond the previously shown effects of linguistic and cognitive capacities. Specifically, the present study investigated whether there were any distinct context effects based on the resolution of spectral versus slowly-varying temporal information in the perception of spectrally impoverished speech. To that end, two experiments were conducted. In both experiments, a noise-vocoding technique was adopted to generate spectrally degraded speech that approximates the acoustic cues delivered to listeners with cochlear implants. The frequency resolution was manipulated by varying the number of frequency channels. The temporal resolution was manipulated by low-pass filtering of the amplitude envelope with varying cutoff frequencies. The stimuli were presented to normal-hearing native speakers of American English. Our results revealed a significant interaction between spectral, temporal, and contextual information in the perception of spectrally degraded speech. This suggests that different types and degrees of degradation of bottom-up information combine differently in how listeners utilize contextual resources. These findings emphasize the importance of taking the listener's specific auditory abilities into consideration while studying context effects. These results also introduce a novel perspective for designing interventions for listeners with cochlear implants or other auditory prostheses.
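
    A minimal sketch of a noise vocoder of the general kind described above, in which spectral resolution is set by the number of analysis channels and temporal resolution by the low-pass cutoff applied to each channel's amplitude envelope; the filter orders, band edges, and synthetic test signal are illustrative assumptions, not the dissertation's stimulus-generation parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=4, env_cutoff=16.0,
                 f_lo=100.0, f_hi=7000.0):
    """Split 'signal' into log-spaced bands, low-pass each band's amplitude
    envelope at 'env_cutoff' Hz, and use it to modulate band-limited noise.
    More channels = finer spectral resolution; lower cutoff = coarser
    temporal resolution."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        env = np.clip(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0, None)
        carrier = sosfiltfilt(band_sos, rng.standard_normal(signal.size))
        out += env * carrier
    return out

# toy 1-s amplitude-modulated tone standing in for a speech recording
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speech, fs, n_channels=8, env_cutoff=50.0)
```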

  2. Maturation of the auditory t-complex brain response across adolescence.

    PubMed

    Mahajan, Yatin; McArthur, Genevieve

    2013-02-01

    Adolescence is a time of great change in the brain in terms of structure and function. It is possible to track the development of neural function across adolescence using auditory event-related potentials (ERPs). This study tested if the brain's functional processing of sound changed across adolescence. We measured passive auditory t-complex peaks to pure tones and consonant-vowel (CV) syllables in 90 children and adolescents aged 10-18 years, as well as 10 adults. Across adolescence, Na amplitude increased to tones and speech at the right, but not left, temporal site. Ta amplitude decreased at the right temporal site for tones, and at both sites for speech. The Tb remained constant at both sites. The Na and Ta appeared to mature later in the right than left hemisphere. The t-complex peaks Na and Tb exhibited left lateralization and Ta showed right lateralization. Thus, the functional processing of sound continued to develop across adolescence and into adulthood. Crown Copyright © 2012. Published by Elsevier Ltd. All rights reserved.

  3. Temporal dynamics and the identification of musical key.

    PubMed

    Farbood, Morwaread Mary; Marcus, Gary; Poeppel, David

    2013-08-01

    A central process in music cognition involves the identification of key; however, little is known about how listeners accomplish this task in real time. This study derives from work that suggests overlap between the neural and cognitive resources underlying the analyses of both music and speech and is the first, to our knowledge, to explore the timescales at which the brain infers musical key. We investigated the temporal psychophysics of key-finding over a wide range of tempi using melodic sequences with strong structural cues, where statistical information about overall key profile was ambiguous. Listeners were able to provide robust judgments within specific limits, at rates as high as 400 beats per minute (bpm; ∼7 Hz) and as low as 30 bpm (0.5 Hz), but not outside those bounds. These boundaries on reliable performance show that the process of key-finding is restricted to timescales that are closely aligned with beat induction and speech processing. 2013 APA, all rights reserved

  4. The brain dynamics of rapid perceptual adaptation to adverse listening conditions.

    PubMed

    Erb, Julia; Henry, Molly J; Eisner, Frank; Obleser, Jonas

    2013-06-26

    Listeners show a remarkable ability to quickly adjust to degraded speech input. Here, we aimed to identify the neural mechanisms of such short-term perceptual adaptation. In a sparse-sampling, cardiac-gated functional magnetic resonance imaging (fMRI) acquisition, human listeners heard and repeated back 4-band-vocoded sentences (in which the temporal envelope of the acoustic signal is preserved, while spectral information is highly degraded). Clear-speech trials were included as baseline. An additional fMRI experiment on amplitude modulation rate discrimination quantified the convergence of neural mechanisms that subserve coping with challenging listening conditions for speech and non-speech. First, the degraded speech task revealed an "executive" network (comprising the anterior insula and anterior cingulate cortex), parts of which were also activated in the non-speech discrimination task. Second, trial-by-trial fluctuations in successful comprehension of degraded speech drove hemodynamic signal change in classic "language" areas (bilateral temporal cortices). Third, as listeners perceptually adapted to degraded speech, downregulation in a cortico-striato-thalamo-cortical circuit was observable. The present data highlight differential upregulation and downregulation in auditory-language and executive networks, respectively, with important subcortical contributions when successfully adapting to a challenging listening situation.

  5. Perceived Conventionality in Co-speech Gestures Involves the Fronto-Temporal Language Network.

    PubMed

    Wolf, Dhana; Rekittke, Linn-Marlen; Mittelberg, Irene; Klasen, Martin; Mathiak, Klaus

    2017-01-01

    Face-to-face communication is multimodal; it encompasses spoken words, facial expressions, gaze, and co-speech gestures. In contrast to linguistic symbols (e.g., spoken words or signs in sign language), which rely on mostly explicit conventions, gestures vary in their degree of conventionality. Bodily signs may have a generally accepted or conventionalized meaning (e.g., a head shake) or less so (e.g., self-grooming). We hypothesized that subjective perception of conventionality in co-speech gestures relies on the classical language network, i.e., the left hemispheric inferior frontal gyrus (IFG, Broca's area) and the posterior superior temporal gyrus (pSTG, Wernicke's area), and studied 36 subjects watching video-recorded story retellings during a behavioral and a functional magnetic resonance imaging (fMRI) experiment. It is well documented that neural correlates of such naturalistic videos emerge as intersubject covariance (ISC) in fMRI even without a stimulus model (model-free analysis). The subjects attended either to perceived conventionality or to a control condition (any hand movements or gesture-speech relations). Such tasks modulate ISC in contributing neural structures, and thus we studied ISC changes in response to task demands in language networks. Indeed, the conventionality task significantly increased covariance of the button-press time series and neuronal synchronization in the left IFG compared with the other tasks. In the left IFG, synchronous activity was observed during the conventionality task only. In contrast, the left pSTG exhibited correlated activation patterns during all conditions, with an increase in the conventionality task at the trend level only. Conceivably, the left IFG can be considered a core region for the processing of perceived conventionality in co-speech gestures, similar to spoken language. In general, the interpretation of conventionalized signs may rely on neural mechanisms that engage during language comprehension.

  6. Perceived Conventionality in Co-speech Gestures Involves the Fronto-Temporal Language Network

    PubMed Central

    Wolf, Dhana; Rekittke, Linn-Marlen; Mittelberg, Irene; Klasen, Martin; Mathiak, Klaus

    2017-01-01

    Face-to-face communication is multimodal; it encompasses spoken words, facial expressions, gaze, and co-speech gestures. In contrast to linguistic symbols (e.g., spoken words or signs in sign language), which rely on mostly explicit conventions, gestures vary in their degree of conventionality. Bodily signs may have a generally accepted or conventionalized meaning (e.g., a head shake) or less so (e.g., self-grooming). We hypothesized that subjective perception of conventionality in co-speech gestures relies on the classical language network, i.e., the left hemispheric inferior frontal gyrus (IFG, Broca's area) and the posterior superior temporal gyrus (pSTG, Wernicke's area), and studied 36 subjects watching video-recorded story retellings during a behavioral and a functional magnetic resonance imaging (fMRI) experiment. It is well documented that neural correlates of such naturalistic videos emerge as intersubject covariance (ISC) in fMRI even without a stimulus model (model-free analysis). The subjects attended either to perceived conventionality or to a control condition (any hand movements or gesture-speech relations). Such tasks modulate ISC in contributing neural structures, and thus we studied ISC changes in response to task demands in language networks. Indeed, the conventionality task significantly increased covariance of the button-press time series and neuronal synchronization in the left IFG compared with the other tasks. In the left IFG, synchronous activity was observed during the conventionality task only. In contrast, the left pSTG exhibited correlated activation patterns during all conditions, with an increase in the conventionality task at the trend level only. Conceivably, the left IFG can be considered a core region for the processing of perceived conventionality in co-speech gestures, similar to spoken language. In general, the interpretation of conventionalized signs may rely on neural mechanisms that engage during language comprehension. PMID:29249945

  7. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    PubMed

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant to backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability significantly covaries with BOLD responses in, among other brain regions, bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition.

  8. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    PubMed Central

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant to backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for “reading” texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability significantly covaries with BOLD responses in, among other brain regions, bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the “bottleneck” for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition. PMID:23966968

  9. Age-related Effects on Word Recognition: Reliance on Cognitive Control Systems with Structural Declines in Speech-responsive Cortex

    PubMed Central

    Walczak, Adam; Ahlstrom, Jayne; Denslow, Stewart; Horwitz, Amy; Dubno, Judy R.

    2008-01-01

    Speech recognition can be difficult and effortful for older adults, even for those with normal hearing. Declining frontal lobe cognitive control has been hypothesized to cause age-related speech recognition problems. This study examined age-related changes in frontal lobe function for 15 clinically normal hearing adults (21–75 years) when they performed a word recognition task that was made challenging by decreasing word intelligibility. Although there were no age-related changes in word recognition, there were age-related changes in the degree of activity within left middle frontal gyrus (MFG) and anterior cingulate (ACC) regions during word recognition. Older adults engaged left MFG and ACC regions when words were most intelligible compared to younger adults who engaged these regions when words were least intelligible. Declining gray matter volume within temporal lobe regions responsive to word intelligibility significantly predicted left MFG activity, even after controlling for total gray matter volume, suggesting that declining structural integrity of brain regions responsive to speech leads to the recruitment of frontal regions when words are easily understood. PMID:18274825

  10. Right hemisphere structures predict poststroke speech fluency.

    PubMed

    Pani, Ethan; Zheng, Xin; Wang, Jasmine; Norton, Andrea; Schlaug, Gottfried

    2016-04-26

    We sought to determine via a cross-sectional study the contribution of (1) the right hemisphere's speech-relevant white matter regions and (2) interhemispheric connectivity to speech fluency in the chronic phase of left hemisphere stroke with aphasia. Fractional anisotropy (FA) of white matter regions underlying the right middle temporal gyrus (MTG), precentral gyrus (PreCG), pars opercularis (IFGop) and triangularis (IFGtri) of the inferior frontal gyrus, and the corpus callosum (CC) was correlated with speech fluency measures. A region within the superior parietal lobule (SPL) was examined as a control. FA values of regions that significantly predicted speech measures were compared with FA values from healthy age- and sex-matched controls. FA values for the right MTG, PreCG, and IFGop significantly predicted speech fluency, but FA values of the IFGtri and SPL did not. A multiple regression showed that combining FA of the significant right hemisphere regions with the lesion load of the left arcuate fasciculus-a previously identified biomarker of poststroke speech fluency-provided the best model for predicting speech fluency. FA of CC fibers connecting left and right supplementary motor areas (SMA) was also correlated with speech fluency. FA of the right IFGop and PreCG was significantly higher in patients than controls, while FA of a whole CC region of interest (ROI) and the CC-SMA ROI was significantly lower in patients. Right hemisphere white matter integrity is related to speech fluency measures in patients with chronic aphasia. This may indicate premorbid anatomical variability beneficial for recovery or be the result of poststroke remodeling. © 2016 American Academy of Neurology.

  11. Right hemisphere structures predict poststroke speech fluency

    PubMed Central

    Pani, Ethan; Zheng, Xin; Wang, Jasmine; Norton, Andrea

    2016-01-01

    Objective: We sought to determine via a cross-sectional study the contribution of (1) the right hemisphere's speech-relevant white matter regions and (2) interhemispheric connectivity to speech fluency in the chronic phase of left hemisphere stroke with aphasia. Methods: Fractional anisotropy (FA) of white matter regions underlying the right middle temporal gyrus (MTG), precentral gyrus (PreCG), pars opercularis (IFGop) and triangularis (IFGtri) of the inferior frontal gyrus, and the corpus callosum (CC) was correlated with speech fluency measures. A region within the superior parietal lobule (SPL) was examined as a control. FA values of regions that significantly predicted speech measures were compared with FA values from healthy age- and sex-matched controls. Results: FA values for the right MTG, PreCG, and IFGop significantly predicted speech fluency, but FA values of the IFGtri and SPL did not. A multiple regression showed that combining FA of the significant right hemisphere regions with the lesion load of the left arcuate fasciculus—a previously identified biomarker of poststroke speech fluency—provided the best model for predicting speech fluency. FA of CC fibers connecting left and right supplementary motor areas (SMA) was also correlated with speech fluency. FA of the right IFGop and PreCG was significantly higher in patients than controls, while FA of a whole CC region of interest (ROI) and the CC-SMA ROI was significantly lower in patients. Conclusions: Right hemisphere white matter integrity is related to speech fluency measures in patients with chronic aphasia. This may indicate premorbid anatomical variability beneficial for recovery or be the result of poststroke remodeling. PMID:27029627

  12. Temporal Fine Structure Processing, Pitch, and Speech Perception in Adult Cochlear Implant Recipients.

    PubMed

    Dincer D'Alessandro, Hilal; Ballantyne, Deborah; Boyle, Patrick J; De Seta, Elio; DeVincentiis, Marco; Mancini, Patrizia

    2017-11-30

    The aim of the study was to investigate the link between temporal fine structure (TFS) processing, pitch, and speech perception performance in adult cochlear implant (CI) recipients, including bimodal listeners who may benefit from better low-frequency (LF) temporal coding in the contralateral ear. The study participants were 43 adult CI recipients (23 unilateral, 6 bilateral, and 14 bimodal listeners). Two new LF pitch perception tests-harmonic intonation (HI) and disharmonic intonation (DI)-were used to evaluate TFS sensitivity. HI and DI were designed to estimate a difference limen for discrimination of tone changes based on harmonic or inharmonic pitch glides. Speech perception was assessed using the newly developed Italian Sentence Test with Adaptive Randomized Roving level (STARR) test where sentences relevant to everyday contexts were presented at low, medium, and high levels in a fluctuating background noise to estimate a speech reception threshold (SRT). Although TFS and STARR performances in the majority of CI recipients were much poorer than those of hearing people reported in the literature, a considerable intersubject variability was observed. For CI listeners, median just noticeable differences were 27.0 and 147.0 Hz for HI and DI, respectively. HI outcomes were significantly better than those for DI. Median STARR score was 14.8 dB. Better performers with speech reception thresholds less than 20 dB had a median score of 8.6 dB. A significant effect of age was observed for both HI/DI tests, suggesting that TFS sensitivity tended to worsen with increasing age. CI pure-tone thresholds and duration of profound deafness were significantly correlated with STARR performance. Bimodal users showed significantly better TFS and STARR performance for bimodal listening than for their CI-only condition. Median bimodal gains were 33.0 Hz for the HI test and 95.0 Hz for the DI test. DI outcomes in bimodal users revealed a significant correlation with unaided hearing thresholds for octave frequencies lower than 1000 Hz. Median STARR scores were 17.3 versus 8.1 dB for CI only and bimodal listening, respectively. STARR performance was significantly correlated with HI findings for CI listeners and with those of DI for bimodal listeners. LF pitch perception was found to be abnormal in the majority of adult CI recipients, confirming poor TFS processing of CIs. Similarly, the STARR findings reflected a common performance deterioration with the HI/DI tests, suggesting that the cause is probably a lack of access to TFS information. Contralateral hearing aid users obtained a remarkable bimodal benefit for all tests. Such results highlighted the importance of TFS cues for challenging speech perception and the relevance to everyday listening conditions. HI/DI and STARR tests show promise for gaining insights into how TFS and speech perception are being limited and may guide the customization of CI program parameters and support the fine tuning of bimodal listening.
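
    The HI/DI difference limens are obtained with an adaptive procedure; the sketch below simulates a generic 1-up/2-down staircase on a toy listener, purely to illustrate how such a just-noticeable difference can be estimated. It is not the actual test implementation, and the psychometric function and all parameter values are invented.

        # Simplified 1-up/2-down adaptive staircase for estimating a difference limen,
        # in the spirit of the HI/DI tests described above (hypothetical simulation).
        import numpy as np

        rng = np.random.default_rng(1)
        true_jnd_hz = 27.0          # assumed "true" just-noticeable difference of a listener
        delta = 200.0               # starting frequency change (Hz)
        step = 2.0                  # multiplicative step size
        reversals, correct_run = [], 0
        going_down = True

        def listener_correct(delta_hz):
            # Toy psychometric function: more likely correct for larger changes.
            p = 1 / (1 + np.exp(-(delta_hz - true_jnd_hz) / 10))
            return rng.random() < 0.5 + 0.5 * p   # floor at chance for a 2AFC task

        while len(reversals) < 8:
            if listener_correct(delta):
                correct_run += 1
                if correct_run == 2:              # two correct in a row -> make task harder
                    correct_run = 0
                    if not going_down:
                        reversals.append(delta)
                    going_down = True
                    delta /= step
            else:                                 # one incorrect -> make task easier
                correct_run = 0
                if going_down:
                    reversals.append(delta)
                going_down = False
                delta *= step

        print("Estimated JND (Hz):", np.mean(reversals[-6:]))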

  13. Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

    PubMed

    Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

    2017-07-25

    Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than for auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.

  14. Individual differences in selective attention predict speech identification at a cocktail party.

    PubMed

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-08-31

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.

  15. Temporal plasticity in auditory cortex improves neural discrimination of speech sounds

    PubMed Central

    Engineer, Crystal T.; Shetake, Jai A.; Engineer, Navzer D.; Vrana, Will A.; Wolf, Jordan T.; Kilgard, Michael P.

    2017-01-01

    Background: Many individuals with language learning impairments exhibit temporal processing deficits and degraded neural responses to speech sounds. Auditory training can improve both the neural and behavioral deficits, though significant deficits remain. Recent evidence suggests that vagus nerve stimulation (VNS) paired with rehabilitative therapies enhances both cortical plasticity and recovery of normal function. Objective/Hypothesis: We predicted that pairing VNS with rapid tone trains would enhance the primary auditory cortex (A1) response to unpaired novel speech sounds. Methods: VNS was paired with tone trains 300 times per day for 20 days in adult rats. Responses to isolated speech sounds, compressed speech sounds, word sequences, and compressed word sequences were recorded in A1 following the completion of VNS-tone train pairing. Results: Pairing VNS with rapid tone trains resulted in stronger, faster, and more discriminable A1 responses to speech sounds presented at conversational rates. Conclusion: This study extends previous findings by documenting that VNS paired with rapid tone trains altered the neural response to novel unpaired speech sounds. Future studies are necessary to determine whether pairing VNS with appropriate auditory stimuli could potentially be used to improve both neural responses to speech sounds and speech perception in individuals with receptive language disorders. PMID:28131520

  16. Clear Speech Modifications in Children Aged 6-10

    NASA Astrophysics Data System (ADS)

    Taylor, Griffin Lijding

    Modifications to speech production made by adult talkers in response to instructions to speak clearly have been well documented in the literature. Targeting adult populations has been motivated by efforts to improve speech production for the benefit of their communication partners; however, many adults also have communication partners who are children. Surprisingly, there is limited literature on whether children can change their speech production when cued to speak clearly. Pettinato, Tuomainen, Granlund, and Hazan (2016) showed that by age 12, children exhibited enlarged vowel space areas and reduced articulation rate when prompted to speak clearly, but did not produce any other adult-like clear speech modifications in connected speech. Moreover, Syrett and Kawahara (2013) suggested that preschoolers produced longer and more intense vowels when prompted to speak clearly at the word level. These findings contrasted with adult talkers who show significant temporal and spectral differences between speech produced in control and clear speech conditions. Therefore, it was the purpose of this study to analyze changes in temporal and spectral characteristics of speech production that children aged 6-10 made in these experimental conditions. It is important to elucidate the clear speech profile of this population to better understand which adult-like clear speech modifications they make spontaneously and which modifications are still developing. Understanding these baselines will advance future studies that measure the impact of more explicit instructions and children's abilities to better accommodate their interlocutors, which is a critical component of children's pragmatic and speech-motor development.

  17. Spectrotemporal Modulation Sensitivity as a Predictor of Speech Intelligibility for Hearing-Impaired Listeners

    PubMed Central

    Bernstein, Joshua G.W.; Mehraei, Golbarg; Shamma, Shihab; Gallun, Frederick J.; Theodoroff, Sarah M.; Leek, Marjorie R.

    2014-01-01

    Background: A model that can accurately predict speech intelligibility for a given hearing-impaired (HI) listener would be an important tool for hearing-aid fitting or hearing-aid algorithm development. Existing speech-intelligibility models do not incorporate variability in suprathreshold deficits that are not well predicted by classical audiometric measures. One possible approach to the incorporation of such deficits is to base intelligibility predictions on sensitivity to simultaneously spectrally and temporally modulated signals. Purpose: The likelihood of success of this approach was evaluated by comparing estimates of spectrotemporal modulation (STM) sensitivity to speech intelligibility and to psychoacoustic estimates of frequency selectivity and temporal fine-structure (TFS) sensitivity across a group of HI listeners. Research Design: The minimum modulation depth required to detect STM applied to an 86 dB SPL four-octave noise carrier was measured for combinations of temporal modulation rate (4, 12, or 32 Hz) and spectral modulation density (0.5, 1, 2, or 4 cycles/octave). STM sensitivity estimates for individual HI listeners were compared to estimates of frequency selectivity (measured using the notched-noise method at 500, 1000, 2000, and 4000 Hz), TFS processing ability (2 Hz frequency-modulation detection thresholds for 500, 1000, 2000, and 4000 Hz carriers) and sentence intelligibility in noise (at a 0 dB signal-to-noise ratio) that were measured for the same listeners in a separate study. Study Sample: Eight normal-hearing (NH) listeners and 12 listeners with a diagnosis of bilateral sensorineural hearing loss participated. Data Collection and Analysis: STM sensitivity was compared between NH and HI listener groups using a repeated-measures analysis of variance. A stepwise regression analysis compared STM sensitivity for individual HI listeners to audiometric thresholds, age, and measures of frequency selectivity and TFS processing ability. A second stepwise regression analysis compared speech intelligibility to STM sensitivity and the audiogram-based Speech Intelligibility Index. Results: STM detection thresholds were elevated for the HI listeners, but only for low rates and high densities. STM sensitivity for individual HI listeners was well predicted by a combination of estimates of frequency selectivity at 4000 Hz and TFS sensitivity at 500 Hz but was unrelated to audiometric thresholds. STM sensitivity accounted for an additional 40% of the variance in speech intelligibility beyond the 40% accounted for by the audibility-based Speech Intelligibility Index. Conclusions: Impaired STM sensitivity likely results from a combination of a reduced ability to resolve spectral peaks and a reduced ability to use TFS information to follow spectral-peak movements. Combining STM sensitivity estimates with audiometric threshold measures for individual HI listeners provided a more accurate prediction of speech intelligibility than audiometric measures alone. These results suggest a significant likelihood of success for an STM-based model of speech intelligibility for HI listeners. PMID:23636210
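
    The spectrotemporal modulation stimulus is a "moving ripple": a broadband carrier whose amplitude is modulated sinusoidally across log-frequency (density, in cycles/octave) and time (rate, in Hz). The sketch below builds such a stimulus from random-phase tone components; the carrier's lower edge, the component count, and the level handling are assumptions rather than the study's exact synthesis.

        # Minimal sketch of a spectro-temporally modulated ("moving ripple") noise,
        # with temporal rate in Hz and spectral density in cycles/octave
        # (parameter values are illustrative).
        import numpy as np

        fs = 44100                 # sample rate (Hz)
        dur = 1.0                  # duration (s)
        rate = 12.0                # temporal modulation rate (Hz)
        density = 2.0              # spectral modulation density (cycles/octave)
        depth = 1.0                # modulation depth (0..1)
        f_lo, n_oct = 400.0, 4     # four-octave carrier starting at 400 Hz (assumed)

        t = np.arange(int(fs * dur)) / fs
        n_comp = 200                                    # number of random-phase components
        octaves = np.random.uniform(0, n_oct, n_comp)   # component position in octaves
        freqs = f_lo * 2.0 ** octaves
        phases = np.random.uniform(0, 2 * np.pi, n_comp)

        stim = np.zeros_like(t)
        for f, x, ph in zip(freqs, octaves, phases):
            # Sinusoidal amplitude modulation drifting across log-frequency and time.
            env = 1.0 + depth * np.sin(2 * np.pi * (rate * t + density * x))
            stim += env * np.sin(2 * np.pi * f * t + ph)

        stim /= np.max(np.abs(stim))                    # normalise; level calibration omitted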

  18. [Spontaneous speech prosody and discourse analysis in schizophrenia and Frontotemporal Dementia (FTD) patients].

    PubMed

    Martínez, Angela; Felizzola Donado, Carlos Alberto; Matallana Eslava, Diana Lucía

    2015-01-01

    Patients with schizophrenia and patients with the linguistic variants of Frontotemporal Dementia (FTD) share some language characteristics, such as lexical access difficulties and disordered speech with disruptions, frequent pauses, interruptions, and reformulations. In schizophrenia these features reflect a difficulty with affect expression, whereas in FTD they reflect a linguistic impairment. Through the analysis of a series of cases assessed in the memory clinic and in the Mental Health Unit of HUSI-PUJ (Hospital Universitario San Ignacio), with additional language assessment (discourse analysis and acoustic analysis), this study presents distinctive features of the linguistic variants of FTD and of schizophrenia that can guide the specialist in finding early markers for a differential diagnosis. In patients with the language variants of FTD, 100% of cases showed difficulty understanding complex linguistic structures, together with marked speech fluency problems. In patients with schizophrenia, there were significant alterations in the expression of the suprasegmental elements of speech, as well as disruptions in discourse. We show how in-depth language assessment allows some of the rules for the speech and prosody analysis of patients with dementia and schizophrenia to be reassessed, and we suggest how elements of speech are useful in guiding the diagnosis and relating it to functional compromise in everyday psychiatric practice. Copyright © 2014 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  19. Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception

    PubMed Central

    Davis, Matthew H.

    2016-01-01

    Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains. PMID:27846209
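
    The contrast between the two coding schemes can be reduced to a toy computation: a sharpened code multiplies the sensory evidence by the prior expectation, whereas a prediction-error code subtracts the expectation from the evidence. The sketch below is only a conceptual illustration with invented numbers, not the paper's actual simulations.

        # Toy illustration of the two coding schemes contrasted above: "sharpened"
        # representations multiply sensory input by the prior expectation, whereas
        # "prediction error" representations subtract the expectation from the input.
        import numpy as np

        features = ["b", "d", "g", "p", "t", "k"]                 # hypothetical feature set
        sensory = np.array([0.6, 0.2, 0.05, 0.05, 0.05, 0.05])    # degraded input evidence
        prior   = np.array([0.7, 0.1, 0.04, 0.04, 0.08, 0.04])    # informative expectation

        sharpened = sensory * prior
        sharpened /= sharpened.sum()                     # normalised interactive activation
        prediction_error = sensory - prior               # residual passed up the hierarchy

        print("sharpened:       ", np.round(sharpened, 2))
        print("prediction error:", np.round(prediction_error, 2))
        # With an accurate prior, the sharpened code becomes more peaked (more measurable
        # pattern information), whereas the prediction-error code shrinks toward zero
        # (less), which is the signature the multivariate fMRI analysis exploited.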

  20. Getting the cocktail party started: masking effects in speech perception

    PubMed Central

    Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK

    2016-01-01

    Spoken conversations typically take place in noisy environments and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional Magnetic Resonance Imaging (fMRI), whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream, and that individuals who perform better in speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; this activity was found within right-lateralised frontal regions, consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297

  1. Functional overlap between regions involved in speech perception and in monitoring one’s own voice during speech production

    PubMed Central

    Zheng, Zane Z.; Munhall, Kevin G; Johnsrude, Ingrid S

    2009-01-01

    The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match. PMID:19642886

  2. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    PubMed

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Aging affects neural precision of speech encoding

    PubMed Central

    Anderson, Samira; Parbery-Clark, Alexandra; White-Schwoch, Travis; Kraus, Nina

    2012-01-01

    Older adults frequently report they can hear what is said but cannot understand the meaning, especially in noise. This difficulty may arise from the inability to process rapidly changing elements of speech. Aging is accompanied by a general slowing of neural processing and decreased neural inhibition, both of which likely interfere with temporal processing in auditory and other sensory domains. Age-related reductions in inhibitory neurotransmitter levels and delayed neural recovery can contribute to decreases in the auditory system’s temporal precision. Decreased precision may lead to neural timing delays, reductions in neural response magnitude, and a disadvantage in processing the rapid acoustic changes in speech. The auditory brainstem response (ABR), a scalp-recorded electrical potential, is known for its ability to capture precise neural synchrony within subcortical auditory nuclei; therefore, we hypothesized that a loss of temporal precision results in subcortical timing delays and decreases in response consistency and magnitude. To assess this hypothesis, we recorded ABRs to the speech syllable /da/ in normal hearing younger (ages 18 to 30) and older adult humans (60 to 67). Older adults had delayed ABRs, especially in response to the rapidly changing formant transition, and greater response variability. We also found that older adults had decreased phase locking and smaller response magnitudes than younger adults. Taken together, our results support the theory that older adults have a loss of temporal precision in subcortical encoding of sound, which may account, at least in part, for their difficulties with speech perception. PMID:23055485

  4. Processing of speech temporal and spectral information by users of auditory brainstem implants and cochlear implants.

    PubMed

    Azadpour, Mahan; McKay, Colette M

    2014-01-01

    Auditory brainstem implants (ABI) use the same processing strategy as was developed for cochlear implants (CI). However, the cochlear nucleus (CN), the stimulation site of ABIs, is anatomically and physiologically more complex than the auditory nerve and consists of neurons with differing roles in auditory processing. The aim of this study was to evaluate the hypotheses that ABI users are less able than CI users to access speech spectro-temporal information delivered by the existing strategies and that the sites stimulated by different locations of CI and ABI electrode arrays differ in encoding of temporal patterns in the stimulation. Six CI users and four ABI users of Nucleus implants with the ACE processing strategy participated in this study. Closed-set perception of aCa syllables (16 consonants) and bVd words (11 vowels) was evaluated via experimental processing strategies that activated one, two, or four of the electrodes of the array in a CIS manner as well as subjects' clinical strategies. Three single-channel strategies presented the overall temporal envelope variations of the signal on a single-implant electrode located at the high-, medium-, and low-frequency regions of the array. Implantees' ability to discriminate within-electrode temporal patterns of stimulation for phoneme perception and their ability to make use of spectral information presented by increased number of active electrodes were assessed in the single- and multiple-channel strategies, respectively. Overall percentages and information transmission of phonetic features were obtained for each experimental program. Phoneme perception performance of three ABI users was within the range of CI users in most of the experimental strategies and improved as the number of active electrodes increased. One ABI user performed close to chance with all the single- and multiple-electrode strategies. There was no significant difference between apical, basal, and middle CI electrodes in transmitting speech temporal information, except a trend that the voicing feature was the least transmitted by the basal electrode. A similar electrode-location pattern could be observed in most ABI subjects. Although the number of tested ABI subjects was small, their wide range of phoneme perception performance was consistent with previous reports of overall speech perception in ABI patients. The better-performing ABI users had access to speech temporal and spectral information that was comparable to that of the average CI user. The poor-performing ABI user did not have access to within-channel speech temporal information and did not benefit from an increased number of spectral channels. The within-subject variability between different ABI electrodes was less than the variability across users in transmission of speech temporal information. The difference in the performance of ABI users could be related to the location of their electrode array on the CN, anatomy, and physiology of their CN or the damage to their auditory brainstem due to tumor or surgery.

  5. Speech-evoked activation in adult temporal cortex measured using functional near-infrared spectroscopy (fNIRS): Are the measurements reliable?

    PubMed

    Wiggins, Ian M; Anderson, Carly A; Kitterick, Pádraig T; Hartley, Douglas E H

    2016-09-01

    Functional near-infrared spectroscopy (fNIRS) is a silent, non-invasive neuroimaging technique that is potentially well suited to auditory research. However, the reliability of auditory-evoked activation measured using fNIRS is largely unknown. The present study investigated the test-retest reliability of speech-evoked fNIRS responses in normally-hearing adults. Seventeen participants underwent fNIRS imaging in two sessions separated by three months. In a block design, participants were presented with auditory speech, visual speech (silent speechreading), and audiovisual speech conditions. Optode arrays were placed bilaterally over the temporal lobes, targeting auditory brain regions. A range of established metrics was used to quantify the reproducibility of cortical activation patterns, as well as the amplitude and time course of the haemodynamic response within predefined regions of interest. The use of a signal processing algorithm designed to reduce the influence of systemic physiological signals was found to be crucial to achieving reliable detection of significant activation at the group level. For auditory speech (with or without visual cues), reliability was good to excellent at the group level, but highly variable among individuals. Temporal-lobe activation in response to visual speech was less reliable, especially in the right hemisphere. Consistent with previous reports, fNIRS reliability was improved by averaging across a small number of channels overlying a cortical region of interest. Overall, the present results confirm that fNIRS can measure speech-evoked auditory responses in adults that are highly reliable at the group level, and indicate that signal processing to reduce physiological noise may substantially improve the reliability of fNIRS measurements. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  6. The role of auditory and cognitive factors in understanding speech in noise by normal-hearing older listeners

    PubMed Central

    Schoof, Tim; Rosen, Stuart

    2014-01-01

    Normal-hearing older adults often experience increased difficulties understanding speech in noise. In addition, they benefit less from amplitude fluctuations in the masker. These difficulties may be attributed to an age-related auditory temporal processing deficit. However, a decline in cognitive processing likely also plays an important role. This study examined the relative contribution of declines in both auditory and cognitive processing to the speech in noise performance in older adults. Participants included older (60–72 years) and younger (19–29 years) adults with normal hearing. Speech reception thresholds (SRTs) were measured for sentences in steady-state speech-shaped noise (SS), 10-Hz sinusoidally amplitude-modulated speech-shaped noise (AM), and two-talker babble. In addition, auditory temporal processing abilities were assessed by measuring thresholds for gap, amplitude-modulation, and frequency-modulation detection. Measures of processing speed, attention, working memory, Text Reception Threshold (a visual analog of the SRT), and reading ability were also obtained. Of primary interest was the extent to which the various measures correlate with listeners' abilities to perceive speech in noise. SRTs were significantly worse for older adults in the presence of two-talker babble but not SS and AM noise. In addition, older adults showed some cognitive processing declines (working memory and processing speed) although no declines in auditory temporal processing. However, working memory and processing speed did not correlate significantly with SRTs in babble. Despite declines in cognitive processing, normal-hearing older adults do not necessarily have problems understanding speech in noise as SRTs in SS and AM noise did not differ significantly between the two groups. Moreover, while older adults had higher SRTs in two-talker babble, this could not be explained by age-related cognitive declines in working memory or processing speed. PMID:25429266

  7. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
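
    The proposed measure, roughly, band-passes the recording around the speech fundamental, takes the instantaneous amplitude of that component, and relates it to the speech envelope at the characteristic response latency. The sketch below illustrates the idea on synthetic signals; the sampling rate, fundamental frequency, latency, and filter settings are placeholders rather than the authors' analysis parameters.

        # Sketch of an envelope-modulated fundamental-frequency measure: band-pass the
        # recording around the speech fundamental, take its instantaneous amplitude,
        # and correlate it with the speech envelope at a candidate latency.
        # Signals here are synthetic placeholders.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        fs = 1000                                   # sampling rate (Hz), assumed
        t = np.arange(0, 60, 1 / fs)                # one minute of data
        envelope = 0.5 + 0.5 * np.abs(np.sin(2 * np.pi * 3 * t))   # toy speech envelope
        f0 = 100.0                                  # assumed fundamental frequency (Hz)
        latency = 0.009                             # candidate brainstem latency (s)
        shift = int(latency * fs)
        eeg = np.roll(envelope, shift) * np.sin(2 * np.pi * f0 * t) \
              + 0.5 * np.random.randn(t.size)       # simulated recording with noise

        b, a = butter(4, [f0 - 10, f0 + 10], btype="band", fs=fs)
        f0_amplitude = np.abs(hilbert(filtfilt(b, a, eeg)))   # response amplitude at f0

        r = np.corrcoef(envelope[:-shift], f0_amplitude[shift:])[0, 1]
        print(f"Envelope-to-f0-amplitude correlation at {latency*1000:.0f} ms latency: {r:.2f}")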

  8. Mapping the cortical representation of speech sounds in a syllable repetition task.

    PubMed

    Markiewicz, Christopher J; Bohland, Jason W

    2016-11-01

    Speech repetition relies on a series of distributed cortical representations and functional pathways. A speaker must map auditory representations of incoming sounds onto learned speech items, maintain an accurate representation of those items in short-term memory, interface that representation with the motor output system, and fluently articulate the target sequence. A "dorsal stream" consisting of posterior temporal, inferior parietal and premotor regions is thought to mediate auditory-motor representations and transformations, but the nature and activation of these representations for different portions of speech repetition tasks remains unclear. Here we mapped the correlates of phonetic and/or phonological information related to the specific phonemes and syllables that were heard, remembered, and produced using a series of cortical searchlight multi-voxel pattern analyses trained on estimates of BOLD responses from individual trials. Based on responses linked to input events (auditory syllable presentation), predictive vowel-level information was found in the left inferior frontal sulcus, while syllable prediction revealed significant clusters in the left ventral premotor cortex and central sulcus and the left mid superior temporal sulcus. Responses linked to output events (the GO signal cueing overt production) revealed strong clusters of vowel-related information bilaterally in the mid to posterior superior temporal sulcus. For the prediction of onset and coda consonants, input-linked responses yielded distributed clusters in the superior temporal cortices, which were further informative for classifiers trained on output-linked responses. Output-linked responses in the Rolandic cortex made strong predictions for the syllables and consonants produced, but their predictive power was reduced for vowels. The results of this study provide a systematic survey of how cortical response patterns covary with the identity of speech sounds, which will help to constrain and guide theoretical models of speech perception, speech production, and phonological working memory. Copyright © 2016 Elsevier Inc. All rights reserved.
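
    As a stand-in for the searchlight multi-voxel pattern analyses, the sketch below shows the core step: cross-validated classification of stimulus identity from single-trial response patterns within one (hypothetical) set of voxels. The data, labels, and dimensions are synthetic.

        # Simplified stand-in for a searchlight pattern analysis: cross-validated
        # classification of which vowel was heard from single-trial response
        # estimates within one hypothetical searchlight of voxels.
        import numpy as np
        from sklearn.svm import LinearSVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n_trials, n_voxels = 120, 50
        labels = rng.integers(0, 3, n_trials)            # three vowel categories
        patterns = rng.standard_normal((n_trials, n_voxels))
        patterns[:, 0] += labels                         # weak vowel information in one voxel

        acc = cross_val_score(LinearSVC(max_iter=5000), patterns, labels, cv=5)
        print("mean decoding accuracy:", acc.mean().round(2))   # chance is ~0.33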

  9. Auditory and Cognitive Factors Associated with Speech-in-Noise Complaints following Mild Traumatic Brain Injury.

    PubMed

    Hoover, Eric C; Souza, Pamela E; Gallun, Frederick J

    2017-04-01

    Auditory complaints following mild traumatic brain injury (MTBI) are common, but few studies have addressed the role of auditory temporal processing in speech recognition complaints. In this study, deficits understanding speech in a background of speech noise following MTBI were evaluated with the goal of comparing the relative contributions of auditory and nonauditory factors. A matched-groups design was used in which a group of listeners with a history of MTBI were compared to a group matched in age and pure-tone thresholds, as well as a control group of young listeners with normal hearing (YNH). Of the 33 listeners who participated in the study, 13 were included in the MTBI group (mean age = 46.7 yr), 11 in the Matched group (mean age = 49 yr), and 9 in the YNH group (mean age = 20.8 yr). Speech-in-noise deficits were evaluated using subjective measures as well as monaural word (Words-in-Noise test) and sentence (Quick Speech-in-Noise test) tasks, and a binaural spatial release task. Performance on these measures was compared to psychophysical tasks that evaluate monaural and binaural temporal fine-structure tasks and spectral resolution. Cognitive measures of attention, processing speed, and working memory were evaluated as possible causes of differences between MTBI and Matched groups that might contribute to speech-in-noise perception deficits. A high proportion of listeners in the MTBI group reported difficulty understanding speech in noise (84%) compared to the Matched group (9.1%), and listeners who reported difficulty were more likely to have abnormal results on objective measures of speech in noise. No significant group differences were found between the MTBI and Matched listeners on any of the measures reported, but the number of abnormal tests differed across groups. Regression analysis revealed that a combination of auditory and auditory processing factors contributed to monaural speech-in-noise scores, but the benefit of spatial separation was related to a combination of working memory and peripheral auditory factors across all listeners in the study. The results of this study are consistent with previous findings that a subset of listeners with MTBI has objective auditory deficits. Speech-in-noise performance was related to a combination of auditory and nonauditory factors, confirming the important role of audiology in MTBI rehabilitation. Further research is needed to evaluate the prevalence and causal relationship of auditory deficits following MTBI. American Academy of Audiology

  10. Articulatory mediation of speech perception: a causal analysis of multi-modal imaging data.

    PubMed

    Gow, David W; Segawa, Jennifer A

    2009-02-01

    The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causality analyses of high spatiotemporal resolution neural activation data derived from the integration of magnetic resonance imaging, magnetoencephalography and electroencephalography, to examine the role of lexical and articulatory mediation in listeners' ability to use phonetic context to compensate for place assimilation. Listeners heard two-word phrases such as pen pad and then saw two pictures, from which they had to select the one that depicted the phrase. Assimilation, lexical competitor environment and the phonological validity of assimilation context were all manipulated. Behavioral data showed an effect of context on the interpretation of assimilated segments. Analysis of 40 Hz gamma phase locking patterns identified a large distributed neural network including 16 distinct regions of interest (ROIs) spanning portions of both hemispheres in the first 200 ms of post-assimilation context. Granger analyses of individual conditions showed differing patterns of causal interaction between ROIs during this interval, with hypothesized lexical and articulatory structures and pathways driving phonetic activation in the posterior superior temporal gyrus in assimilation conditions, but not in phonetically unambiguous conditions. These results lend strong support to the motor theory of speech perception, and clarify the role of lexical mediation in the phonetic processing of assimilated speech.
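
    A minimal sketch of a Granger-causality test between two region-of-interest time series is given below, using the statsmodels implementation; the time series are synthetic stand-ins for the activation estimates, and the lag structure and coefficients are invented.

        # Sketch of a Granger-causality test between two ROI time series using
        # statsmodels; the series here are synthetic stand-ins.
        import numpy as np
        from statsmodels.tsa.stattools import grangercausalitytests

        rng = np.random.default_rng(2)
        n = 500
        source = rng.standard_normal(n)                  # e.g., an articulatory/motor ROI
        target = np.zeros(n)                             # e.g., posterior STG
        for i in range(2, n):
            # target is driven by the source two samples earlier, plus noise
            target[i] = 0.6 * source[i - 2] + 0.3 * target[i - 1] + rng.standard_normal()

        # Column order: the test asks whether column 1 Granger-causes column 0.
        data = np.column_stack([target, source])
        results = grangercausalitytests(data, maxlag=4)
        for lag, res in results.items():
            fstat, pval = res[0]["ssr_ftest"][:2]
            print(f"lag {lag}: F = {fstat:.1f}, p = {pval:.3g}")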

  11. The steady-state response of the cerebral cortex to the beat of music reflects both the comprehension of music and attention

    PubMed Central

    Meltzer, Benjamin; Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2015-01-01

    The brain’s analyses of speech and music share a range of neural resources and mechanisms. Music displays a temporal structure of complexity similar to that of speech, unfolds over comparable timescales, and elicits cognitive demands in tasks involving comprehension and attention. During speech processing, synchronized neural activity of the cerebral cortex in the delta and theta frequency bands tracks the envelope of a speech signal, and this neural activity is modulated by high-level cortical functions such as speech comprehension and attention. It remains unclear, however, whether the cortex also responds to the natural rhythmic structure of music and how the response, if present, is influenced by higher cognitive processes. Here we employ electroencephalography to show that the cortex responds to the beat of music and that this steady-state response reflects musical comprehension and attention. We show that the cortical response to the beat is weaker when subjects listen to a familiar tune than when they listen to an unfamiliar, non-sensical musical piece. Furthermore, we show that in a task of intermodal attention there is a larger neural response at the beat frequency when subjects attend to a musical stimulus than when they ignore the auditory signal and instead focus on a visual one. Our findings may be applied in clinical assessments of auditory processing and music cognition as well as in the construction of auditory brain-machine interfaces. PMID:26300760

  12. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

    PubMed

    Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

    2017-06-01

    Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
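
    The Bayesian prediction invoked here is the standard maximum-likelihood integration result: combining independent auditory and visual estimates, weighted by their reliabilities, yields a combined estimate whose variance is lower than either unimodal variance. A small numerical illustration (with made-up variances) follows.

        # Maximum-likelihood (Bayesian) integration of independent auditory and visual
        # estimates: the combined variance is below either unimodal variance.
        # The variance values are illustrative only.
        var_auditory = 4.0                     # variance of the auditory estimate (noisy speech)
        var_visual = 1.5                       # variance of the visual (lip-reading) estimate

        w_a = (1 / var_auditory) / (1 / var_auditory + 1 / var_visual)   # reliability weights
        w_v = 1 - w_a
        var_combined = (var_auditory * var_visual) / (var_auditory + var_visual)

        print(f"auditory weight = {w_a:.2f}, visual weight = {w_v:.2f}")
        print(f"combined variance = {var_combined:.2f} "
              f"(< min unimodal variance = {min(var_auditory, var_visual):.2f})")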

  13. Auditory Processing, Speech Perception and Phonological Ability in Pre-School Children at High-Risk for Dyslexia: A Longitudinal Study of the Auditory Temporal Processing Theory

    ERIC Educational Resources Information Center

    Boets, Bart; Wouters, Jan; van Wieringen, Astrid; Ghesquiere, Pol

    2007-01-01

    This study investigates whether the core bottleneck of literacy-impairment should be situated at the phonological level or at a more basic sensory level, as postulated by supporters of the auditory temporal processing theory. Phonological ability, speech perception and low-level auditory processing were assessed in a group of 5-year-old pre-school…

  14. Hearing impairment, cognition and speech understanding: exploratory factor analyses of a comprehensive test battery for a group of hearing aid users, the n200 study

    PubMed Central

    Rönnberg, Jerker; Lunner, Thomas; Ng, Elaine Hoi Ning; Lidestam, Björn; Zekveld, Adriana Agatha; Sörqvist, Patrik; Lyxell, Björn; Träff, Ulf; Yumba, Wycliffe; Classon, Elisabet; Hällgren, Mathias; Larsby, Birgitta; Signoret, Carine; Pichora-Fuller, M. Kathleen; Rudner, Mary; Danielsson, Henrik; Stenfelt, Stefan

    2016-01-01

    Objective: The aims of the current n200 study were to assess the structural relations between three classes of test variables (i.e. HEARING, COGNITION and aided speech-in-noise OUTCOMES) and to describe the theoretical implications of these relations for the Ease of Language Understanding (ELU) model. Study sample: Participants were 200 hard-of-hearing hearing-aid users, with a mean age of 60.8 years. Forty-three percent were females and the mean hearing threshold in the better ear was 37.4 dB HL. Design: LEVEL 1 factor analyses extracted one factor per test and/or cognitive function based on a priori conceptualizations. The more abstract LEVEL 2 factor analyses were performed separately for the three classes of test variables. Results: The HEARING test variables resulted in two LEVEL 2 factors, which we labelled SENSITIVITY and TEMPORAL FINE STRUCTURE; the COGNITIVE variables in one COGNITION factor only, and OUTCOMES in two factors, NO CONTEXT and CONTEXT. COGNITION predicted the NO CONTEXT factor to a stronger extent than the CONTEXT outcome factor. TEMPORAL FINE STRUCTURE and SENSITIVITY were associated with COGNITION and all three contributed significantly and independently to especially the NO CONTEXT outcome scores (R2 = 0.40). Conclusions: All LEVEL 2 factors are important theoretically as well as for clinical assessment. PMID:27589015

  15. Hearing impairment, cognition and speech understanding: exploratory factor analyses of a comprehensive test battery for a group of hearing aid users, the n200 study.

    PubMed

    Rönnberg, Jerker; Lunner, Thomas; Ng, Elaine Hoi Ning; Lidestam, Björn; Zekveld, Adriana Agatha; Sörqvist, Patrik; Lyxell, Björn; Träff, Ulf; Yumba, Wycliffe; Classon, Elisabet; Hällgren, Mathias; Larsby, Birgitta; Signoret, Carine; Pichora-Fuller, M Kathleen; Rudner, Mary; Danielsson, Henrik; Stenfelt, Stefan

    2016-11-01

    The aims of the current n200 study were to assess the structural relations between three classes of test variables (i.e. HEARING, COGNITION and aided speech-in-noise OUTCOMES) and to describe the theoretical implications of these relations for the Ease of Language Understanding (ELU) model. Participants were 200 hard-of-hearing hearing-aid users, with a mean age of 60.8 years. Forty-three percent were females and the mean hearing threshold in the better ear was 37.4 dB HL. LEVEL 1 factor analyses extracted one factor per test and/or cognitive function based on a priori conceptualizations. The more abstract LEVEL 2 factor analyses were performed separately for the three classes of test variables. The HEARING test variables resulted in two LEVEL 2 factors, which we labelled SENSITIVITY and TEMPORAL FINE STRUCTURE; the COGNITIVE variables in one COGNITION factor only, and OUTCOMES in two factors, NO CONTEXT and CONTEXT. COGNITION predicted the NO CONTEXT factor to a stronger extent than the CONTEXT outcome factor. TEMPORAL FINE STRUCTURE and SENSITIVITY were associated with COGNITION and all three contributed significantly and independently to especially the NO CONTEXT outcome scores (R2 = 0.40). All LEVEL 2 factors are important theoretically as well as for clinical assessment.
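
    The two-level factor-analysis scheme can be sketched with generic tools; the example below uses scikit-learn's FactorAnalysis on synthetic data to mimic extracting one LEVEL 1 factor per test battery and then factor-analysing those scores within a class of variables. The test names, sample sizes, and loadings are illustrative only.

        # Sketch of a two-level exploratory factor analysis on synthetic data:
        # LEVEL 1 extracts one factor per test battery, LEVEL 2 factor-analyses
        # those scores within a class of variables.
        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(3)
        n_subjects = 200
        # Hypothetical hearing-test scores: several items per battery, two underlying traits.
        sensitivity_trait = rng.standard_normal(n_subjects)
        tfs_trait = rng.standard_normal(n_subjects)
        audiogram = sensitivity_trait[:, None] + 0.3 * rng.standard_normal((n_subjects, 4))
        tfs_tests = tfs_trait[:, None] + 0.3 * rng.standard_normal((n_subjects, 3))

        # LEVEL 1: one factor per test battery.
        level1_audiogram = FactorAnalysis(n_components=1).fit_transform(audiogram)
        level1_tfs = FactorAnalysis(n_components=1).fit_transform(tfs_tests)

        # LEVEL 2: factor-analyse the LEVEL 1 scores for the HEARING class of variables.
        hearing_scores = np.column_stack([level1_audiogram, level1_tfs])
        level2 = FactorAnalysis(n_components=2).fit(hearing_scores)
        print("LEVEL 2 loadings:\n", np.round(level2.components_, 2))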

  16. Spatiotemporal dynamics of auditory attention synchronize with speech

    PubMed Central

    Wöstmann, Malte; Herrmann, Björn; Maess, Burkhard

    2016-01-01

    Attention plays a fundamental role in selectively processing stimuli in our environment despite distraction. Spatial attention induces increasing and decreasing power of neural alpha oscillations (8–12 Hz) in brain regions ipsilateral and contralateral to the locus of attention, respectively. This study tested whether the hemispheric lateralization of alpha power codes not just the spatial location but also the temporal structure of the stimulus. Participants attended to spoken digits presented to one ear and ignored tightly synchronized distracting digits presented to the other ear. In the magnetoencephalogram, spatial attention induced lateralization of alpha power in parietal, but notably also in auditory cortical regions. This alpha power lateralization was not maintained steadily but fluctuated in synchrony with the speech rate and lagged the time course of low-frequency (1–5 Hz) sensory synchronization. Higher amplitude of alpha power modulation at the speech rate was predictive of a listener’s enhanced performance of stream-specific speech comprehension. Our findings demonstrate that alpha power lateralization is modulated in tune with the sensory input and acts as a spatiotemporal filter controlling the read-out of sensory content. PMID:27001861
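
    A common way to quantify the lateralisation described here is an index of the form (ipsilateral minus contralateral) over (ipsilateral plus contralateral) alpha power; the sketch below computes such an index on synthetic band-power time courses, with the sampling rate, alpha band edges, and modulation parameters assumed for illustration.

        # Sketch of an alpha-power lateralisation index computed on synthetic
        # signals whose alpha power waxes and wanes at the speech rate.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        fs = 250                                        # sampling rate (Hz), assumed
        t = np.arange(0, 10, 1 / fs)
        speech_rate = 2.0                               # digits per second (illustrative)
        ipsi   = (1.2 + 0.4 * np.sin(2 * np.pi * speech_rate * t)) * np.sin(2 * np.pi * 10 * t)
        contra = (1.0 - 0.2 * np.sin(2 * np.pi * speech_rate * t)) * np.sin(2 * np.pi * 10 * t)

        def alpha_power(x):
            b, a = butter(4, [8, 12], btype="band", fs=fs)   # 8-12 Hz alpha band
            return np.abs(hilbert(filtfilt(b, a, x))) ** 2

        p_ipsi, p_contra = alpha_power(ipsi), alpha_power(contra)
        ali = (p_ipsi - p_contra) / (p_ipsi + p_contra)      # lateralisation index over time
        print("mean ALI:", ali.mean().round(2))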

  17. The influence of selective attention to auditory and visual speech on the integration of audiovisual speech information.

    PubMed

    Buchan, Julie N; Munhall, Kevin G

    2011-01-01

    Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.

  18. The relation of object naming and other visual speech production tasks: a large scale voxel-based morphometric study.

    PubMed

    Lau, Johnny King L; Humphreys, Glyn W; Douis, Hassan; Balani, Alex; Bickerton, Wai-Ling; Rotshtein, Pia

    2015-01-01

    We report a lesion-symptom mapping analysis of visual speech production deficits in a large group (280) of stroke patients at the sub-acute stage (<120 days post-stroke). Performance on object naming was evaluated alongside three other tests of visual speech production, namely sentence production to a picture, sentence reading and nonword reading. A principal component analysis was performed on all these tests' scores and revealed a 'shared' component that loaded across all the visual speech production tasks and a 'unique' component that isolated object naming from the other three tasks. Regions for the shared component were observed in the left fronto-temporal cortices, fusiform gyrus and bilateral visual cortices. Lesions in these regions linked to both poor object naming and impairment in general visual-speech production. On the other hand, the unique naming component was potentially associated with the bilateral anterior temporal poles, hippocampus and cerebellar areas. This is in line with the models proposing that object naming relies on a left-lateralised language dominant system that interacts with a bilateral anterior temporal network. Neuropsychological deficits in object naming can reflect both the increased demands specific to the task and the more general difficulties in language processing.

  19. Masking release for words in amplitude-modulated noise as a function of modulation rate and task

    PubMed Central

    Buss, Emily; Whittle, Lisa N.; Grose, John H.; Hall, Joseph W.

    2009-01-01

    For normal-hearing listeners, masked speech recognition can improve with the introduction of masker amplitude modulation. The present experiments tested the hypothesis that this masking release is due in part to an interaction between the temporal distribution of cues necessary to perform the task and the probability of those cues temporally coinciding with masker modulation minima. Stimuli were monosyllabic words masked by speech-shaped noise, and masker modulation was introduced via multiplication with a raised sinusoid of 2.5–40 Hz. Tasks included detection, three-alternative forced-choice identification, and open-set identification. Overall, there was more masking release associated with the closed than the open-set tasks. The best rate of modulation also differed as a function of task; whereas low modulation rates were associated with best performance for the detection and three-alternative identification tasks, performance improved with modulation rate in the open-set task. This task-by-rate interaction was also observed when amplitude-modulated speech was presented in a steady masker, and for low- and high-pass filtered speech presented in modulated noise. These results were interpreted as showing that the optimal rate of amplitude modulation depends on the temporal distribution of speech cues and the information required to perform a particular task. PMID:19603883
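
    The masker construction described here (multiplication with a raised sinusoid) can be sketched as follows; plain white noise stands in for the speech-shaped noise, and levels are not calibrated.

        # Sketch of an amplitude-modulated masker: noise multiplied by a raised
        # sinusoid at a chosen rate (2.5-40 Hz in the study above).
        import numpy as np

        fs = 44100
        dur = 2.0
        mod_rate = 10.0                                  # modulation rate (Hz)
        t = np.arange(int(fs * dur)) / fs

        noise = np.random.randn(t.size)                  # placeholder for speech-shaped noise
        raised_sinusoid = 0.5 * (1 + np.sin(2 * np.pi * mod_rate * t))   # varies from 0 to 1
        masker = noise * raised_sinusoid                 # modulation minima expose speech cues
        masker /= np.max(np.abs(masker))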

  20. Eye’m talking to you: speakers’ gaze direction modulates co-speech gesture processing in the right MTG

    PubMed Central

    Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı

    2015-01-01

    Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857

  1. The effect of presentation level and stimulation rate on speech perception and modulation detection for cochlear implant users.

    PubMed

    Brochier, Tim; McDermott, Hugh J; McKay, Colette M

    2017-06-01

    In order to improve speech understanding for cochlear implant users, it is important to maximize the transmission of temporal information. The combined effects of stimulation rate and presentation level on temporal information transfer and speech understanding remain unclear. The present study systematically varied presentation level (60, 50, and 40 dBA) and stimulation rate [500 and 2400 pulses per second per electrode (pps)] in order to observe how the effect of rate on speech understanding changes for different presentation levels. Speech recognition in quiet and noise, and acoustic amplitude modulation detection thresholds (AMDTs) were measured with acoustic stimuli presented to speech processors via direct audio input (DAI). With the 500 pps processor, results showed significantly better performance for consonant-vowel nucleus-consonant words in quiet, and a reduced effect of noise on sentence recognition. However, no rate or level effect was found for AMDTs, perhaps partly because of amplitude compression in the sound processor. AMDTs were found to be strongly correlated with the effect of noise on sentence perception at low levels. These results indicate that AMDTs, at least when measured with the CP910 Freedom speech processor via DAI, explain between-subject variance of speech understanding, but do not explain within-subject variance for different rates and levels.

  2. Prediction and constraint in audiovisual speech perception

    PubMed Central

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390

  3. Age-related changes in the functional neuroanatomy of overt speech production.

    PubMed

    Sörös, Peter; Bose, Arpita; Sokoloff, Lisa Guttman; Graham, Simon J; Stuss, Donald T

    2011-08-01

    Alterations of existing neural networks during healthy aging, resulting in behavioral deficits and changes in brain activity, have been described for cognitive, motor, and sensory functions. To investigate age-related changes in the neural circuitry underlying overt non-lexical speech production, functional MRI was performed in 14 healthy younger (21-32 years) and 14 healthy older individuals (62-84 years). The experimental task involved the acoustically cued overt production of the vowel /a/ and the polysyllabic utterance /pataka/. In younger and older individuals, overt speech production was associated with the activation of a widespread articulo-phonological network, including the primary motor cortex, the supplementary motor area, the cingulate motor areas, and the posterior superior temporal cortex, similar in the /a/ and /pataka/ condition. An analysis of variance with the factors age and condition revealed a significant main effect of age. Irrespective of the experimental condition, significantly greater activation was found in the bilateral posterior superior temporal cortex, the posterior temporal plane, and the transverse temporal gyri in younger compared to older individuals. Significantly greater activation was found in the bilateral middle temporal gyri, medial frontal gyri, middle frontal gyri, and inferior frontal gyri in older vs. younger individuals. The analysis of variance did not reveal a significant main effect of condition and no significant interaction of age and condition. These results suggest a complex reorganization of neural networks dedicated to the production of speech during healthy aging. Copyright © 2009 Elsevier Inc. All rights reserved.

  4. Don’t speak too fast! Processing of fast rate speech in children with specific language impairment

    PubMed Central

    Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

    2018-01-01

    Background Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Method Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. Conclusion In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610
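
    As an illustrative aside, a minimal Python sketch of computing the sensitivity index d′ from hit and false-alarm counts in a yes/no judgment task; the example counts and the correction used are assumptions.

      from scipy.stats import norm

      def d_prime(hits, misses, false_alarms, correct_rejections):
          """d' = z(hit rate) - z(false-alarm rate), with a correction to avoid infinite z-scores."""
          hit_rate = (hits + 0.5) / (hits + misses + 1.0)
          fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
          return norm.ppf(hit_rate) - norm.ppf(fa_rate)

      print(d_prime(hits=18, misses=2, false_alarms=5, correct_rejections=15))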

  5. The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients

    PubMed Central

    Noble, Jack H.; Camarata, Stephen M.; Sunderhaus, Linsey W.; Dwyer, Robert T.; Dawant, Benoit M.; Dietrich, Mary S.; Labadie, Robert F.

    2018-01-01

    Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients, leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on measures of speech recognition and spectral modulation detection for 578 CI recipients, including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for the 542 adult CI recipients. For the 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise, nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users make use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further research is warranted to investigate the relationship between spectral and temporal resolution and speech recognition, and to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users. PMID:29716437

  6. Auditory processing and speech perception in children with specific language impairment: relations with oral language and literacy skills.

    PubMed

    Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge

    2012-01-01

    This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of children aged 6 years 3 months to 6 years 8 months attending grade 1: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. The two normal-reading groups did not differ in terms of speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and had a unique predictive contribution to reading growth in grade 3, even after controlling for reading level, phonological ability, auditory processing and oral language skills in grade 1. These findings indicated that speech perception also had a unique, direct impact upon reading development, and not only an indirect influence through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Three- and four-dimensional mapping of speech and language in patients with epilepsy.

    PubMed

    Nakai, Yasuo; Jeong, Jeong-Won; Brown, Erik C; Rothermel, Robert; Kojima, Katsuaki; Kambara, Toshimune; Shah, Aashit; Mittal, Sandeep; Sood, Sandeep; Asano, Eishi

    2017-05-01

    We have provided 3D and 4D mapping of speech and language function based upon the results of direct cortical stimulation and event-related modulation of electrocorticography signals. Patients estimated to have right-hemispheric language dominance were excluded. Thus, 100 patients who underwent two-stage epilepsy surgery with chronic electrocorticography recording were studied. An older group consisted of 84 patients at least 10 years of age (7367 artefact-free non-epileptic electrodes), whereas a younger group included 16 children younger than age 10 (1438 electrodes). The probability of symptoms transiently induced by electrical stimulation was delineated on a 3D average surface image. The electrocorticography amplitude changes of high-gamma (70-110 Hz) and beta (15-30 Hz) activities during an auditory-naming task were animated on the average surface image in a 4D manner. Thereby, high-gamma augmentation and beta attenuation were treated as summary measures of cortical activation. Stimulation data indicated the causal relationship between (i) superior-temporal gyrus of either hemisphere and auditory hallucination; (ii) left superior-/middle-temporal gyri and receptive aphasia; (iii) widespread temporal/frontal lobe regions of the left hemisphere and expressive aphasia; and (iv) bilateral precentral/left posterior superior-frontal regions and speech arrest. On electrocorticography analysis, high-gamma augmentation involved the bilateral superior-temporal and precentral gyri immediately following question onset; at the same time, high-gamma activity was attenuated in the left orbitofrontal gyrus. High-gamma activity was augmented in the left temporal/frontal lobe regions, as well as left inferior-parietal and cingulate regions, maximally around question offset, with high-gamma augmentation in the left pars orbitalis inferior-frontal, middle-frontal, and inferior-parietal regions preceded by high-gamma attenuation in the contralateral homotopic regions. Immediately before verbal response, high-gamma augmentation involved the posterior superior-frontal and pre/postcentral regions, bilaterally. Beta-attenuation was spatially and temporally correlated with high-gamma augmentation in general but with exceptions. The younger and older groups shared similar spatial-temporal profiles of high-gamma and beta modulation, except that the younger group failed to show left-dominant activation in the rostral middle-frontal and pars orbitalis inferior-frontal regions around stimulus offset. The human brain may rapidly and alternately activate and deactivate cortical areas advantageous or obtrusive to function directed toward speech and language at a given moment. Increased left-dominant activation in the anterior frontal structures in the older age group may reflect developmental consolidation of the language system. The results of our functional mapping may be useful in predicting, across not only space but also time and patient age, sites specific to language function for presurgical evaluation of focal epilepsy. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain.
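
    As an illustrative aside, a minimal Python sketch of extracting high-gamma (70-110 Hz) and beta (15-30 Hz) amplitude envelopes from a single electrocorticography channel by band-pass filtering and taking the Hilbert envelope; the sampling rate and placeholder signal are assumptions, and the study's actual analysis pipeline is not reproduced here.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def band_envelope(signal, fs, low_hz, high_hz, order=4):
          """Band-pass a single-channel signal and return its Hilbert amplitude envelope."""
          b, a = butter(order, [low_hz, high_hz], btype='bandpass', fs=fs)
          return np.abs(hilbert(filtfilt(b, a, signal)))

      fs = 1000                               # assumed ECoG sampling rate (Hz)
      ecog = np.random.randn(10 * fs)         # placeholder single-channel recording
      high_gamma = band_envelope(ecog, fs, 70, 110)
      beta = band_envelope(ecog, fs, 15, 30)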

  8. Three- and four-dimensional mapping of speech and language in patients with epilepsy

    PubMed Central

    Nakai, Yasuo; Jeong, Jeong-won; Brown, Erik C.; Rothermel, Robert; Kojima, Katsuaki; Kambara, Toshimune; Shah, Aashit; Mittal, Sandeep; Sood, Sandeep

    2017-01-01

    We have provided 3D and 4D mapping of speech and language function based upon the results of direct cortical stimulation and event-related modulation of electrocorticography signals. Patients estimated to have right-hemispheric language dominance were excluded. Thus, 100 patients who underwent two-stage epilepsy surgery with chronic electrocorticography recording were studied. An older group consisted of 84 patients at least 10 years of age (7367 artefact-free non-epileptic electrodes), whereas a younger group included 16 children younger than age 10 (1438 electrodes). The probability of symptoms transiently induced by electrical stimulation was delineated on a 3D average surface image. The electrocorticography amplitude changes of high-gamma (70–110 Hz) and beta (15–30 Hz) activities during an auditory-naming task were animated on the average surface image in a 4D manner. Thereby, high-gamma augmentation and beta attenuation were treated as summary measures of cortical activation. Stimulation data indicated the causal relationship between (i) superior-temporal gyrus of either hemisphere and auditory hallucination; (ii) left superior-/middle-temporal gyri and receptive aphasia; (iii) widespread temporal/frontal lobe regions of the left hemisphere and expressive aphasia; and (iv) bilateral precentral/left posterior superior-frontal regions and speech arrest. On electrocorticography analysis, high-gamma augmentation involved the bilateral superior-temporal and precentral gyri immediately following question onset; at the same time, high-gamma activity was attenuated in the left orbitofrontal gyrus. High-gamma activity was augmented in the left temporal/frontal lobe regions, as well as left inferior-parietal and cingulate regions, maximally around question offset, with high-gamma augmentation in the left pars orbitalis inferior-frontal, middle-frontal, and inferior-parietal regions preceded by high-gamma attenuation in the contralateral homotopic regions. Immediately before verbal response, high-gamma augmentation involved the posterior superior-frontal and pre/postcentral regions, bilaterally. Beta-attenuation was spatially and temporally correlated with high-gamma augmentation in general but with exceptions. The younger and older groups shared similar spatial-temporal profiles of high-gamma and beta modulation, except that the younger group failed to show left-dominant activation in the rostral middle-frontal and pars orbitalis inferior-frontal regions around stimulus offset. The human brain may rapidly and alternately activate and deactivate cortical areas advantageous or obtrusive to function directed toward speech and language at a given moment. Increased left-dominant activation in the anterior frontal structures in the older age group may reflect developmental consolidation of the language system. The results of our functional mapping may be useful in predicting, across not only space but also time and patient age, sites specific to language function for presurgical evaluation of focal epilepsy. PMID:28334963

  9. Perception of Hearing Aid-Processed Speech in Individuals with Late-Onset Auditory Neuropathy Spectrum Disorder.

    PubMed

    Mathai, Jijo Pottackal; Appu, Sabarish

    2015-01-01

    Auditory neuropathy spectrum disorder (ANSD) is a form of sensorineural hearing loss, causing severe deficits in speech perception. The perceptual problems of individuals with ANSD have been attributed to temporal processing impairment rather than to reduced audibility, which makes their rehabilitation with hearing aids difficult. Although hearing aids can restore audibility, compression circuits in a hearing aid might distort the temporal modulations of speech, causing poor aided performance. Therefore, hearing aid settings that preserve the temporal modulations of speech might be an effective way to improve speech perception in ANSD. The purpose of the study was to investigate the perception of hearing aid-processed speech in individuals with late-onset ANSD. A repeated measures design was used to study the effect of various compression time settings on speech perception and perceived quality. Seventeen individuals with late-onset ANSD within the age range of 20-35 yr participated in the study. The word recognition scores (WRSs) and quality judgment of phonemically balanced words, processed using four different compression settings of a hearing aid (slow, medium, fast, and linear), were evaluated. The modulation spectra of hearing aid-processed stimuli were estimated to probe the effect of amplification on the temporal envelope of speech. Repeated measures analysis of variance and post hoc Bonferroni's pairwise comparisons were used to analyze the word recognition performance and quality judgment. The comparison between unprocessed and all four hearing aid-processed stimuli showed significantly higher perception using the former stimuli. Although recognition of words processed using the slow compression time setting was significantly higher than with the fast setting, the difference was only 4%. In addition, there were no significant differences in perception between any other hearing aid-processed stimuli. Analysis of the temporal envelope of hearing aid-processed stimuli revealed minimal changes in the temporal envelope across the four hearing aid settings. In terms of quality, the largest number of individuals preferred stimuli processed using the slow compression time setting, followed by those who preferred the medium setting. However, none of the individuals preferred the fast compression time setting. Analysis of quality judgment showed that slow, medium, and linear settings presented significantly higher preference scores than the fast compression setting. Individuals with ANSD showed no marked difference in perception of speech that was processed using the four different hearing aid settings. However, significantly higher preference, in terms of quality, was found for stimuli processed using slow, medium, and linear settings over the fast one. Therefore, whenever hearing aids are recommended for ANSD, those having slow compression time settings or linear amplification may be chosen over the fast (syllabic compression) one. In addition, WRSs obtained using hearing aid-processed stimuli were markedly poorer than those obtained with unprocessed stimuli. This shows that processing of speech through hearing aids might have caused a large reduction in performance in individuals with ANSD. However, further evaluation is needed using individually programmed hearing aids rather than hearing aid-processed stimuli. American Academy of Audiology.
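
    As an illustrative aside, a minimal Python sketch of estimating a modulation spectrum from the Hilbert amplitude envelope of a stimulus, which is one generic way to quantify how processing alters the temporal envelope; the sampling rate, placeholder signal and spectral settings are assumptions, not the study's specific method.

      import numpy as np
      from scipy.signal import hilbert, welch

      def modulation_spectrum(x, fs, nperseg=4096):
          """Power spectrum of the Hilbert amplitude envelope (a simple temporal modulation spectrum)."""
          env = np.abs(hilbert(x))
          env = env - env.mean()               # remove DC so low-rate modulations stand out
          return welch(env, fs=fs, nperseg=nperseg)

      fs = 16000                               # assumed sampling rate (Hz)
      word = np.random.randn(2 * fs)           # placeholder for a processed word
      freqs, power = modulation_spectrum(word, fs)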

  10. Abnormal Structure–Function Relationship in Spasmodic Dysphonia

    PubMed Central

    Ludlow, Christy L.

    2012-01-01

    Spasmodic dysphonia (SD) is a primary focal dystonia characterized by involuntary spasms in the laryngeal muscles during speech production. Although recent studies have found abnormal brain function and white matter organization in SD, the extent of gray matter alterations, their structure–function relationships, and correlations with symptoms remain unknown. We compared gray matter volume (GMV) and cortical thickness (CT) in 40 SD patients and 40 controls using voxel-based morphometry and cortical distance estimates. These measures were examined for relationships with blood oxygen level–dependent signal change during symptomatic syllable production in 15 of the same patients. SD patients had increased GMV, CT, and brain activation in key structures of the speech control system, including the laryngeal sensorimotor cortex, inferior frontal gyrus (IFG), superior/middle temporal and supramarginal gyri, and in a structure commonly abnormal in other primary dystonias, the cerebellum. Among these regions, GMV, CT and activation of the IFG and cerebellum showed positive relationships with SD severity, while CT of the IFG correlated with SD duration. The left anterior insula was the only region with decreased CT, which also correlated with SD symptom severity. These findings provide evidence for coupling between structural and functional abnormalities at different levels within the speech production system in SD. PMID:21666131

  11. Decreased Cerebellar-Orbitofrontal Connectivity Correlates with Stuttering Severity: Whole-Brain Functional and Structural Connectivity Associations with Persistent Developmental Stuttering

    PubMed Central

    Sitek, Kevin R.; Cai, Shanqing; Beal, Deryk S.; Perkell, Joseph S.; Guenther, Frank H.; Ghosh, Satrajit S.

    2016-01-01

    Persistent developmental stuttering is characterized by speech production disfluency and affects 1% of adults. The degree of impairment varies widely across individuals and the neural mechanisms underlying the disorder and this variability remain poorly understood. Here we elucidate compensatory mechanisms related to this variability in impairment using whole-brain functional and white matter connectivity analyses in persistent developmental stuttering. We found that people who stutter had stronger functional connectivity between cerebellum and thalamus than people with fluent speech, while stutterers with the least severe symptoms had greater functional connectivity between left cerebellum and left orbitofrontal cortex (OFC). Additionally, people who stutter had decreased functional and white matter connectivity among the perisylvian auditory, motor, and speech planning regions compared to typical speakers, but greater functional connectivity between the right basal ganglia and bilateral temporal auditory regions. Structurally, disfluency ratings were negatively correlated with white matter connections to left perisylvian regions and to the brain stem. Overall, we found increased connectivity among subcortical and reward network structures in people who stutter compared to controls. These connections were negatively correlated with stuttering severity, suggesting the involvement of cerebellum and OFC may underlie successful compensatory mechanisms by more fluent stutterers. PMID:27199712

  12. [Investigating phonological planning processes in speech production through a speech-error induction technique].

    PubMed

    Nakayama, Masataka; Saito, Satoru

    2015-08-01

    The present study investigated principles of phonological planning, a common serial ordering mechanism for speech production and phonological short-term memory. Nakayama and Saito (2014) have investigated these principles by using a speech-error induction technique, in which participants were exposed to an auditory distractor word immediately before the utterance of a target word. They demonstrated within-word adjacent mora exchanges and serial position effects on error rates. These findings support, respectively, the temporal distance and the edge principles at a within-word level. As this previous study induced errors using word distractors created by exchanging adjacent morae in the target words, it is possible that the speech errors are expressions of lexical intrusions reflecting interactive activation of phonological and lexical/semantic representations. To eliminate this possibility, the present study used nonword distractors that had no lexical or semantic representations. This approach successfully replicated the error patterns identified in the abovementioned study, further confirming that the temporal distance and edge principles are organizing principles in phonological planning.

  13. Low-resolution electromagnetic tomography (LORETA) in children with cochlear implants: a preliminary report.

    PubMed

    Henkin, Yael; Kishon-Rabin, Liat; Tatin-Schneider, Simona; Urbach, Doron; Hildesheimer, Minka; Kileny, Paul R

    2004-12-01

    The current preliminary report describes the utilization of low-resolution electromagnetic tomography (LORETA) in a small group of highly performing children using the Nucleus 22 cochlear implant (CI) and in normal-hearing (NH) adults. LORETA current density estimations were performed on an averaged target P3 component that was elicited by non-speech and speech oddball discrimination tasks. The results indicated that, when stimulated with tones, patients with right implants and NH adults (regardless of stimulated ear) showed enhanced activation in the right temporal lobe, whereas patients with left implants showed enhanced activation in the left temporal lobe. When stimulated with speech, patients with right implants showed bilateral activation of the temporal and frontal lobes, whereas patients with left implants showed only left temporal lobe activation. NH adults (regardless of stimulated ear) showed enhanced bilateral activation of the temporal and parietal lobes. The differences in activation patterns between patients with CI and NH subjects may be attributed to the long-term exposure to degraded input conditions which may have resulted in reorganization in terms of functional specialization. The difference between patients with right versus left implants, however, is intriguing and requires further investigation.

  14. The eye as a window to the listening brain: neural correlates of pupil size as a measure of cognitive listening load.

    PubMed

    Zekveld, Adriana A; Heslenfeld, Dirk J; Johnsrude, Ingrid S; Versfeld, Niek J; Kramer, Sophia E

    2014-11-01

    An important aspect of hearing is the degree to which listeners have to deploy effort to understand speech. One promising measure of listening effort is task-evoked pupil dilation. Here, we use functional magnetic resonance imaging (fMRI) to identify the neural correlates of pupil dilation during comprehension of degraded spoken sentences in 17 normal-hearing listeners. Subjects listened to sentences degraded in three different ways: the target female speech was masked by fluctuating noise, by speech from a single male speaker, or the target speech was noise-vocoded. The degree of degradation was individually adapted such that 50% or 84% of the sentences were intelligible. Control conditions included clear speech in quiet, and silent trials. The peak pupil dilation was larger for the 50% compared to the 84% intelligibility condition, and largest for speech masked by the single-talker masker, followed by speech masked by fluctuating noise, and smallest for noise-vocoded speech. Activation in the bilateral superior temporal gyrus (STG) showed the same pattern, with most extensive activation for speech masked by the single-talker masker. Larger peak pupil dilation was associated with more activation in the bilateral STG, bilateral ventral and dorsal anterior cingulate cortex and several frontal brain areas. A subset of the temporal region sensitive to pupil dilation was also sensitive to speech intelligibility and degradation type. These results show that pupil dilation during speech perception in challenging conditions reflects both auditory and cognitive processes that are recruited to cope with degraded speech and the need to segregate target speech from interfering sounds. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251
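
    As an illustrative aside, a minimal Python sketch of one common way to quantify phase locking between a speech envelope and a neural signal in the 4-8 Hz band (a phase-locking value computed from Hilbert phases); the sampling rate, band edges and placeholder signals are assumptions, not the specific analyses reviewed above.

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def phase_locking_value(x, y, fs, band=(4.0, 8.0)):
          """Phase-locking value between two signals within a frequency band (1 = perfect locking)."""
          b, a = butter(4, band, btype='bandpass', fs=fs)
          phase_x = np.angle(hilbert(filtfilt(b, a, x)))
          phase_y = np.angle(hilbert(filtfilt(b, a, y)))
          return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

      fs = 250                                  # assumed common sampling rate (Hz)
      envelope = np.random.randn(10 * fs)       # placeholder speech amplitude envelope
      meg = np.random.randn(10 * fs)            # placeholder MEG/EEG channel
      print(phase_locking_value(envelope, meg, fs))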

  16. An Acquired Deficit of Audiovisual Speech Processing

    ERIC Educational Resources Information Center

    Hamilton, Roy H.; Shenton, Jeffrey T.; Coslett, H. Branch

    2006-01-01

    We report a 53-year-old patient (AWF) who has an acquired deficit of audiovisual speech integration, characterized by a perceived temporal mismatch between speech sounds and the sight of moving lips. AWF was less accurate on an auditory digit span task with vision of a speaker's face as compared to a condition in which no visual information from…

  17. What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help

    ERIC Educational Resources Information Center

    Obermeier, Christian; Holle, Henning; Gunter, Thomas C.

    2011-01-01

    The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive…

  18. Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural Perspective

    ERIC Educational Resources Information Center

    Golumbic, Elana M. Zion; Poeppel, David; Schroeder, Charles E.

    2012-01-01

    The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the "Cocktail Party" effect. Yet, the neural mechanisms underlying on-line…

  19. Perception of Audio-Visual Speech Synchrony in Spanish-Speaking Children with and without Specific Language Impairment

    ERIC Educational Resources Information Center

    Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.

    2013-01-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…

  20. Speech Perception with Music Maskers by Cochlear Implant Users and Normal-Hearing Listeners

    ERIC Educational Resources Information Center

    Eskridge, Elizabeth N.; Galvin, John J., III; Aronoff, Justin M.; Li, Tianhao; Fu, Qian-Jie

    2012-01-01

    Purpose: The goal of this study was to investigate how the spectral and temporal properties in background music may interfere with cochlear implant (CI) and normal-hearing listeners' (NH) speech understanding. Method: Speech-recognition thresholds (SRTs) were adaptively measured in 11 CI and 9 NH subjects. CI subjects were tested while using their…

  1. Changes in Speech Chunking in Reading Aloud is a marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer's Disease.

    PubMed

    De Looze, Celine; Kelly, Finnian; Crosby, Lisa; Vourdanou, Aisling; Coen, Robert F; Walsh, Cathal; Lawlor, Brian A; Reilly, Richard B

    2018-04-04

    Speech and language impairments, generally attributed to lexico-semantic deficits, have been documented in Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD). This study investigates the temporal organisation of speech (reflective of speech production planning) in reading aloud in relation to cognitive impairment, particularly working memory and attention deficits in MCI and AD. The discriminative ability of temporal features extracted from a newly designed read speech task is also evaluated for the detection of MCI and AD. Sixteen patients with MCI, eighteen patients with mild-to-moderate AD and thirty-six healthy controls (HC) underwent a battery of neuropsychological tests and read a set of sentences varying in cognitive load, probed by manipulating sentence length and syntactic complexity. Our results show that mild-to-moderate AD is associated with a general slowness of speech, attributed to a higher number of speech chunks, silent pauses and disfluencies, and slower speech and articulation rates. Speech chunking in the context of high cognitive-linguistic demand appears to be an informative marker of MCI, specifically related to early deficits in working memory and attention. In addition, Linear Discriminant Analysis shows that the ROC AUCs (Areas Under the Receiver Operating Characteristic Curves) for identifying MCI vs. HC, MCI vs. AD, and AD vs. HC using these speech characteristics are 0.75, 0.90, and 0.94, respectively. The implementation of connected speech-based technologies in clinical and community settings may provide additional information for the early detection of MCI and AD. Copyright © Bentham Science Publishers; for any queries, please email epub@benthamscience.org.
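
    As an illustrative aside (with simulated data, not the study's), a minimal Python sketch of scoring a Linear Discriminant Analysis classifier on temporal speech features with a cross-validated ROC AUC; the feature set, group sizes and cross-validation scheme are assumptions.

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_predict
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(1)
      # Simulated temporal features (e.g., number of chunks, pauses, speech rate, articulation rate)
      X = rng.normal(size=(52, 4))
      y = np.r_[np.ones(16), np.zeros(36)]      # 1 = MCI, 0 = healthy control (illustrative group sizes)

      lda = LinearDiscriminantAnalysis()
      prob = cross_val_predict(lda, X, y, cv=5, method='predict_proba')[:, 1]
      print(roc_auc_score(y, prob))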

  2. Schizophrenia affects speech-induced functional connectivity of the superior temporal gyrus under cocktail-party listening conditions.

    PubMed

    Li, Juanhua; Wu, Chao; Zheng, Yingjun; Li, Ruikeng; Li, Xuanzi; She, Shenglin; Wu, Haibo; Peng, Hongjun; Ning, Yuping; Li, Liang

    2017-09-17

    The superior temporal gyrus (STG) is involved in speech recognition against informational masking under cocktail-party listening conditions. Compared to healthy listeners, people with schizophrenia perform worse in speech recognition under informational speech-on-speech masking conditions. It is not clear whether this schizophrenia-related vulnerability to informational masking is associated with changes in functional connectivity (FC) of the STG with certain critical brain regions. Using a sparse-sampling fMRI design, this study investigated the differences between people with schizophrenia and healthy controls in FC of the STG during target-speech listening against informational speech-on-speech masking, when a listening condition with either perceived spatial separation (PSS, with a spatial release of informational masking) or perceived spatial co-location (PSC, without the spatial release) between target speech and masking speech was introduced. The results showed that in healthy participants, but not participants with schizophrenia, the contrast of either the PSS or the PSC condition against the masker-only condition induced an enhancement of FC of the STG with the left superior parietal lobule (SPL) and the right precuneus. Compared to healthy participants, participants with schizophrenia showed declined FC of the STG with the bilateral precuneus, right SPL, and right supplementary motor area. Thus, FC of the STG with the parietal areas is normally involved in speech listening against informational masking under either the PSS or the PSC condition, and declined FC of the STG with the parietal areas in people with schizophrenia may be associated with their increased vulnerability to informational masking. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
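
    As an illustrative aside, functional connectivity between two regions is commonly summarised as the correlation between their fMRI time series; the minimal Python sketch below shows that generic computation with placeholder data, not the study's sparse-sampling analysis.

      import numpy as np

      def functional_connectivity(seed_ts, target_ts):
          """Seed-based functional connectivity as the Pearson correlation of two ROI time series."""
          return np.corrcoef(seed_ts, target_ts)[0, 1]

      n_volumes = 200                            # assumed number of fMRI volumes
      stg = np.random.randn(n_volumes)           # placeholder STG time series
      precuneus = np.random.randn(n_volumes)     # placeholder precuneus time series
      print(functional_connectivity(stg, precuneus))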

  3. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography

    PubMed Central

    Ozker, Muge; Schepers, Inga M.; Magnotti, John F.; Yoshor, Daniel; Beauchamp, Michael S.

    2017-01-01

    Human speech can be comprehended using only auditory information from the talker’s voice. However, comprehension is improved if the talker’s face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl’s gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech. PMID:28253074
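
    As an illustrative aside, the variance-reduction prediction mentioned above is a generic property of maximum-likelihood (reliability-weighted) cue combination; the expressions below state that standard model and are not taken from this study.

      \hat{s}_{AV} = \frac{\sigma_V^{2}\,\hat{s}_A + \sigma_A^{2}\,\hat{s}_V}{\sigma_A^{2} + \sigma_V^{2}},
      \qquad
      \sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}} \le \min\left(\sigma_A^{2},\, \sigma_V^{2}\right)

    Under this model the combined audiovisual estimate is less variable than either unisensory estimate, which motivates testing the variability of responses to single audiovisual words.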

  4. Melodic Contour Identification and Music Perception by Cochlear Implant Users

    PubMed Central

    Galvin, John J.; Fu, Qian-Jie; Shannon, Robert V.

    2013-01-01

    Research and outcomes with cochlear implants (CIs) have revealed a dichotomy in the cues necessary for speech and music recognition. CI devices typically transmit 16–22 spectral channels, each modulated slowly in time. This coarse representation provides enough information to support speech understanding in quiet and rhythmic perception in music, but not enough to support speech understanding in noise or melody recognition. Melody recognition requires some capacity for complex pitch perception, which in turn depends strongly on access to spectral fine structure cues. Thus, temporal envelope cues are adequate for speech perception under optimal listening conditions, while spectral fine structure cues are needed for music perception. In this paper, we present recent experiments that directly measure CI users’ melodic pitch perception using a melodic contour identification (MCI) task. While normal-hearing (NH) listeners’ performance was consistently high across experiments, MCI performance was highly variable across CI users. CI users’ MCI performance was significantly affected by instrument timbre, as well as by the presence of a competing instrument. In general, CI users had great difficulty extracting melodic pitch from complex stimuli. However, musically-experienced CI users often performed as well as NH listeners, and MCI training in less experienced subjects greatly improved performance. With fixed constraints on spectral resolution, such as occur with hearing loss or an auditory prosthesis, training and experience can provide considerable improvements in music perception and appreciation. PMID:19673835

  5. The contribution of the cerebellum to speech production and speech perception: clinical and functional imaging data.

    PubMed

    Ackermann, Hermann; Mathiak, Klaus; Riecker, Axel

    2007-01-01

    A classical tenet of clinical neurology proposes that cerebellar disorders may give rise to speech motor disorders (ataxic dysarthria), but spare perceptual and cognitive aspects of verbal communication. During the past two decades, however, a variety of higher-order deficits of speech production, e.g., more or less exclusive agrammatism, amnesic or transcortical motor aphasia, have been noted in patients with vascular cerebellar lesions, and transient mutism following resection of posterior fossa tumors in children may develop into similar constellations. Perfusion studies provided evidence for cerebello-cerebral diaschisis as a possible pathomechanism in these instances. Tight functional connectivity between the language-dominant frontal lobe and the contralateral cerebellar hemisphere represents a prerequisite of such long-distance effects. Recent functional imaging data point at a contribution of the right cerebellar hemisphere, concomitant with language-dominant dorsolateral and medial frontal areas, to the temporal organization of a prearticulatory verbal code ('inner speech'), in terms of the sequencing of syllable strings at a speaker's habitual speech rate. Besides motor control, this network also appears to be engaged in executive functions, e.g., subvocal rehearsal mechanisms of verbal working memory, and seems to be recruited during distinct speech perception tasks. Taken together, thus, a prearticulatory verbal code bound to reciprocal right cerebellar/left frontal interactions might represent a common platform for a variety of cerebellar engagements in cognitive functions. The distinct computational operation provided by cerebellar structures within this framework appears to be the concatenation of syllable strings into coarticulated sequences.

  6. Speech acquisition predicts regions of enhanced cortical response to auditory stimulation in autism spectrum individuals.

    PubMed

    Samson, F; Zeffiro, T A; Doyon, J; Benali, H; Mottron, L

    2015-09-01

    A continuum of phenotypes makes up the autism spectrum (AS). In particular, individuals show large differences in language acquisition, ranging from precocious speech to severe speech onset delay. However, the neurological origin of this heterogeneity remains unknown. Here, we sought to determine whether AS individuals differing in speech acquisition show different cortical responses to auditory stimulation and morphometric brain differences. Whole-brain activity following exposure to non-social sounds was investigated. Individuals in the AS were classified according to the presence or absence of Speech Onset Delay (AS-SOD and AS-NoSOD, respectively) and were compared with IQ-matched typically developing individuals (TYP). AS-NoSOD participants displayed greater task-related activity than TYP in the inferior frontal gyrus and peri-auditory middle and superior temporal gyri, which are associated with language processing. Conversely, the AS-SOD group only showed enhanced activity in the vicinity of the auditory cortex. We detected no differences in brain structure between groups. This is the first study to demonstrate the existence of differences in functional brain activity between AS individuals divided according to their pattern of speech development. These findings support the Trigger-threshold-target model and indicate that the occurrence of speech onset delay in AS individuals depends on the location of cortical functional reallocation, which favors perception in AS-SOD and language in AS-NoSOD. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  7. Neural sensitivity to statistical regularities as a fundamental biological process that underlies auditory learning: the role of musical practice.

    PubMed

    François, Clément; Schön, Daniele

    2014-02-01

    There is increasing evidence that humans and other nonhuman mammals are sensitive to the statistical structure of auditory input. Indeed, neural sensitivity to statistical regularities seems to be a fundamental biological property underlying auditory learning. In the case of speech, statistical regularities play a crucial role in the acquisition of several linguistic features, from phonotactic to more complex rules such as morphosyntactic rules. Interestingly, a similar sensitivity has been shown with non-speech streams: sequences of sounds changing in frequency or timbre can be segmented on the sole basis of conditional probabilities between adjacent sounds. We recently ran a set of cross-sectional and longitudinal experiments showing that merging music and speech information in song facilitates stream segmentation and, further, that musical practice enhances sensitivity to statistical regularities in speech at both neural and behavioral levels. Based on recent findings showing the involvement of a fronto-temporal network in speech segmentation, we defend the idea that enhanced auditory learning observed in musicians originates via at least three distinct pathways: enhanced low-level auditory processing, enhanced phono-articulatory mapping via the left Inferior Frontal Gyrus and Pre-Motor cortex and increased functional connectivity within the audio-motor network. Finally, we discuss how these data predict a beneficial use of music for optimizing speech acquisition in both normal and impaired populations. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex.

    PubMed

    van Atteveldt, Nienke M; Blau, Vera C; Blomert, Leo; Goebel, Rainer

    2010-02-02

    Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for the integration of letters and speech sounds and demonstrate that fMR-A is sensitive to multisensory congruency effects that may not be revealed in BOLD amplitude per se.

  9. Speech perception in noise with a harmonic complex excited vocoder.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y

    2014-04-01

    A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
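
    As an illustrative aside, a minimal Python sketch of synthesising a harmonic complex carrier (f0 = 100 Hz) whose component starting phases can be dispersed to varying degrees before carrier filtering and modulation; the number of harmonics, sampling rate and dispersion parameterisation are assumptions, not the study's exact vocoder.

      import numpy as np

      def harmonic_complex(fs, dur_s, f0=100.0, n_harmonics=40, phase_dispersion=1.0):
          """Sum of harmonics of f0 with starting phases drawn from [0, phase_dispersion * 2*pi)."""
          t = np.arange(int(fs * dur_s)) / fs
          rng = np.random.default_rng(0)
          phases = rng.uniform(0.0, phase_dispersion * 2.0 * np.pi, n_harmonics)
          carrier = sum(np.sin(2.0 * np.pi * f0 * (k + 1) * t + phases[k])
                        for k in range(n_harmonics))
          return carrier / np.max(np.abs(carrier))

      carrier = harmonic_complex(fs=16000, dur_s=0.5, phase_dispersion=1.0)   # maximally dispersed phases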

  10. Identification of emotional intonation evaluated by fMRI.

    PubMed

    Wildgruber, D; Riecker, A; Hertrich, I; Erb, M; Grodd, W; Ethofer, T; Ackermann, H

    2005-02-15

    During acoustic communication among human beings, emotional information can be expressed both by the propositional content of verbal utterances and by the modulation of speech melody (affective prosody). It is well established that linguistic processing is bound predominantly to the left hemisphere of the brain. By contrast, the encoding of emotional intonation has been assumed to depend specifically upon right-sided cerebral structures. However, prior clinical and functional imaging studies yielded discrepant data with respect to interhemispheric lateralization and intrahemispheric localization of brain regions contributing to processing of affective prosody. In order to delineate the cerebral network engaged in the perception of emotional tone, functional magnetic resonance imaging (fMRI) was performed during recognition of prosodic expressions of five different basic emotions (happy, sad, angry, fearful, and disgusted) and during phonetic monitoring of the same stimuli. As compared to baseline at rest, both tasks yielded widespread bilateral hemodynamic responses within frontal, temporal, and parietal areas, the thalamus, and the cerebellum. A comparison of the respective activation maps, however, revealed comprehension of affective prosody to be bound to a distinct right-hemisphere pattern of activation, encompassing posterior superior temporal sulcus (Brodmann Area [BA] 22), dorsolateral (BA 44/45), and orbitobasal (BA 47) frontal areas. Activation within left-sided speech areas, in contrast, was observed during the phonetic task. These findings indicate that partially distinct cerebral networks subserve processing of phonetic and intonational information during speech perception.

  11. Human Frequency Following Response: Neural Representation of Envelope and Temporal Fine Structure in Listeners with Normal Hearing and Sensorineural Hearing Loss

    PubMed Central

    Ananthakrishnan, Saradha; Krishnan, Ananthanarayan; Bartlett, Edward

    2015-01-01

    Objective Listeners with sensorineural hearing loss (SNHL) typically experience reduced speech perception, which is not completely restored with amplification. This likely occurs because cochlear damage, in addition to elevating audiometric thresholds, alters the neural representation of speech transmitted to higher centers along the auditory neuroaxis. While the deleterious effects of SNHL on speech perception in humans have been well-documented using behavioral paradigms, our understanding of the neural correlates underlying these perceptual deficits remains limited. Using the scalp-recorded Frequency Following Response (FFR), the authors examine the effects of SNHL and aging on subcortical neural representation of acoustic features important for pitch and speech perception, namely the periodicity envelope (F0) and temporal fine structure (TFS) (formant structure), as reflected in the phase-locked neural activity generating the FFR. Design FFRs were obtained from 10 listeners with normal hearing (NH) and 9 listeners with mild-moderate SNHL in response to a steady-state English back vowel /u/ presented at multiple intensity levels. Use of multiple presentation levels facilitated comparisons at equal sound pressure level (SPL) and equal sensation level (SL). In a second follow-up experiment to address the effect of age on envelope and TFS representation, FFRs were obtained from 25 NH and 19 listeners with mild to moderately-severe SNHL to the same vowel stimulus presented at 80 dB SPL. Temporal waveforms, Fast Fourier Transform (FFT) and spectrograms were used to evaluate the magnitude of the phase-locked activity at F0 (periodicity envelope) and F1 (TFS). Results Neural representation of both envelope (F0) and TFS (F1) at equal SPLs was stronger in NH listeners compared to listeners with SNHL. Also, comparison of neural representation of F0 and F1 across stimulus levels expressed in SPL and SL (accounting for audibility) revealed that level-related changes in F0 and F1 magnitude were different for listeners with SNHL compared to listeners with normal hearing. Further, the degradation in subcortical neural representation was observed to persist in listeners with SNHL even when the effects of age were controlled for. Conclusions Overall, our results suggest a relatively greater degradation in the neural representation of TFS compared to periodicity envelope in individuals with SNHL. This degraded neural representation of TFS in SNHL, as reflected in the brainstem FFR, may reflect a disruption in the temporal pattern of phase-locked neural activity arising from altered tonotopic maps and/or wider filters causing poor frequency selectivity in these listeners. Lastly, while preliminary results indicate that the deleterious effects of SNHL may be greater than age-related degradation in subcortical neural representation, the lack of a balanced age-matched control group in this study does not permit us to completely rule out the effects of age on subcortical neural representation. PMID:26583482
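
    As an illustrative aside, a minimal Python sketch of estimating the FFR spectral magnitude near a target frequency (for example, the stimulus F0 or the F1 region) from a Fast Fourier Transform; the sampling rate, epoch length, target frequencies and placeholder data are assumptions.

      import numpy as np

      def component_magnitude(ffr, fs, target_hz, tol_hz=5.0):
          """Peak FFT magnitude of an FFR epoch within +/- tol_hz of a target frequency."""
          spectrum = np.abs(np.fft.rfft(ffr))
          freqs = np.fft.rfftfreq(len(ffr), d=1.0 / fs)
          band = (freqs >= target_hz - tol_hz) & (freqs <= target_hz + tol_hz)
          return spectrum[band].max()

      fs = 10000                                 # assumed EEG sampling rate (Hz)
      ffr = np.random.randn(int(0.3 * fs))       # placeholder 300-ms FFR epoch
      f0_mag = component_magnitude(ffr, fs, target_hz=100.0)   # periodicity envelope (F0) region
      f1_mag = component_magnitude(ffr, fs, target_hz=300.0)   # illustrative F1 (fine structure) region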

  12. Prediction and constraint in audiovisual speech perception.

    PubMed

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize heteromodal regions of the posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately, it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. A fast and flexible MRI system for the study of dynamic vocal tract shaping.

    PubMed

    Lingala, Sajan Goud; Zhu, Yinghua; Kim, Yoon-Chul; Toutios, Asterios; Narayanan, Shrikanth; Nayak, Krishna S

    2017-01-01

    The aim of this work was to develop and evaluate an MRI-based system for study of dynamic vocal tract shaping during speech production, which provides high spatial and temporal resolution. The proposed system utilizes (a) custom eight-channel upper airway coils that have high sensitivity to upper airway regions of interest, (b) two-dimensional golden angle spiral gradient echo acquisition, (c) on-the-fly view-sharing reconstruction, and (d) off-line temporal finite difference constrained reconstruction. The system also provides simultaneous noise-cancelled and temporally aligned audio. The system is evaluated in 3 healthy volunteers and 1 tongue cancer patient, with a broad range of speech tasks. We report spatiotemporal resolutions of 2.4 × 2.4 mm² every 12 ms for single-slice imaging, and 2.4 × 2.4 mm² every 36 ms for three-slice imaging, which reflects roughly 7-fold acceleration over Nyquist sampling. This system demonstrates improved temporal fidelity in capturing rapid vocal tract shaping for tasks such as producing consonant clusters in speech and beat-boxing sounds. Novel acoustic-articulatory analysis was also demonstrated. A synergistic combination of custom coils, spiral acquisitions, and constrained reconstruction enables visualization of rapid speech with high spatiotemporal resolution in multiple planes. Magn Reson Med 77:112-125, 2017. © 2016 Wiley Periodicals, Inc.
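
    As a side note on the acquisition scheme, the sketch below illustrates golden-angle view ordering, under the assumption of a full-circle golden-angle increment of about 137.5°; radial acquisitions with 180° symmetry often use the half-angle of roughly 111.25° instead. It only shows how successive interleaves are rotated so that any contiguous subset of readouts covers angles nearly uniformly, which is what makes retrospective view-sharing and flexible temporal resolution possible.

```python
# A minimal sketch of golden-angle rotation of successive spiral interleaves.
import numpy as np

GOLDEN_ANGLE_DEG = 180.0 * (3.0 - np.sqrt(5.0))  # ~137.51 degrees (full circle)

def spiral_rotations(n_interleaves):
    """Rotation angle (degrees) applied to each successive spiral readout."""
    return np.mod(np.arange(n_interleaves) * GOLDEN_ANGLE_DEG, 360.0)

print(spiral_rotations(8))
```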

  14. Auditory-Motor Rhythms and Speech Processing in French and German Listeners

    PubMed Central

    Falk, Simone; Volpi-Moncorger, Chloé; Dalla Bella, Simone

    2017-01-01

    Moving to a speech rhythm can enhance verbal processing in the listener by increasing temporal expectancies (Falk and Dalla Bella, 2016). Here we tested whether this hypothesis holds for prosodically diverse languages such as German (a lexical stress-language) and French (a non-stress language). Moreover, we examined the relation between motor performance and the benefits for verbal processing as a function of language. Sixty-four participants, 32 German and 32 French native speakers, detected subtle word changes in accented positions in metrically structured sentences to which they had previously tapped with their index finger. Before each sentence, they were cued by a metronome to tap either congruently (i.e., to accented syllables) or incongruently (i.e., to non-accented parts) with the following speech stimulus. Both French and German speakers detected words better when cued to tap congruently than when cued to tap incongruently. Detection performance was predicted by participants' motor performance in the non-verbal cueing phase. Moreover, tapping rate while participants tapped to speech predicted detection differently for the two language groups, in particular in the incongruent tapping condition. We discuss our findings in light of the rhythmic differences between the two languages and with respect to recent theories of expectancy-driven and multisensory speech processing. PMID:28443036

  15. Phonological Working Memory for Words and Nonwords in Cerebral Cortex.

    PubMed

    Perrachione, Tyler K; Ghosh, Satrajit S; Ostrovskaya, Irina; Gabrieli, John D E; Kovelman, Ioulia

    2017-07-12

    The primary purpose of this study was to identify the brain bases of phonological working memory (the short-term maintenance of speech sounds) using behavioral tasks analogous to clinically sensitive assessments of nonword repetition. The secondary purpose of the study was to identify how individual differences in brain activation were related to participants' nonword repetition abilities. We used functional magnetic resonance imaging to measure neurophysiological response during a nonword discrimination task derived from standard clinical assessments of phonological working memory. Healthy adult control participants (N = 16) discriminated pairs of real words or nonwords under varying phonological working memory load, which we manipulated by parametrically varying the number of syllables in target (non)words. Participants' cognitive and phonological abilities were also measured using standardized assessments. Neurophysiological responses in bilateral superior temporal gyrus, inferior frontal gyrus, and supplementary motor area increased with greater phonological working memory load. Activation in left superior temporal gyrus during nonword discrimination correlated with participants' performance on standard clinical nonword repetition tests. These results suggest that phonological working memory is related to the function of cortical structures that canonically underlie speech perception and production.

  16. Structural white matter asymmetries in relation to functional asymmetries during speech perception and production.

    PubMed

    Ocklenburg, Sebastian; Hugdahl, Kenneth; Westerhausen, René

    2013-12-01

    Functional hemispheric asymmetries of speech production and perception are a key feature of the human language system, but their neurophysiological basis is still poorly understood. Using a combined fMRI and tract-based spatial statistics approach, we investigated the relation of microstructural asymmetries in language-relevant white matter pathways and functional activation asymmetries during silent verb generation and passive listening to spoken words. Tract-based spatial statistics revealed several leftward asymmetric clusters in the arcuate fasciculus and uncinate fasciculus that were differentially related to activation asymmetries in the two functional tasks. Frontal and temporal activation asymmetries during silent verb generation were positively related to the strength of specific microstructural white matter asymmetries in the arcuate fasciculus. In contrast, microstructural uncinate fasciculus asymmetries were related to temporal activation asymmetries during passive listening. These findings suggest that white matter asymmetries may indeed be one of the factors underlying functional hemispheric asymmetries. Moreover, they also show that specific localized white matter asymmetries might be of greater relevance for functional activation asymmetries than microstructural features of whole pathways. © 2013.

  17. Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia

    PubMed Central

    Rohrer, Jonathan D.; Warren, Jason D.; Rossor, Martin N.

    2009-01-01

    We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation. PMID:19435636

  18. Abnormal laughter-like vocalisations replacing speech in primary progressive aphasia.

    PubMed

    Rohrer, Jonathan D; Warren, Jason D; Rossor, Martin N

    2009-09-15

    We describe ten patients with a clinical diagnosis of primary progressive aphasia (PPA) (pathologically confirmed in three cases) who developed abnormal laughter-like vocalisations in the context of progressive speech output impairment leading to mutism. Failure of speech output was accompanied by increasing frequency of the abnormal vocalisations until ultimately they constituted the patient's only extended utterance. The laughter-like vocalisations did not show contextual sensitivity but occurred as an automatic vocal output that replaced speech. Acoustic analysis of the vocalisations in two patients revealed abnormal motor features including variable note duration and inter-note interval, loss of temporal symmetry of laugh notes and loss of the normal decrescendo. Abnormal laughter-like vocalisations may be a hallmark of a subgroup in the PPA spectrum with impaired control and production of nonverbal vocal behaviour due to disruption of fronto-temporal networks mediating vocalisation.

  19. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

    PubMed

    Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

    2017-09-01

    Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.

  20. Intracranial mapping of auditory perception: event-related responses and electrocortical stimulation.

    PubMed

    Sinai, A; Crone, N E; Wied, H M; Franaszczuk, P J; Miglioretti, D; Boatman-Reich, D

    2009-01-01

    We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping.

  1. Intracranial mapping of auditory perception: Event-related responses and electrocortical stimulation

    PubMed Central

    Sinai, A.; Crone, N.E.; Wied, H.M.; Franaszczuk, P.J.; Miglioretti, D.; Boatman-Reich, D.

    2010-01-01

    Objective: We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Methods: Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. Results: ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60 Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Conclusions: Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. Significance: These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping. PMID:19070540

  2. A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation

    NASA Astrophysics Data System (ADS)

    Wu, Bo; Yang, Minglei; Li, Kehuang; Huang, Zhen; Siniscalchi, Sabato Marco; Wang, Tong; Lee, Chin-Hui

    2017-12-01

    A reverberation-time-aware deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). There are three key steps in designing a robust system. First, to accomplish simultaneous speech dereverberation and beamforming, we propose a framework, namely DNNSpatial, that selectively concatenates log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and maps them into the expected output LPS features of anechoic reference speech using a single deep neural network (DNN). Next, the temporal auto-correlation function of received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts in feature selection are needed in the DNNSpatial training stage in order to optimize the system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms the state-of-the-art signal-processing dereverberation algorithm, weighted prediction error (WPE), and conventional DNNSpatial systems that do not take the reverberation time into account, even in extremely weak and extremely severe reverberant conditions. The proposed technique generalizes well to unseen room sizes, array geometries, and loudspeaker positions, and is robust to reverberation time estimation error.
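
    The reverberation-time-aware idea can be sketched as follows: estimate RT60, map it to a temporal context width, and concatenate log-power-spectral frames from all microphones as the DNN input. The thresholds, context sizes, and feature dimensions below are illustrative assumptions, not the settings used in the paper.

```python
# A minimal sketch of RT60-dependent temporal-spatial context selection.
import numpy as np

def choose_context_frames(rt60_seconds):
    """Longer reverberation -> wider temporal context (illustrative mapping)."""
    if rt60_seconds < 0.3:
        return 5
    elif rt60_seconds < 0.6:
        return 9
    return 13

def build_dnn_input(lps_per_mic, center, rt60_seconds):
    """Stack context frames around `center` from every microphone's LPS.

    lps_per_mic : list of arrays, each of shape (n_frames, n_freq_bins)
    """
    half = choose_context_frames(rt60_seconds) // 2
    frames = []
    for lps in lps_per_mic:
        lo, hi = max(0, center - half), min(len(lps), center + half + 1)
        frames.append(lps[lo:hi].ravel())
    return np.concatenate(frames)  # fed to the dereverberation DNN

# Example with two microphones of random "LPS" features
rng = np.random.default_rng(0)
mics = [rng.normal(size=(100, 257)) for _ in range(2)]
print(build_dnn_input(mics, center=50, rt60_seconds=0.5).shape)
```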

  3. A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.

    PubMed

    Carlin, Michael A; Elhilali, Mounya

    2015-12-01

    One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.

  4. Hand and mouth: Cortical correlates of lexical processing in British Sign Language and speechreading English

    PubMed Central

    Capek, Cheryl M.; Waters, Dafydd; Woll, Bencie; MacSweeney, Mairéad; Brammer, Michael J.; McGuire, Philip K.; David, Anthony S.; Campbell, Ruth

    2012-01-01

    Spoken languages use one set of articulators – the vocal tract, whereas signed languages use multiple articulators, including both manual and facial actions. How sensitive are the cortical circuits for language processing to the particular articulators that are observed? This question can only be addressed with participants who use both speech and a signed language. In this study, we used fMRI to compare the processing of speechreading and sign processing in deaf native signers of British Sign Language (BSL) who were also proficient speechreaders. The following questions were addressed: To what extent do these different language types rely on a common brain network? To what extent do the patterns of activation differ? How are these networks affected by the articulators that languages use? Common perisylvian regions were activated both for speechreading English words and for BSL signs. Distinctive activation was also observed reflecting the language form. Speechreading elicited greater activation in the left mid-superior temporal cortex than BSL, whereas BSL processing generated greater activation at the parieto-occipito-temporal junction in both hemispheres. We probed this distinction further within BSL, where manual signs can be accompanied by different sorts of mouth action. BSL signs with speech-like mouth actions showed greater superior temporal activation, while signs made with non-speech-like mouth actions showed more activation in posterior and inferior temporal regions. Distinct regions within the temporal cortex are not only differentially sensitive to perception of the distinctive articulators for speech and for sign, but also show sensitivity to the different articulators within the (signed) language. PMID:18284353

  5. Electrostimulation mapping of comprehension of auditory and visual words.

    PubMed

    Roux, Franck-Emmanuel; Miskin, Krasimir; Durand, Jean-Baptiste; Sacko, Oumar; Réhault, Emilie; Tanova, Rositsa; Démonet, Jean-François

    2015-10-01

    In order to spare functional areas during the removal of brain tumours, electrical stimulation mapping was used in 90 patients (77 in the left hemisphere and 13 in the right; 2754 cortical sites tested). Language functions were studied with a special focus on comprehension of auditory and visual words and the semantic system. In addition to naming, patients were asked to perform pointing tasks from auditory and visual stimuli (using sets of 4 different images controlled for familiarity), and also auditory object (sound recognition) and Token test tasks. Ninety-two auditory comprehension interference sites were observed. We found that the process of auditory comprehension involved a few, fine-grained, sub-centimetre cortical territories. Early stages of speech comprehension seem to relate to two posterior regions in the left superior temporal gyrus. Downstream lexical-semantic speech processing and sound analysis involved 2 pathways, along the anterior part of the left superior temporal gyrus, and posteriorly around the supramarginal and middle temporal gyri. Electrostimulation experimentally dissociated perceptual consciousness attached to speech comprehension. The initial word discrimination process can be considered as an "automatic" stage, the attention feedback not being impaired by stimulation as would be the case at the lexical-semantic stage. Multimodal organization of the superior temporal gyrus was also detected since some neurones could be involved in comprehension of visual material and naming. These findings demonstrate a fine graded, sub-centimetre, cortical representation of speech comprehension processing mainly in the left superior temporal gyrus and are in line with those described in dual stream models of language comprehension processing. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Speaker recognition with temporal cues in acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Vongphoe, Michael; Zeng, Fan-Gang

    2005-08-01

    Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.

  7. Rapid, generalized adaptation to asynchronous audiovisual speech

    PubMed Central

    Van der Burg, Erik; Goodbourn, Patrick T.

    2015-01-01

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. PMID:25716790

  8. Auditory Processing and Speech Perception in Children with Specific Language Impairment: Relations with Oral Language and Literacy Skills

    ERIC Educational Resources Information Center

    Vandewalle, Ellen; Boets, Bart; Ghesquiere, Pol; Zink, Inge

    2012-01-01

    This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of 6 years 3 months to 6 years 8 months-old children attending grade 1: (1) children with specific language impairment (SLI) and literacy delay…

  9. The ability of cochlear implant users to use temporal envelope cues recovered from speech frequency modulation.

    PubMed

    Won, Jong Ho; Lorenzi, Christian; Nie, Kaibao; Li, Xing; Jameyson, Elyse M; Drennan, Ward R; Rubinstein, Jay T

    2012-08-01

    Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.
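
    A minimal sketch of how "recovered" AM cues arise from FM-only signals is shown below: narrow-band filtering of a broadband FM signal followed by Hilbert-envelope extraction, roughly what a sound processor's analysis filterbank does implicitly. The band edges, filter orders, and the synthetic FM test tone are assumptions for illustration.

```python
# A minimal sketch of extracting recovered AM (envelope) cues from an FM signal.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def recovered_envelopes(signal, fs, band_edges):
    """Return the Hilbert envelope of `signal` in each analysis band."""
    envs = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envs.append(np.abs(hilbert(band)))
    return np.array(envs)

# Example: an FM tone sweeping across band edges produces AM-like envelopes
fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
fm = np.sin(2 * np.pi * 500 * t + 60 * np.sin(2 * np.pi * 3 * t))
bands = [(300, 600), (600, 1200), (1200, 2400)]
print(recovered_envelopes(fm, fs, bands).shape)
```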

  10. Pure word deafness following left temporal damage: Behavioral and neuroanatomical evidence from a new case.

    PubMed

    Maffei, Chiara; Capasso, Rita; Cazzolli, Giulia; Colosimo, Cesare; Dell'Acqua, Flavio; Piludu, Francesca; Catani, Marco; Miceli, Gabriele

    2017-12-01

    Pure Word Deafness (PWD) is a rare disorder, characterized by selective loss of speech input processing. Its most common cause is temporal damage to the primary auditory cortex of both hemispheres, but it has been reported also following unilateral lesions. In unilateral cases, PWD has been attributed to the disconnection of Wernicke's area from both right and left primary auditory cortex. Here we report behavioral and neuroimaging evidence from a new case of left unilateral PWD with both cortical and white matter damage due to a relatively small stroke lesion in the left temporal gyrus. Selective impairment in auditory language processing was accompanied by intact processing of nonspeech sounds and normal speech, reading and writing. Performance on dichotic listening was characterized by a reversal of the right-ear advantage typically observed in healthy subjects. Cortical thickness and gyral volume were severely reduced in the left superior temporal gyrus (STG), although abnormalities were not uniformly distributed and residual intact cortical areas were detected, for example in the medial portion of the Heschl's gyrus. Diffusion tractography documented partial damage to the acoustic radiations (AR), callosal temporal connections and intralobar tracts dedicated to single words comprehension. Behavioral and neuroimaging results in this case are difficult to integrate in a pure cortical or disconnection framework, as damage to primary auditory cortex in the left STG was only partial and Wernicke's area was not completely isolated from left or right-hemisphere input. On the basis of our findings we suggest that in this case of PWD, concurrent partial topological (cortical) and disconnection mechanisms have contributed to a selective impairment of speech sounds. The discrepancy between speech and non-speech sounds suggests selective damage to a language-specific left lateralized network involved in phoneme processing. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Individual differences in selective attention predict speech identification at a cocktail party

    PubMed Central

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-01-01

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a proportion of variance similar to that explained by binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272

  12. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal

    PubMed Central

    2015-01-01

    Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component-process accounts of developmental dyslexia, such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to the classifications observed in dyslexic and average-reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independently of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and dyslexic readers represent a scaled continuum rather than being caused by a specific deficient component. PMID:25834769
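
    The classification logic can be sketched in a few lines: build a feature matrix for the /bAk/-/dAk/ stimuli and test how well Quadratic Discriminant Analysis reproduces listeners' category labels. The features and labels below are synthetic placeholders, not the study's measurements.

```python
# A minimal sketch of QDA-based reproduction of listener classifications.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical feature matrix: rows = stimuli, columns = candidate features
# (e.g., envelope rise time, formant-transition slope, RQA / multifractal measures).
X = rng.normal(size=(40, 4))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # placeholder "perceived category"

qda = QuadraticDiscriminantAnalysis()
print(cross_val_score(qda, X, y, cv=5).mean())  # agreement with listener labels
```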

  13. Pure word deafness with auditory object agnosia after bilateral lesion of the superior temporal sulcus.

    PubMed

    Gutschalk, Alexander; Uppenkamp, Stefan; Riedel, Bernhard; Bartsch, Andreas; Brandt, Tobias; Vogt-Schaden, Marlies

    2015-12-01

    Based on results from functional imaging, cortex along the superior temporal sulcus (STS) has been suggested to subserve phoneme and pre-lexical speech perception. For vowel classification, both superior temporal plane (STP) and STS areas have been suggested relevant. Lesion of bilateral STS may conversely be expected to cause pure word deafness and possibly also impaired vowel classification. Here we studied a patient with bilateral STS lesions caused by ischemic strokes and relatively intact medial STPs to characterize the behavioral consequences of STS loss. The patient showed severe deficits in auditory speech perception, whereas his speech production was fluent and communication by written speech was grossly intact. Auditory-evoked fields in the STP were within normal limits on both sides, suggesting that major parts of the auditory cortex were functionally intact. Further studies showed that the patient had normal hearing thresholds and only mild disability in tests for telencephalic hearing disorder. Prominent deficits were discovered in an auditory-object classification task, where the patient performed four standard deviations below the control group. In marked contrast, performance in a vowel-classification task was intact. Auditory evoked fields showed enhanced responses for vowels compared to matched non-vowels within normal limits. Our results are consistent with the notion that cortex along STS is important for auditory speech perception, although it does not appear to be entirely speech specific. Formant analysis and single vowel classification, however, appear to be already implemented in auditory cortex on the STP. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments

    PubMed Central

    Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim

    2016-01-01

    Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22 – 40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348

  15. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    PubMed

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
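
    The acoustic simulation described above is a channel vocoder; a minimal sketch is given below, with log-spaced band edges, filter orders, and the demo signal chosen as illustrative assumptions. Varying n_channels changes spectral resolution, and varying env_cutoff changes how much temporal (periodicity) information survives.

```python
# A minimal sketch of a noise-excited channel vocoder.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=8, env_cutoff=160.0,
                 f_lo=80.0, f_hi=6000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros_like(speech)
    noise = np.random.default_rng(0).normal(size=len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = sosfiltfilt(env_sos, np.abs(hilbert(sosfiltfilt(band_sos, speech))))
        carrier = sosfiltfilt(band_sos, noise)
        out += np.clip(env, 0, None) * carrier  # envelope-modulated noise band
    return out

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
demo = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
print(noise_vocode(demo, fs, n_channels=4, env_cutoff=50.0).shape)
```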

  16. At what time is the cocktail party? A late locus of selective attention to natural speech.

    PubMed

    Power, Alan J; Foxe, John J; Forde, Emma-Jane; Reilly, Richard B; Lalor, Edmund C

    2012-05-01

    Distinguishing between speakers and focusing attention on one speaker in multi-speaker environments is extremely important in everyday life. Exactly how the brain accomplishes this feat and, in particular, the precise temporal dynamics of this attentional deployment are as yet unknown. A long history of behavioral research using dichotic listening paradigms has debated whether selective attention to speech operates at an early stage of processing based on the physical characteristics of the stimulus or at a later stage during semantic processing. With its poor temporal resolution fMRI has contributed little to the debate, while EEG-ERP paradigms have been hampered by the need to average the EEG in response to discrete stimuli which are superimposed onto ongoing speech. This presents a number of problems, foremost among which is that early attention effects in the form of endogenously generated potentials can be so temporally broad as to mask later attention effects based on the higher level processing of the speech stream. Here we overcome this issue by utilizing the AESPA (auditory evoked spread spectrum analysis) method which allows us to extract temporally detailed responses to two concurrently presented speech streams in natural cocktail-party-like attentional conditions without the need for superimposed probes. We show attentional effects on exogenous stimulus processing in the 200-220 ms range in the left hemisphere. We discuss these effects within the context of research on auditory scene analysis and in terms of a flexible locus of attention that can be deployed at a particular processing stage depending on the task. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
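
    Methods in this family estimate a linear response function relating the ongoing speech envelope to the recorded EEG. The sketch below uses plain ridge regression over lagged envelope copies as a stand-in; it is not the authors' AESPA implementation, and the lag range, regularisation, and synthetic data are assumptions.

```python
# A minimal sketch of estimating a temporal response function by ridge regression.
import numpy as np

def estimate_trf(envelope, eeg, fs, max_lag_s=0.3, ridge=1.0):
    """envelope: (n_samples,), eeg: (n_samples,) single channel, fs in Hz."""
    n_lags = int(max_lag_s * fs)
    X = np.column_stack([np.roll(envelope, lag) for lag in range(n_lags)])
    X[:n_lags, :] = 0.0  # zero out wrapped-around samples
    w = np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)
    return w  # response weight per lag (0 .. max_lag_s seconds)

fs = 100
rng = np.random.default_rng(2)
env = rng.normal(size=fs * 60)
true_kernel = np.exp(-np.arange(20) / 5.0)
eeg = np.convolve(env, true_kernel, mode="full")[: len(env)] + rng.normal(size=len(env))
print(estimate_trf(env, eeg, fs).shape)
```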

  17. Spectrotemporal Modulation Detection and Speech Perception by Cochlear Implant Users

    PubMed Central

    Won, Jong Ho; Moon, Il Joon; Jin, Sunhwa; Park, Heesung; Woo, Jihwan; Cho, Yang-Sun; Chung, Won-Ho; Hong, Sung Hwa

    2015-01-01

    Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed high levels of STM detection performance that were comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation, rather than slow temporal modulation, may be important for determining speech perception capabilities for CI users. Lastly, test–retest reliability for STM detection was good, with no learning effect. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information. PMID:26485715
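
    An STM stimulus of the kind described here is a moving spectrotemporal ripple. The sketch below builds one by summing many log-spaced tone carriers whose amplitudes follow a sinusoid in log-frequency drifting at the chosen temporal rate; component count, modulation depth, and frequency range are illustrative assumptions.

```python
# A minimal sketch of a moving spectrotemporal ripple stimulus.
import numpy as np

def moving_ripple(fs=16000, dur=1.0, rate_hz=5.0, density_cyc_per_oct=1.0,
                  f_lo=250.0, f_hi=4000.0, n_tones=200, depth=0.9):
    t = np.arange(int(dur * fs)) / fs
    freqs = np.geomspace(f_lo, f_hi, n_tones)
    octaves = np.log2(freqs / f_lo)
    rng = np.random.default_rng(0)
    phases = rng.uniform(0, 2 * np.pi, n_tones)  # random carrier phases
    sig = np.zeros_like(t)
    for f, x, ph in zip(freqs, octaves, phases):
        amp = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t + density_cyc_per_oct * x))
        sig += amp * np.sin(2 * np.pi * f * t + ph)
    return sig / np.max(np.abs(sig))

stim = moving_ripple(rate_hz=10.0, density_cyc_per_oct=2.0)
print(stim.shape)
```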

  18. Speaker Invariance for Phonetic Information: an fMRI Investigation

    PubMed Central

    Salvata, Caden; Blumstein, Sheila E.; Myers, Emily B.

    2012-01-01

    The current study explored how listeners map the variable acoustic input onto a common sound structure representation while being able to retain phonetic detail to distinguish among the identity of talkers. An adaptation paradigm was utilized to examine areas which showed an equal neural response (equal release from adaptation) to phonetic change when spoken by the same speaker and when spoken by two different speakers, and insensitivity (failure to show release from adaptation) when the same phonetic input was spoken by a different speaker. Neural areas which showed speaker invariance were located in the anterior portion of the middle superior temporal gyrus bilaterally. These findings provide support for the view that speaker normalization processes allow for the translation of a variable speech input to a common abstract sound structure. That this process appears to occur early in the processing stream, recruiting temporal structures, suggests that this mapping takes place prelexically, before sound structure input is mapped on to lexical representations. PMID:23264714

  19. Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech

    NASA Astrophysics Data System (ADS)

    Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.

    1996-01-01

    A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.

  20. Focal versus distributed temporal cortex activity for speech sound category assignment

    PubMed Central

    Bouton, Sophie; Chambon, Valérian; Tyrand, Rémi; Seeck, Margitta; Karkar, Sami; van de Ville, Dimitri; Giraud, Anne-Lise

    2018-01-01

    Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision. PMID:29363598

  1. Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues.

    PubMed

    Wirtzfeld, Michael R; Ibrahim, Rasha A; Bruce, Ian C

    2017-10-01

    Perceptual studies of speech intelligibility have shown that slow variations of acoustic envelope (ENV) in a small set of frequency bands provides adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
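
    The chimaera construction itself can be sketched directly from the Hilbert decomposition: in each analysis band, take the envelope of one signal and the fine structure (cosine of the instantaneous phase) of the other. The band layout and the toy signals below are assumptions for illustration.

```python
# A minimal sketch of building a speech chimaera from ENV and TFS components.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimaera(env_source, tfs_source, fs, band_edges):
    out = np.zeros_like(env_source)
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a = hilbert(sosfiltfilt(sos, env_source))
        b = hilbert(sosfiltfilt(sos, tfs_source))
        out += np.abs(a) * np.cos(np.angle(b))  # ENV of one, TFS of the other
    return out

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
speech_like = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 4 * t))
noise = np.random.default_rng(0).normal(size=len(t))
bands = [(100, 400), (400, 1600), (1600, 6400)]
print(chimaera(speech_like, noise, fs, bands).shape)
```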

  2. Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features

    PubMed Central

    Gross, Joachim; Kayser, Christoph

    2018-01-01

    During online speech processing, our brain tracks the acoustic fluctuations in speech at different timescales. Previous research has focused on generic timescales (for example, delta or theta bands) that are assumed to map onto linguistic features such as prosody or syllables. However, given the high intersubject variability in speaking patterns, such a generic association between the timescales of brain activity and speech properties can be ambiguous. Here, we analyse speech tracking in source-localised magnetoencephalographic data by directly focusing on timescales extracted from statistical regularities in our speech material. This revealed widespread significant tracking at the timescales of phrases (0.6–1.3 Hz), words (1.8–3 Hz), syllables (2.8–4.8 Hz), and phonemes (8–12.4 Hz). Importantly, when examining its perceptual relevance, we found stronger tracking for correctly comprehended trials in the left premotor (PM) cortex at the phrasal scale as well as in left middle temporal cortex at the word scale. Control analyses using generic bands confirmed that these effects were specific to the speech regularities in our stimuli. Furthermore, we found that the phase at the phrasal timescale coupled to power at beta frequency (13–30 Hz) in motor areas. This cross-frequency coupling presumably reflects top-down temporal prediction in ongoing speech perception. Together, our results reveal specific functional and perceptually relevant roles of distinct tracking and cross-frequency processes along the auditory–motor pathway. PMID:29529019
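
    The reported phase-power relationship is a phase-amplitude coupling measure. The sketch below computes a generic mean-vector-length coupling index between a slow phrasal-scale phase and beta-band power; the bands, the index, and the synthetic signal are generic choices rather than the authors' exact pipeline.

```python
# A minimal sketch of a mean-vector-length phase-amplitude coupling index.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_amplitude_coupling(x, fs, phase_band=(0.6, 1.3), amp_band=(13, 30)):
    sos_p = butter(2, phase_band, btype="bandpass", fs=fs, output="sos")
    sos_a = butter(4, amp_band, btype="bandpass", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos_p, x)))
    amp = np.abs(hilbert(sosfiltfilt(sos_a, x)))
    return np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)

fs = 200
t = np.arange(0, 60, 1.0 / fs)
slow = np.sin(2 * np.pi * 1.0 * t)
beta = (1 + 0.5 * slow) * np.sin(2 * np.pi * 20 * t)  # beta power follows slow phase
rng = np.random.default_rng(3)
print(phase_amplitude_coupling(slow + beta + 0.1 * rng.normal(size=len(t)), fs))
```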

  3. [Speech perception with electric-acoustic stimulation: Comparison with bilateral cochlear implant users in different noise conditions].

    PubMed

    Rader, T

    2015-02-01

    Cochlear implantation with the aim of hearing preservation for combined electric-acoustic stimulation (EAS) is the therapy of choice for patients with residual low-frequency hearing. Preserved residual acoustic hearing has a positive effect on speech intelligibility in difficult noise conditions. The goal of this study was to assess speech reception thresholds in various complex noise conditions for patients with EAS in comparison with patients using bilateral cochlear implants (CI). Speech perception in noise was measured for bilateral CI and EAS patient groups. A total of 22 listeners with normal hearing served as a control group. Speech reception thresholds (SRT) were measured using a closed-set sentence matrix test. Speech was presented with a single source in frontal position; noise was presented in frontal position or in a multisource noise field (MSNF) consisting of a four-loudspeaker array with independent noise sources. Modulated speech-simulating noise and pseudocontinuous noise served respectively as interference signal with different temporal characteristics. The average SRTs in the EAS group were significantly better in all test conditions than those of the group with bilateral CI. Both user groups showed significant improvement in the MSNF condition compared with the frontal noise condition as a result of bilateral interaction. The normal-hearing control group was able to use short temporal gaps in modulated noise to improve speech perception in noise (gap listening). This effect was absent in both implanted user groups. Patients with combined EAS in one ear and a hearing aid in the contralateral ear show significantly improved speech perception in complex noise conditions compared with bilateral CI recipients.
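
    Closed-set matrix tests typically estimate the SRT adaptively by raising the SNR after errors and lowering it after correct responses. The sketch below shows such a staircase in its simplest 1-down/1-up form, which converges near the 50%-correct point; the step size, trial count, and psychometric function are illustrative assumptions, not the procedure used in the study.

```python
# A minimal sketch of an adaptive 1-down/1-up SRT staircase.
import math
import random

def run_adaptive_srt(prob_correct_at, start_snr=0.0, step_db=2.0, n_trials=30):
    """prob_correct_at maps SNR (dB) to the probability of a correct trial."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        correct = random.random() < prob_correct_at(snr)
        track.append(snr)
        snr += -step_db if correct else step_db  # 1-down/1-up: targets 50% correct
    return sum(track[-10:]) / 10.0  # crude SRT estimate: mean SNR of last 10 trials

# Hypothetical logistic psychometric function centred at -7 dB SNR
psychometric = lambda snr: 1.0 / (1.0 + math.exp(-(snr + 7.0)))
random.seed(0)
print(run_adaptive_srt(psychometric))
```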

  4. Voice gender identification by cochlear implant users: The role of spectral and temporal resolution

    NASA Astrophysics Data System (ADS)

    Fu, Qian-Jie; Chinchilla, Sherol; Nogaki, Geraldine; Galvin, John J.

    2005-09-01

    The present study explored the relative contributions of spectral and temporal information to voice gender identification by cochlear implant users and normal-hearing subjects. Cochlear implant listeners were tested using their everyday speech processors, while normal-hearing subjects were tested under speech processing conditions that simulated various degrees of spectral resolution, temporal resolution, and spectral mismatch. Voice gender identification was tested for two talker sets. In Talker Set 1, the mean fundamental frequency values of the male and female talkers differed by 100 Hz while in Talker Set 2, the mean values differed by 10 Hz. Cochlear implant listeners achieved higher levels of performance with Talker Set 1, while performance was significantly reduced for Talker Set 2. For normal-hearing listeners, performance was significantly affected by the spectral resolution, for both Talker Sets. With matched speech, temporal cues contributed to voice gender identification only for Talker Set 1 while spectral mismatch significantly reduced performance for both Talker Sets. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to 4-8 spectral channels. The results suggest that, because of the reduced spectral resolution, cochlear implant patients may attend strongly to periodicity cues to distinguish voice gender.

  5. The Influence of Environmental Sound Training on the Perception of Spectrally Degraded Speech and Environmental Sounds

    PubMed Central

    Sheft, Stanley; Gygi, Brian; Ho, Kim Thien N.

    2012-01-01

    Perceptual training with spectrally degraded environmental sounds results in improved environmental sound identification, with benefits shown to extend to untrained speech perception as well. The present study extended those findings to examine longer-term training effects as well as effects of mere repeated exposure to sounds over time. Participants received two pretests (1 week apart) prior to a week-long environmental sound training regimen, which was followed by two posttest sessions, separated by another week without training. Spectrally degraded stimuli, processed with a four-channel vocoder, consisted of a 160-item environmental sound test, word and sentence tests, and a battery of basic auditory abilities and cognitive tests. Results indicated significant improvements in all speech and environmental sound scores between the initial pretest and the last posttest with performance increments following both exposure and training. For environmental sounds (the stimulus class that was trained), the magnitude of positive change that accompanied training was much greater than that due to exposure alone, with improvement for untrained sounds roughly comparable to the speech benefit from exposure. Additional tests of auditory and cognitive abilities showed that speech and environmental sound performance were differentially correlated with tests of spectral and temporal-fine-structure processing, whereas working memory and executive function were correlated with speech, but not environmental sound perception. These findings indicate generalizability of environmental sound training and provide a basis for implementing environmental sound training programs for cochlear implant (CI) patients. PMID:22891070

  6. Dissociating speech perception and comprehension at reduced levels of awareness

    PubMed Central

    Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.

    2007-01-01

    We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response, rousable by loud command). Bilateral temporal-lobe responses for sentences compared with signal-correlated noise were observed at all three levels of sedation, although prefrontal and premotor responses to speech were absent at the deepest level of sedation. Additional inferior frontal and posterior temporal responses to ambiguous sentences provide a neural correlate of semantic processes critical for comprehending sentences containing ambiguous words. However, this additional response was absent during light sedation, suggesting a marked impairment of sentence comprehension. A significant decline in postscan recognition memory for sentences also suggests that sedation impaired encoding of sentences into memory, with left inferior frontal and temporal lobe responses during light sedation predicting subsequent recognition memory. These findings suggest a graded degradation of cognitive function in response to sedation such that “higher-level” semantic and mnemonic processes can be impaired at relatively low levels of sedation, whereas perceptual processing of speech remains resilient even during deep sedation. These results have important implications for understanding the relationship between speech comprehension and awareness in the healthy brain in patients receiving sedation and in patients with disorders of consciousness. PMID:17938125

  7. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt.

    PubMed

    Hickok, Gregory; Buchsbaum, Bradley; Humphries, Colin; Muftuler, Tugan

    2003-07-01

    The concept of auditory-motor interaction pervades speech science research, yet the cortical systems supporting this interface have not been elucidated. Drawing on experimental designs used in recent work in sensory-motor integration in the cortical visual system, we used fMRI in an effort to identify human auditory regions with both sensory and motor response properties, analogous to single-unit responses in known visuomotor integration areas. The sensory phase of the task involved listening to speech (nonsense sentences) or music (novel piano melodies); the "motor" phase of the task involved covert rehearsal/humming of the auditory stimuli. A small set of areas in the superior temporal and temporal-parietal cortex responded both during the listening phase and the rehearsal/humming phase. A left lateralized region in the posterior Sylvian fissure at the parietal-temporal boundary, area Spt, showed particularly robust responses to both phases of the task. Frontal areas also showed combined auditory + rehearsal responsivity consistent with the claim that the posterior activations are part of a larger auditory-motor integration circuit. We hypothesize that this circuit plays an important role in speech development as part of the network that enables acoustic-phonetic input to guide the acquisition of language-specific articulatory-phonetic gestures; this circuit may play a role in analogous musical abilities. In the adult, this system continues to support aspects of speech production, and, we suggest, supports verbal working memory.

  8. Is the Sensorimotor Cortex Relevant for Speech Perception and Understanding? An Integrative Review

    PubMed Central

    Schomers, Malte R.; Pulvermüller, Friedemann

    2016-01-01

    In the neuroscience of language, phonemes are frequently described as multimodal units whose neuronal representations are distributed across perisylvian cortical regions, including auditory and sensorimotor areas. A different position views phonemes primarily as acoustic entities with posterior temporal localization, which are functionally independent from frontoparietal articulatory programs. To address this current controversy, we here discuss experimental results from functional magnetic resonance imaging (fMRI) as well as transcranial magnetic stimulation (TMS) studies. On first glance, a mixed picture emerges, with earlier research documenting neurofunctional distinctions between phonemes in both temporal and frontoparietal sensorimotor systems, but some recent work seemingly failing to replicate the latter. Detailed analysis of methodological differences between studies reveals that the way experiments are set up explains whether sensorimotor cortex maps phonological information during speech perception or not. In particular, acoustic noise during the experiment and ‘motor noise’ caused by button press tasks work against the frontoparietal manifestation of phonemes. We highlight recent studies using sparse imaging and passive speech perception tasks along with multivariate pattern analysis (MVPA) and especially representational similarity analysis (RSA), which succeeded in separating acoustic-phonological from general-acoustic processes and in mapping specific phonological information on temporal and frontoparietal regions. The question about a causal role of sensorimotor cortex on speech perception and understanding is addressed by reviewing recent TMS studies. We conclude that frontoparietal cortices, including ventral motor and somatosensory areas, reflect phonological information during speech perception and exert a causal influence on language understanding. PMID:27708566

  9. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    PubMed Central

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889

  10. On pure word deafness, temporal processing, and the left hemisphere.

    PubMed

    Stefanatos, Gerry A; Gershkoff, Arthur; Madigan, Sean

    2005-07-01

    Pure word deafness (PWD) is a rare neurological syndrome characterized by severe difficulties in understanding and reproducing spoken language, with sparing of written language comprehension and speech production. The pathognomonic disturbance of auditory comprehension appears to be associated with a breakdown in processes involved in mapping auditory input to lexical representations of words, but the functional locus of this disturbance and the localization of the responsible lesion have long been disputed. We report here on a woman with PWD resulting from a circumscribed unilateral infarct involving the left superior temporal lobe who demonstrated significant problems processing transitional spectrotemporal cues in both speech and nonspeech sounds. On speech discrimination tasks, she exhibited poor differentiation of stop consonant-vowel syllables distinguished by voicing onset and brief formant frequency transitions. Isolated formant transitions could be reliably discriminated only at very long durations (> 200 ms). By contrast, click fusion threshold, which depends on millisecond-level resolution of brief auditory events, was normal. These results suggest that the problems with speech analysis in this case were not secondary to general constraints on auditory temporal resolution. Rather, they point to a disturbance of left hemisphere auditory mechanisms that preferentially analyze rapid spectrotemporal variations in frequency. The findings have important implications for our conceptualization of PWD and its subtypes.

  11. Rapid, generalized adaptation to asynchronous audiovisual speech.

    PubMed

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-07

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  12. Electrophysiological study of the basal temporal language area: a convergence zone between language perception and production networks.

    PubMed

    Trébuchon-Da Fonseca, Agnès; Bénar, Christian-G; Bartoloméi, Fabrice; Régis, Jean; Démonet, Jean-François; Chauvel, Patrick; Liégeois-Chauvel, Catherine

    2009-03-01

    Regions involved in language processing have been observed in the inferior part of the left temporal lobe. Although collectively labelled 'the Basal Temporal Language Area' (BTLA), these territories are functionally heterogeneous and are involved in language perception (i.e. reading or semantic tasks) or language production (speech arrest after stimulation). The objective of this study was to clarify the role of the BTLA in the language network in an epileptic patient who displayed jargonaphasia. Intracerebral event-related potentials to verbal and non-verbal stimuli in auditory and visual modalities were recorded from the BTLA. Time-frequency analysis was performed during ictal events. Evoked potentials and induced gamma-band activity provided direct evidence that the BTLA is sensitive to language stimuli in both modalities, 350 ms after stimulation. In addition, spontaneous gamma-band discharges were recorded from this region, during which we observed phonological jargon. The findings emphasize the multimodal nature of this region in speech perception. In the context of transient dysfunction, the patient's lexical-semantic processing network is disrupted, reducing spoken output to meaningless phoneme combinations. This rare opportunity to study the BTLA "in vivo" demonstrates its pivotal role in lexico-semantic processing for speech production and its multimodal nature in speech perception.

  13. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

    PubMed

    Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

    2011-01-01

    The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system, with special focus on intrinsic variations of speech such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, we investigated whether the most common ASR features contain all the information required to recognize speech in noisy environments, by presenting resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate of the human-machine gap in terms of SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.

  14. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
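
    The Dynamic Time Warping strategy mentioned above aligns a spoken word with stored templates despite differences in speaking rate. A minimal sketch of the DTW distance between two feature sequences is given below; it assumes per-frame feature vectors (e.g., short-time spectra) and is not the recognizer described in the thesis.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic Time Warping distance between two feature sequences
    (frames x features), using Euclidean frame-to-frame distances."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A whole-word template recognizer labels an utterance with the template
# that yields the smallest DTW distance.
```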

  15. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality.

    PubMed

    Kates, James M; Arehart, Kathryn H

    2015-10-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
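
    The envelope modulation fidelity described above is computed as a normalized cross-covariance between the degraded and reference envelopes. A minimal sketch is shown below; it assumes the band envelopes have already been extracted by an auditory model and omits the paper's modulation-rate and spectro-temporal analyses.

```python
import numpy as np

def envelope_fidelity(env_ref, env_deg):
    """Normalized cross-covariance (zero-lag correlation of mean-removed
    envelopes) between a reference and a degraded envelope."""
    r = env_ref - env_ref.mean()
    d = env_deg - env_deg.mean()
    return float(np.dot(r, d) / (np.linalg.norm(r) * np.linalg.norm(d) + 1e-12))
```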

  16. A proposed mechanism for rapid adaptation to spectrally distorted speech.

    PubMed

    Azadpour, Mahan; Balaban, Evan

    2015-07-01

    The mechanisms underlying perceptual adaptation to severely spectrally-distorted speech were studied by training participants to comprehend spectrally-rotated speech, which is obtained by inverting the speech spectrum. Spectral-rotation produces severe distortion confined to the spectral domain while preserving temporal trajectories. During five 1-hour training sessions, pairs of participants attempted to extract spoken messages from the spectrally-rotated speech of their training partner. Data on training-induced changes in comprehension of spectrally-rotated sentences and identification/discrimination of spectrally-rotated phonemes were used to evaluate the plausibility of three different classes of underlying perceptual mechanisms: (1) phonemic remapping (the formation of new phonemic categories that specifically incorporate spectrally-rotated acoustic information); (2) experience-dependent generation of a perceptual "inverse-transform" that compensates for spectral-rotation; and (3) changes in cue weighting (the identification of sets of acoustic cues least affected by spectral-rotation, followed by a rapid shift in perceptual emphasis to favour those cues, combined with the recruitment of the same type of "perceptual filling-in" mechanisms used to disambiguate speech-in-noise). Results exclusively support the third mechanism, which is the only one predicting that learning would specifically target temporally-dynamic cues that were transmitting phonetic information most stably in spite of spectral-distortion. No support was found for phonemic remapping or for inverse-transform generation.
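
    As a rough illustration of the spectral-rotation manipulation, the sketch below inverts the spectrum below a chosen rotation frequency by low-pass filtering, ring-modulating with a sinusoid at that frequency, and low-pass filtering again. The rotation frequency and filter settings are assumptions; the study's exact procedure may differ.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def spectrally_rotate(signal, fs, f_rot=4000.0):
    """Invert the spectrum below f_rot: low-pass, ring-modulate with a
    sinusoid at f_rot, then low-pass again to remove the upper image
    (a sketch of Blesser-style rotation; details vary across studies)."""
    sos = butter(8, f_rot, btype="lowpass", fs=fs, output="sos")
    low = sosfiltfilt(sos, signal)
    t = np.arange(len(signal)) / fs
    rotated = low * np.cos(2 * np.pi * f_rot * t)  # component at f maps to f_rot - f
    return sosfiltfilt(sos, rotated)
```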

  17. The ability of cochlear implant users to use temporal envelope cues recovered from speech frequency modulation

    PubMed Central

    Won, Jong Ho; Lorenzi, Christian; Nie, Kaibao; Li, Xing; Jameyson, Elyse M.; Drennan, Ward R.; Rubinstein, Jay T.

    2012-01-01

    Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered “temporal envelopes,” i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues. PMID:22894230
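
    The "recovered envelope" effect arises because a frequency-modulated signal sweeping across the edges of a band-pass filter produces amplitude modulation at the filter output. The sketch below illustrates this with a single, arbitrarily chosen band; it is not the FM-vocoder processing used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def recovered_envelope(fm_signal, fs, band=(1000.0, 1400.0)):
    """Band-pass an FM-only signal and return the Hilbert envelope of the
    output; the FM sweeping across the filter edges re-creates AM
    ('recovered envelope') even though the input envelope was flat."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, fm_signal)))
```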

  18. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality

    PubMed Central

    Kates, James M.; Arehart, Kathryn H.

    2015-01-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships. PMID:26520329

  19. Brain 'talks over' boring quotes: top-down activation of voice-selective areas while listening to monotonous direct speech quotations.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2012-04-15

    In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  20. Through a glass darkly: some insights on change talk via magnetoencephalography.

    PubMed

    Houck, Jon M; Moyers, Theresa B; Tesche, Claudia D

    2013-06-01

    Motivational interviewing (MI) is a directive, client-centered therapeutic method employed in the treatment of substance abuse, with strong evidence of effectiveness. To date, the sole mechanism of action in MI with any consistent empirical support is "change talk" (CT), which is generally defined as client within-session speech in support of a behavior change. "Sustain talk" (ST) incorporates speech in support of the status quo. MI maintains that during treatment, clients essentially talk themselves into change. Multiple studies have now supported this theory, linking within-session speech to substance use outcomes. Although a causal chain has been established linking therapist behavior, client CT, and substance use outcome, the neural substrate of CT has been largely uncharted. We addressed this gap by measuring neural responses to clients' own CT using magnetoencephalography (MEG), a noninvasive neuroimaging technique with excellent spatial and temporal resolution. Following a recorded MI session, MEG was used to measure brain activity while participants heard multiple repetitions of their CT and ST utterances from that session, intermingled and presented in a random order. Results suggest that CT processing occurs in a right-hemisphere network that includes the inferior frontal gyrus, insula, and superior temporal cortex. These results support a representation of CT at the neural level, consistent with the role of these structures in self-perception. This suggests that during treatment sessions, clinicians who are able to evoke this special kind of language are tapping into neural circuitry that may be essential to behavior change. 2013 APA, all rights reserved

  1. Dissociation of quantifiers and object nouns in speech in focal neurodegenerative disease.

    PubMed

    Ash, Sharon; Ternes, Kylie; Bisbing, Teagan; Min, Nam Eun; Moran, Eileen; York, Collin; McMillan, Corey T; Irwin, David J; Grossman, Murray

    2016-08-01

    Quantifiers such as many and some are thought to depend in part on the conceptual representation of number knowledge, while object nouns such as cookie and boy appear to depend in part on visual feature knowledge associated with object concepts. Further, number knowledge is associated with a frontal-parietal network while object knowledge is related in part to anterior and ventral portions of the temporal lobe. We examined the cognitive and anatomic basis for the spontaneous speech production of quantifiers and object nouns in non-aphasic patients with focal neurodegenerative disease associated with corticobasal syndrome (CBS, n=33), behavioral variant frontotemporal degeneration (bvFTD, n=54), and semantic variant primary progressive aphasia (svPPA, n=19). We recorded a semi-structured speech sample elicited from patients and healthy seniors (n=27) during description of the Cookie Theft scene. We observed a dissociation: CBS and bvFTD were significantly impaired in the production of quantifiers but not object nouns, while svPPA were significantly impaired in the production of object nouns but not quantifiers. MRI analysis revealed that quantifier production deficits in CBS and bvFTD were associated with disease in a frontal-parietal network important for number knowledge, while impaired production of object nouns in all patient groups was related to disease in inferior temporal regions important for representations of visual feature knowledge of objects. These findings imply that partially dissociable representations in semantic memory may underlie different segments of the lexicon. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Dynamic speech representations in the human temporal lobe.

    PubMed

    Leonard, Matthew K; Chang, Edward F

    2014-09-01

    Speech perception requires rapid integration of acoustic input with context-dependent knowledge. Recent methodological advances have allowed researchers to identify underlying information representations in primary and secondary auditory cortex and to examine how context modulates these representations. We review recent studies that focus on contextual modulations of neural activity in the superior temporal gyrus (STG), a major hub for spectrotemporal encoding. Recent findings suggest a highly interactive flow of information processing through the auditory ventral stream, including influences of higher-level linguistic and metalinguistic knowledge, even within individual areas. Such mechanisms may give rise to more abstract representations, such as those for words. We discuss the importance of characterizing representations of context-dependent and dynamic patterns of neural activity in the approach to speech perception research. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. From the analysis of verbal data to the analysis of organizations: organizing as a dialogical process.

    PubMed

    Lorino, Philippe

    2014-12-01

    The analysis of conversational turn-taking and its implications for time (the speaker cannot completely anticipate the future effects of her/his speech) and sociality (speech is co-produced by the various speakers rather than by the speaking individual alone) can provide a useful basis for analyzing complex organizing processes and collective action: the actor cannot completely anticipate the future effects of her/his acts, and each act is co-produced by multiple actors. This translation from verbal to broader classes of interaction stresses the performativity of speech, the importance of the situation, the role of semiotic mediations in making temporally and spatially distant "ghosts" present in the dialog, and the dissymmetrical relationship between successive conversational turns due to temporal irreversibility.

  4. Dual routes for verbal repetition: articulation-based and acoustic-phonetic codes for pseudoword and word repetition, respectively.

    PubMed

    Yoo, Sejin; Chung, Jun-Young; Jeon, Hyeon-Ae; Lee, Kyoung-Min; Kim, Young-Bo; Cho, Zang-Hee

    2012-07-01

    Speech production is inextricably linked to speech perception, yet they are usually investigated in isolation. In this study, we employed a verbal-repetition task to identify the neural substrates of speech processing with two ends active simultaneously using functional MRI. Subjects verbally repeated auditory stimuli containing an ambiguous vowel sound that could be perceived as either a word or a pseudoword depending on the interpretation of the vowel. We found verbal repetition commonly activated the audition-articulation interface bilaterally at Sylvian fissures and superior temporal sulci. Contrasting word-versus-pseudoword trials revealed neural activities unique to word repetition in the left posterior middle temporal areas and activities unique to pseudoword repetition in the left inferior frontal gyrus. These findings imply that the tasks are carried out using different speech codes: an articulation-based code of pseudowords and an acoustic-phonetic code of words. It also supports the dual-stream model and imitative learning of vocabulary. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Speech-evoked Brainstem Auditory Responses and Auditory Processing Skills: A Correlation in Adults with Hearing Loss

    PubMed Central

    Sanguebuche, Taissane Rodrigues; Peixe, Bruna Pias; Bruno, Rúbia Soares; Biaggio, Eliara Pinto Vieira; Garcia, Michele Vargas

    2018-01-01

    Introduction  The auditory system consists of sensory structures and central connections. The evaluation of the auditory pathway at a central level can be performed through behavioral and electrophysiological tests, because they are complementary to each other and provide important information about comprehension. Objective  To correlate the findings of speech-evoked brainstem response audiometry with the behavioral tests Random Gap Detection Test and Masking Level Difference in adults with hearing loss. Methods  All patients were submitted to a basic audiological evaluation, to the aforementioned behavioral tests, and to an electrophysiological assessment by means of click-evoked and speech-evoked brainstem response audiometry. Results  There were no statistically significant correlations between the electrophysiological test and the behavioral tests. However, there was a significant correlation between the V and A waves, as well as between the D and F waves, of the speech-evoked brainstem response audiometry peaks. These correlations are positive, indicating that an increase in one variable implies an increase in the other, and vice versa. Conclusion  It was possible to correlate the findings of the speech-evoked brainstem response audiometry with those of the behavioral tests Random Gap Detection and Masking Level Difference. However, there was no statistically significant correlation between them. This shows that the electrophysiological evaluation does not depend solely on the behavioral skills of temporal resolution and selective attention. PMID:29379574

  6. Structural connectivity of right frontal hyperactive areas scales with stuttering severity.

    PubMed

    Neef, Nicole E; Anwander, Alfred; Bütfering, Christoph; Schmidt-Samoa, Carsten; Friederici, Angela D; Paulus, Walter; Sommer, Martin

    2018-01-01

    A neuronal sign of persistent developmental stuttering is the magnified coactivation of right frontal brain regions during speech production. Whether and how stuttering severity relates to the connection strength of these hyperactive right frontal areas to other brain areas is an open question. Scrutinizing such brain-behaviour and structure-function relationships aims at disentangling suspected underlying neuronal mechanisms of stuttering. Here, we acquired diffusion-weighted and functional images from 31 adults who stutter and 34 matched control participants. Using a newly developed structural connectivity measure, we calculated voxel-wise correlations between connection strength and stuttering severity within tract volumes that originated from functionally hyperactive right frontal regions. Correlation analyses revealed that with increasing speech motor deficits the connection strength increased in the right frontal aslant tract, the right anterior thalamic radiation, and in U-shaped projections underneath the right precentral sulcus. In contrast, with decreasing speech motor deficits connection strength increased in the right uncinate fasciculus. Additional group comparisons of whole-brain white matter skeletons replicated the previously reported reduction of fractional anisotropy in the left and right superior longitudinal fasciculus as well as at the junction of right frontal aslant tract and right superior longitudinal fasciculus in adults who stutter compared to control participants. Overall, our investigation suggests that right fronto-temporal networks play a compensatory role as a fluency enhancing mechanism. In contrast, the increased connection strength within subcortical-cortical pathways may be implied in an overly active global response suppression mechanism in stuttering. Altogether, this combined functional MRI-diffusion tensor imaging study disentangles different networks involved in the neuronal underpinnings of the speech motor deficit in persistent developmental stuttering. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain.

  7. Alpha Oscillatory Dynamics Index Temporal Expectation Benefits in Working Memory.

    PubMed

    Wilsch, Anna; Henry, Molly J; Herrmann, Björn; Maess, Burkhard; Obleser, Jonas

    2015-07-01

    Enhanced alpha power compared with a baseline can reflect states of increased cognitive load, for example, when listening to speech in noise. Can knowledge about "when" to listen (temporal expectations) potentially counteract cognitive load and concomitantly reduce alpha? The current magnetoencephalography (MEG) experiment induced cognitive load using an auditory delayed-matching-to-sample task with 2 syllables S1 and S2 presented in speech-shaped noise. Temporal expectation about the occurrence of S1 was manipulated in 3 different cue conditions: "Neutral" (uninformative about foreperiod), "early-cued" (short foreperiod), and "late-cued" (long foreperiod). Alpha power throughout the trial was highest when the cue was uninformative about the onset time of S1 (neutral) and lowest for the late-cued condition. This alpha-reducing effect of late compared with neutral cues was most evident during memory retention in noise and originated primarily in the right insula. Moreover, individual alpha effects during retention accounted best for observed individual performance differences between late-cued and neutral conditions, indicating a tradeoff between allocation of neural resources and the benefits drawn from temporal cues. Overall, the results indicate that temporal expectations can facilitate the encoding of speech in noise, and concomitantly reduce neural markers of cognitive load. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Effects of spectral and temporal disruption on cortical encoding of gerbil vocalizations

    PubMed Central

    Ter-Mikaelian, Maria; Semple, Malcolm N.

    2013-01-01

    Animal communication sounds contain spectrotemporal fluctuations that provide powerful cues for detection and discrimination. Human perception of speech is influenced both by spectral and temporal acoustic features but is most critically dependent on envelope information. To investigate the neural coding principles underlying the perception of communication sounds, we explored the effect of disrupting the spectral or temporal content of five different gerbil call types on neural responses in the awake gerbil's primary auditory cortex (AI). The vocalizations were impoverished spectrally by reduction to 4 or 16 channels of band-passed noise. For this acoustic manipulation, the average firing rate of a neuron did not carry sufficient information to distinguish between call types. In contrast, the discharge patterns of individual AI neurons reliably matched vocalizations composed of only four spectral bands to the appropriate natural token. The pooled responses of small populations of AI cells classified spectrally disrupted and natural calls with an accuracy that paralleled human performance on an analogous speech task. To assess whether discharge pattern was robust to temporal perturbations of an individual call, vocalizations were disrupted by time-reversing segments of variable duration. For this acoustic manipulation, cortical neurons were relatively insensitive to short reversal lengths. Consistent with human perception of speech, these results indicate that the stable representation of communication sounds in AI is more dependent on sensitivity to slow temporal envelopes than on spectral detail. PMID:23761696

  9. The word processing deficit in semantic dementia: all categories are equal, but some categories are more equal than others.

    PubMed

    Pulvermüller, Friedemann; Cooper-Pye, Elisa; Dine, Clare; Hauk, Olaf; Nestor, Peter J; Patterson, Karalyn

    2010-09-01

    It has been claimed that semantic dementia (SD), the temporal variant of fronto-temporal dementia, is characterized by an across-the-board deficit affecting all types of conceptual knowledge. We here confirm this generalized deficit but also report differential degrees of impairment in processing specific semantic word categories in a case series of SD patients (N = 11). Within the domain of words with strong visually grounded meaning, the patients' lexical decision accuracy was more impaired for color-related than for form-related words. Likewise, within the domain of action verbs, the patients' performance was worse for words referring to face movements and speech acts than for words semantically linked to actions performed with the hand and arm. Psycholinguistic properties were matched between the stimulus groups entering these contrasts; an explanation for the differential degrees of impairment must therefore involve semantic features of the words in the different conditions. Furthermore, this specific pattern of deficits cannot be captured by classic category distinctions such as nouns versus verbs or living versus nonliving things. Evidence from previous neuroimaging research indicates that color- and face/speech-related words, respectively, draw most heavily on anterior-temporal and inferior-frontal areas, the structures most affected in SD. Our account combines (a) the notion of an anterior-temporal amodal semantic "hub" to explain the profound across-the-board deficit in SD word processing, with (b) a semantic topography model of category-specific circuits whose cortical distributions reflect semantic features of the words and concepts represented.

  10. Auditory cortical activation and plasticity after cochlear implantation measured by PET using fluorodeoxyglucose.

    PubMed

    Łukaszewicz-Moszyńska, Zuzanna; Lachowska, Magdalena; Niemczyk, Kazimierz

    2014-01-01

    The purpose of this study was to evaluate possible relationships between duration of cochlear implant use and results of positron emission tomography (PET) measurements in the temporal lobes performed while subjects listened to speech stimuli. Other aspects investigated were whether implantation side impacts significantly on cortical representations of functions related to understanding speech (ipsi- or contralateral to the implanted side) and whether any correlation exists between cortical activation and speech therapy results. Objective cortical responses to acoustic stimulation were measured, using PET, in nine cochlear implant patients (age range: 15 to 50 years). All the patients suffered from bilateral deafness, were right-handed, and had no additional neurological deficits. They underwent PET imaging three times: immediately after the first fitting of the speech processor (activation of the cochlear implant), and one and two years later. A tendency towards increasing levels of activation in areas of the primary and secondary auditory cortex on the left side of the brain was observed. There was no clear effect of the side of implantation (left or right) on the degree of cortical activation in the temporal lobe. However, the PET results showed a correlation between degree of cortical activation and speech therapy results.

  11. Auditory cortical activation and plasticity after cochlear implantation measured by PET using fluorodeoxyglucose

    PubMed Central

    Łukaszewicz-Moszyńska, Zuzanna; Lachowska, Magdalena; Niemczyk, Kazimierz

    2014-01-01

    Summary The purpose of this study was to evaluate possible relationships between duration of cochlear implant use and results of positron emission tomography (PET) measurements in the temporal lobes performed while subjects listened to speech stimuli. Other aspects investigated were whether implantation side impacts significantly on cortical representations of functions related to understanding speech (ipsi- or contralateral to the implanted side) and whether any correlation exists between cortical activation and speech therapy results. Objective cortical responses to acoustic stimulation were measured, using PET, in nine cochlear implant patients (age range: 15 to 50 years). All the patients suffered from bilateral deafness, were right-handed, and had no additional neurological deficits. They underwent PET imaging three times: immediately after the first fitting of the speech processor (activation of the cochlear implant), and one and two years later. A tendency towards increasing levels of activation in areas of the primary and secondary auditory cortex on the left side of the brain was observed. There was no clear effect of the side of implantation (left or right) on the degree of cortical activation in the temporal lobe. However, the PET results showed a correlation between degree of cortical activation and speech therapy results. PMID:25306122

  12. Oral Articulatory Control in Childhood Apraxia of Speech

    PubMed Central

    Moss, Aviva; Lu, Ying

    2015-01-01

    Purpose The purpose of this research was to examine spatial and temporal aspects of articulatory control in children with childhood apraxia of speech (CAS), children with speech delay characterized by an articulation/phonological impairment (SD), and controls with typical development (TD) during speech tasks that increased in word length. Method The participants included 33 children (11 CAS, 11 SD, and 11 TD) between 3 and 7 years of age. A motion capture system was used to track jaw, lower lip, and upper lip movement during a naming task. Movement duration, velocity, displacement, and variability were measured from accurate word productions. Results Movement variability was significantly higher in the children with CAS compared with participants in the SD and TD groups. Differences in temporal control were seen between both groups of children with speech impairment and the controls with TD during accurate word productions. As word length increased, movement duration and variability differed between the children with CAS and those with SD. Conclusions These findings provide evidence that movement variability distinguishes children with CAS from speakers with SD. Kinematic differences between the participants with CAS and those with SD suggest that these groups respond differently to linguistic challenges. PMID:25951237

  13. Status Report on Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, January 1-March 31, 1977.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series about the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The 11 papers discuss the dissociation of spectral and temporal cues to the voicing distinction in initial stopped consonants; perceptual integration and selective attention in…

  14. Phonological Working Memory for Words and Nonwords in Cerebral Cortex

    PubMed Central

    Ghosh, Satrajit S.; Ostrovskaya, Irina; Gabrieli, John D. E.; Kovelman, Ioulia

    2017-01-01

    Purpose The primary purpose of this study was to identify the brain bases of phonological working memory (the short-term maintenance of speech sounds) using behavioral tasks analogous to clinically sensitive assessments of nonword repetition. The secondary purpose of the study was to identify how individual differences in brain activation were related to participants' nonword repetition abilities. Method We used functional magnetic resonance imaging to measure neurophysiological response during a nonword discrimination task derived from standard clinical assessments of phonological working memory. Healthy adult control participants (N = 16) discriminated pairs of real words or nonwords under varying phonological working memory load, which we manipulated by parametrically varying the number of syllables in target (non)words. Participants' cognitive and phonological abilities were also measured using standardized assessments. Results Neurophysiological responses in bilateral superior temporal gyrus, inferior frontal gyrus, and supplementary motor area increased with greater phonological working memory load. Activation in left superior temporal gyrus during nonword discrimination correlated with participants' performance on standard clinical nonword repetition tests. Conclusion These results suggest that phonological working memory is related to the function of cortical structures that canonically underlie speech perception and production. PMID:28631005

  15. Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension.

    PubMed

    Brodbeck, Christian; Presacco, Alessandro; Simon, Jonathan Z

    2018-05-15

    Human experience often involves continuous sensory information that unfolds over time. This is true in particular for speech comprehension, where continuous acoustic signals are processed over seconds or even minutes. We show that brain responses to such continuous stimuli can be investigated in detail, for magnetoencephalography (MEG) data, by combining linear kernel estimation with minimum norm source localization. Previous research has shown that the requirement to average data over many trials can be overcome by modeling the brain response as a linear convolution of the stimulus and a kernel, or response function, and estimating a kernel that predicts the response from the stimulus. However, such analysis has been typically restricted to sensor space. Here we demonstrate that this analysis can also be performed in neural source space. We first computed distributed minimum norm current source estimates for continuous MEG recordings, and then computed response functions for the current estimate at each source element, using the boosting algorithm with cross-validation. Permutation tests can then assess the significance of individual predictor variables, as well as features of the corresponding spatio-temporal response functions. We demonstrate the viability of this technique by computing spatio-temporal response functions for speech stimuli, using predictor variables reflecting acoustic, lexical and semantic processing. Results indicate that processes related to comprehension of continuous speech can be differentiated anatomically as well as temporally: acoustic information engaged auditory cortex at short latencies, followed by responses over the central sulcus and inferior frontal gyrus, possibly related to somatosensory/motor cortex involvement in speech perception; lexical frequency was associated with a left-lateralized response in auditory cortex and subsequent bilateral frontal activity; and semantic composition was associated with bilateral temporal and frontal brain activity. We conclude that this technique can be used to study the neural processing of continuous stimuli in time and anatomical space with the millisecond temporal resolution of MEG. This suggests new avenues for analyzing neural processing of naturalistic stimuli, without the necessity of averaging over artificially short or truncated stimuli. Copyright © 2018 Elsevier Inc. All rights reserved.
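
    The kernel (response-function) estimation described above can be illustrated with a simple lagged-regression model. The study uses the boosting algorithm with cross-validation; the sketch below substitutes ridge regression, a common alternative, and assumes a single one-dimensional stimulus feature sampled at the same rate as the response.

```python
import numpy as np

def estimate_trf(stimulus, response, fs, tmin=0.0, tmax=0.4, ridge=1.0):
    """Estimate a temporal response function (kernel) mapping a continuous
    stimulus feature to a neural response via regularized least squares.
    (The cited study uses boosting; ridge regression is shown here only
    for illustration.)"""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Design matrix of lagged copies of the stimulus
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    X[: lags.max(), :] = 0.0  # zero out samples wrapped around by np.roll
    trf = np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ response)
    predicted = X @ trf
    return lags / fs, trf, predicted
```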

  16. Visual activity predicts auditory recovery from deafness after adult cochlear implantation.

    PubMed

    Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

    2013-12-01

    Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input, and patients must rely on long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the greatest progress in speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and in the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area that positively correlated with auditory speech recovery was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the functional level of the visual modality is related to the proficiency of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that the visual modality may facilitate the perception of a word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and for fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.

  17. Individual Sensitivity to Spectral and Temporal Cues in Listeners With Hearing Impairment

    PubMed Central

    Wright, Richard A.; Blackburn, Michael C.; Tatman, Rachael; Gallun, Frederick J.

    2015-01-01

    Purpose The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. Method Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. Results Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. Conclusion Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions. PMID:25629388

  18. Long short-term memory for speaker generalization in supervised speech separation

    PubMed Central

    Chen, Jitong; Wang, DeLiang

    2017-01-01

    Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
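
    A minimal sketch of the mask-estimation formulation described above is shown below, using an LSTM that maps noisy-speech feature frames to a time-frequency mask. The layer sizes and feature dimensions are placeholders, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    """LSTM that maps noisy-speech features (e.g., log-mel frames) to a
    time-frequency mask in [0, 1]; a minimal sketch, not the model of the
    cited study."""
    def __init__(self, n_features=64, n_bins=161, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_bins)

    def forward(self, feats):                  # feats: (batch, time, n_features)
        out, _ = self.lstm(feats)
        return torch.sigmoid(self.proj(out))   # mask: (batch, time, n_bins)

# Training would minimize, e.g., the MSE between the estimated mask and an
# ideal ratio mask computed from the clean-speech and noise spectra.
```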

  19. Auditory temporal processing in patients with temporal lobe epilepsy.

    PubMed

    Lavasani, Azam Navaei; Mohammadkhani, Ghassem; Motamedi, Mahmoud; Karimi, Leyla Jalilvand; Jalaei, Shohreh; Shojaei, Fereshteh Sadat; Danesh, Ali; Azimi, Hadi

    2016-07-01

    Auditory temporal processing is a central component of speech processing ability. Patients with temporal lobe epilepsy (TLE), despite their normal hearing sensitivity, may present speech recognition disorders. The present study was carried out to evaluate auditory temporal processing in patients with unilateral TLE. The study included 25 patients with epilepsy: 11 with right temporal lobe epilepsy (RTLE) and 14 with left temporal lobe epilepsy (LTLE), with a mean age of 31.1 years, and 18 control participants with a mean age of 29.4 years. The experimental and control groups were evaluated via gaps-in-noise (GIN) and duration pattern sequence (DPS) tests. One-way ANOVA was run to analyze the data. The mean GIN threshold in the control group was better than that in participants with LTLE and RTLE. Also, the percentage of correct responses on the DPS test in the control group and in participants with RTLE was better than that in participants with LTLE. Patients with TLE have difficulties in temporal processing. Difficulties are more significant in patients with LTLE, likely because the left temporal lobe is specialized for the processing of temporal information. Copyright © 2016 Elsevier Inc. All rights reserved.
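
    The group comparison reduces to a one-way ANOVA over per-participant thresholds; a toy sketch with invented GIN thresholds (in milliseconds, not the study's data) is shown below.

    from scipy.stats import f_oneway

    gin_control = [4.2, 4.5, 4.0, 4.8, 4.3]   # hypothetical gap-detection thresholds (ms)
    gin_rtle    = [5.1, 5.6, 5.3, 5.9, 5.4]
    gin_ltle    = [6.0, 6.4, 6.2, 6.8, 6.1]

    stat, p = f_oneway(gin_control, gin_rtle, gin_ltle)
    print(f"F = {stat:.2f}, p = {p:.4f}")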

  20. Semantic retrieval during overt picture description: Left anterior temporal or the parietal lobe?

    PubMed

    Geranmayeh, Fatemeh; Leech, Robert; Wise, Richard J S

    2015-09-01

    Retrieval of semantic representations is a central process during overt speech production. There is an increasing consensus that an amodal semantic 'hub' must exist that draws together modality-specific representations of concepts. Based on the distribution of atrophy and the behavioral deficit of patients with the semantic variant of fronto-temporal lobar degeneration, it has been proposed that this hub is localized within both anterior temporal lobes (ATL), and is functionally connected with verbal 'output' systems via the left ATL. An alternative view, dating from Geschwind's proposal in 1965, is that the angular gyrus (AG) is central to object-based semantic representations. In this fMRI study we examined the connectivity of the left ATL and parietal lobe (PL) with whole-brain networks known to be activated during overt picture description. We decomposed each of these two brain volumes into 15 regions of interest (ROIs), using independent component analysis. A dual regression analysis was used to establish the connectivity of each ROI with whole-brain networks. An ROI within the left anterior superior temporal sulcus (antSTS) was functionally connected to other parts of the left ATL, including anterior ventromedial left temporal cortex (partially attenuated by signal loss due to susceptibility artifact), a large left dorsolateral prefrontal region (including 'classic' Broca's area), extensive bilateral sensory-motor cortices, and the length of both superior temporal gyri. The time-course of this functionally connected network was associated with picture description but not with non-semantic baseline tasks. This system has the distribution expected for the production of overt speech with appropriate semantic content, and the auditory monitoring of the overt speech output. In contrast, the only left PL ROI that showed connectivity with brain systems most strongly activated by the picture-description task was in the superior parietal lobe (supPL). This region showed connectivity with predominantly posterior cortical regions required for the visual processing of the pictorial stimuli, with additional connectivity to the dorsal left AG and a small component of the left inferior frontal gyrus. None of the other PL ROIs that included part of the left AG were activated by speech alone. The best interpretation of these results is that the left antSTS connects the proposed semantic hub (specifically localized to ventral anterior temporal cortex based on clinical neuropsychological studies) to posterior frontal regions and sensory-motor cortices responsible for the overt production of speech. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

    NASA Astrophysics Data System (ADS)

    Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

    2016-10-01

    Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
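
    The decoding step can be sketched as a standard Viterbi recursion over phoneme states, combining per-frame acoustic log-likelihoods with bigram transition log-probabilities. The dimensions and random inputs below are placeholders for illustration, not the NSR system's actual parameters.

    import numpy as np

    def viterbi(log_likelihoods, log_transitions, log_prior):
        # log_likelihoods: (T, K) per-frame phoneme log-likelihoods (e.g., from a discriminant model)
        # log_transitions: (K, K) bigram phoneme transition log-probabilities
        # log_prior: (K,) initial phoneme log-probabilities
        T, K = log_likelihoods.shape
        delta = np.full((T, K), -np.inf)
        back = np.zeros((T, K), dtype=int)
        delta[0] = log_prior + log_likelihoods[0]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_transitions   # (K, K): previous state x current state
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_likelihoods[t]
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    K, T = 5, 20
    rng = np.random.default_rng(1)
    ll = np.log(rng.dirichlet(np.ones(K), size=T))              # fake per-frame likelihoods
    trans = np.log(rng.dirichlet(np.ones(K), size=K))           # fake bigram language model
    print(viterbi(ll, trans, np.log(np.ones(K) / K)))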

  2. Bilingual Listeners' Perception of Temporally Manipulated English Passages

    ERIC Educational Resources Information Center

    Shi, Lu-Feng; Farooq, Nadia

    2012-01-01

    Purpose: The current study measured, objectively and subjectively, how changes in speech rate affect recognition of English passages in bilingual listeners. Method: Ten native monolingual, 20 English-dominant bilingual, and 20 non-English-dominant bilingual listeners repeated target words in English passages at five speech rates (unprocessed, two…

  3. A Comparison of Five FMRI Protocols for Mapping Speech Comprehension Systems

    PubMed Central

    Binder, Jeffrey R.; Swanson, Sara J.; Hammeke, Thomas A.; Sabsevitz, David S.

    2008-01-01

    Aims: Many fMRI protocols for localizing speech comprehension have been described, but there has been little quantitative comparison of these methods. We compared five such protocols in terms of areas activated, extent of activation, and lateralization. Methods: FMRI BOLD signals were measured in 26 healthy adults during passive listening and active tasks using words and tones. Contrasts were designed to identify speech perception and semantic processing systems. Activation extent and lateralization were quantified by counting activated voxels in each hemisphere for each participant. Results: Passive listening to words produced bilateral superior temporal activation. After controlling for pre-linguistic auditory processing, only a small area in the left superior temporal sulcus responded selectively to speech. Active tasks engaged an extensive, bilateral attention and executive processing network. Optimal results (consistent activation and strongly lateralized pattern) were obtained by contrasting an active semantic decision task with a tone decision task. There was striking similarity between the network of brain regions activated by the semantic task and the network of brain regions that showed task-induced deactivation, suggesting that semantic processing occurs during the resting state. Conclusions: FMRI protocols for mapping speech comprehension systems differ dramatically in pattern, extent, and lateralization of activation. Brain regions involved in semantic processing were identified only when an active, non-linguistic task was used as a baseline, supporting the notion that semantic processing occurs whenever attentional resources are not controlled. Identification of these lexical-semantic regions is particularly important for predicting language outcome in patients undergoing temporal lobe surgery. PMID:18513352
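
    Lateralization from hemispheric voxel counts is commonly summarized with a laterality index; the sketch below illustrates that computation with invented counts (the exact index used in the study may differ).

    def laterality_index(left_voxels: int, right_voxels: int) -> float:
        """LI in [-1, 1]: positive values indicate left-lateralized activation."""
        total = left_voxels + right_voxels
        return 0.0 if total == 0 else (left_voxels - right_voxels) / total

    print(laterality_index(820, 240))   # ~0.55, i.e. strongly left-lateralized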

  4. Prereader to beginning reader: changes induced by reading acquisition in print and speech brain networks.

    PubMed

    Chyl, Katarzyna; Kossowski, Bartosz; Dębska, Agnieszka; Łuniewska, Magdalena; Banaszkiewicz, Anna; Żelechowska, Agata; Frost, Stephen J; Mencl, William Einar; Wypych, Marek; Marchewka, Artur; Pugh, Kenneth R; Jednoróg, Katarzyna

    2018-01-01

    Literacy acquisition is a demanding process that induces significant changes in the brain, especially in the spoken and written language networks. Nevertheless, large-scale paediatric fMRI studies are still limited. We analyzed fMRI data to show how individual differences in reading performance correlate with brain activation for speech and print in 111 children attending kindergarten or first grade and examined group differences between a matched subset of emergent-readers and prereaders. Across the entire cohort, individual differences analysis revealed that reading skill was positively correlated with the magnitude of activation difference between words and symbol strings in left superior temporal, inferior frontal and fusiform gyri. Group comparisons of the matched subset of pre- and emergent-readers showed higher activity for emergent-readers in left inferior frontal, precentral, and postcentral gyri. Individual differences in activation for natural versus vocoded speech were also positively correlated with reading skill, primarily in the left temporal cortex. However, in contrast to studies on adult illiterates, group comparisons revealed higher activity in prereaders compared to readers in the frontal lobes. Print-speech coactivation was observed only in readers and individual differences analyses revealed a positive correlation between convergence and reading skill in the left superior temporal sulcus. These results emphasise that a child's brain undergoes several modifications to both visual and oral language systems in the process of learning to read. They also suggest that print-speech convergence is a hallmark of acquiring literacy. © 2017 Association for Child and Adolescent Mental Health.

  5. Conjugating time and frequency: hemispheric specialization, acoustic uncertainty, and the mustached bat

    PubMed Central

    Washington, Stuart D.; Tillinghast, John S.

    2015-01-01

    A prominent hypothesis of hemispheric specialization for human speech and music states that the left and right auditory cortices (ACs) are respectively specialized for precise calculation of two canonically-conjugate variables: time and frequency. This spectral-temporal asymmetry does not account for sex, brain-volume, or handedness, and is in opposition to closed-system hypotheses that restrict this asymmetry to humans. Mustached bats have smaller brains, but greater ethological pressures to develop such a spectral-temporal asymmetry, than humans. Using the Heisenberg-Gabor Limit (i.e., the mathematical basis of the spectral-temporal asymmetry) to frame mustached bat literature, we show that recent findings in bat AC (1) support the notion that hemispheric specialization for speech and music is based on hemispheric differences in temporal and spectral resolution, (2) discredit closed-system, handedness, and brain-volume theories, (3) underscore the importance of sex differences, and (4) provide new avenues for phonological research. PMID:25926767

  6. Conjugating time and frequency: hemispheric specialization, acoustic uncertainty, and the mustached bat.

    PubMed

    Washington, Stuart D; Tillinghast, John S

    2015-01-01

    A prominent hypothesis of hemispheric specialization for human speech and music states that the left and right auditory cortices (ACs) are respectively specialized for precise calculation of two canonically-conjugate variables: time and frequency. This spectral-temporal asymmetry does not account for sex, brain-volume, or handedness, and is in opposition to closed-system hypotheses that restrict this asymmetry to humans. Mustached bats have smaller brains, but greater ethological pressures to develop such a spectral-temporal asymmetry, than humans. Using the Heisenberg-Gabor Limit (i.e., the mathematical basis of the spectral-temporal asymmetry) to frame mustached bat literature, we show that recent findings in bat AC (1) support the notion that hemispheric specialization for speech and music is based on hemispheric differences in temporal and spectral resolution, (2) discredit closed-system, handedness, and brain-volume theories, (3) underscore the importance of sex differences, and (4) provide new avenues for phonological research.
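
    The Heisenberg-Gabor limit invoked in both records can be stated compactly: for a signal with temporal spread sigma_t and spectral spread sigma_f (standard deviations of the respective energy distributions), the product is bounded from below, so sharpening resolution in one domain necessarily coarsens it in the other. In LaTeX notation (the constant shown assumes the standard-deviation definition of spread):

        \sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}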

  7. The Representation and Execution of Articulatory Timing in First and Second Language Acquisition.

    PubMed

    Redford, Melissa A; Oh, Grace E

    2017-07-01

    The early acquisition of language-specific temporal patterns relative to the late development of speech motor control suggests a dissociation between the representation and execution of articulatory timing. The current study tested for such a dissociation in first and second language acquisition. American English-speaking children (5- and 8-year-olds) and Korean-speaking adult learners of English repeatedly produced real English words in a simple carrier sentence. The words were designed to elicit different language-specific vowel length contrasts. Measures of absolute duration and variability in single vowel productions were extracted to evaluate the realization of contrasts (representation) and to index speech motor abilities (execution). Results were mostly consistent with a dissociation. Native English-speaking children produced the same language-specific temporal patterns as native English-speaking adults, but their productions were more variable than the adults'. In contrast, Korean-speaking adult learners of English typically produced different temporal patterns than native English-speaking adults, but their productions were as stable as the native speakers'. Implications of the results are discussed with reference to different models of speech production.

  8. Nonverbal auditory agnosia with lesion to Wernicke's area.

    PubMed

    Saygin, Ayse Pinar; Leech, Robert; Dick, Frederic

    2010-01-01

    We report the case of patient M, who suffered unilateral left posterior temporal and parietal damage, brain regions typically associated with language processing. Language function largely recovered since the infarct, with no measurable speech comprehension impairments. However, the patient exhibited a severe impairment in nonverbal auditory comprehension. We carried out extensive audiological and behavioral testing in order to characterize M's unusual neuropsychological profile. We also examined the patient's and controls' neural responses to verbal and nonverbal auditory stimuli using functional magnetic resonance imaging (fMRI). We verified that the patient exhibited persistent and severe auditory agnosia for nonverbal sounds in the absence of verbal comprehension deficits or peripheral hearing problems. Acoustical analyses suggested that his residual processing of a minority of environmental sounds might rely on his speech processing abilities. In the patient's brain, contralateral (right) temporal cortex as well as perilesional (left) anterior temporal cortex were strongly responsive to verbal, but not to nonverbal sounds, a pattern that stands in marked contrast to the controls' data. This substantial reorganization of auditory processing likely supported the recovery of M's speech processing.

  9. Widening the temporal window: Processing support in the treatment of aphasic language production

    PubMed Central

    Linebarger, Marcia; McCall, Denise; Virata, Telana; Berndt, Rita Sloan

    2007-01-01

    Investigations of language processing in aphasia have increasingly implicated performance factors such as slowed activation and/or rapid decay of linguistic information. This approach is supported by studies utilizing a communication system (SentenceShaper™) which functions as a “processing prosthesis.” The system may reduce the impact of processing limitations by allowing repeated refreshing of working memory and by increasing the opportunity for aphasic subjects to monitor their own speech. Some aphasic subjects are able to produce markedly more structured speech on the system than they are able to produce spontaneously, and periods of largely independent home use of SentenceShaper have been linked to treatment effects, that is, to gains in speech produced without the use of the system. The purpose of the current study was to follow up on these studies with a new group of subjects. A second goal was to determine whether repeated, unassisted elicitations of the same narratives at baseline would give rise to practice effects, which could undermine claims for the efficacy of the system. PMID:17069883

  10. Musical rhythm and reading development: does beat processing matter?

    PubMed

    Ozernov-Palchik, Ola; Patel, Aniruddh D

    2018-05-20

    There is mounting evidence for links between musical rhythm processing and reading-related cognitive skills, such as phonological awareness. This may be because music and speech are rhythmic: both involve processing complex sound sequences with systematic patterns of timing, accent, and grouping. Yet, there is a salient difference between musical and speech rhythm: musical rhythm is often beat-based (based on an underlying grid of equal time intervals), while speech rhythm is not. Thus, the role of beat-based processing in the reading-rhythm relationship is not clear. Is there a distinct relation between beat-based processing mechanisms and reading-related language skills, or is the rhythm-reading link entirely due to shared mechanisms for processing nonbeat-based aspects of temporal structure? We discuss recent evidence for a distinct link between beat-based processing and early reading abilities in young children, and suggest experimental designs that would allow one to further methodically investigate this relationship. We propose that beat-based processing taps into a listener's ability to use rich contextual regularities to form predictions, a skill important for reading development. © 2018 New York Academy of Sciences.

  11. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Speech production variability in fricatives of children and adults: Results of functional data analysis

    PubMed Central

    Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth

    2008-01-01

    This study investigates token-to-token variability in fricative production of 5-year-olds, 10-year-olds, and adults. Previous studies have reported higher intrasubject variability in children than adults, in speech as well as nonspeech tasks, but authors have disagreed on the causes and implications of this finding. The current work assessed the characteristics of age-related variability across articulators (larynx and tongue) as well as in temporal versus spatial domains. Oral airflow signals, which reflect changes in both laryngeal and supralaryngeal apertures, were obtained for multiple productions of /h s z/. The data were processed using functional data analysis, which provides a means of obtaining relatively independent indices of amplitude and temporal (phasing) variability. Consistent with past work, both temporal and amplitude variabilities were higher in children than adults, but the temporal indices were generally less adultlike than the amplitude indices for both groups of children. Quantitative and qualitative analyses showed considerable speaker- and consonant-specific patterns of variability. The data indicate that variability in /s/ may represent laryngeal as well as supralaryngeal control and further that a simple random noise factor, higher in children than in adults, is insufficient to explain developmental differences in speech production variability. PMID:19045800

  13. An eye-tracking paradigm for analyzing the processing time of sentences with different linguistic complexities.

    PubMed

    Wendt, Dorothea; Brand, Thomas; Kollmeier, Birger

    2014-01-01

    An eye-tracking paradigm was developed for use in audiology in order to enable online analysis of the speech comprehension process. This paradigm should be useful in assessing impediments in speech processing. In this paradigm, two scenes, a target picture and a competitor picture, were presented simultaneously with an aurally presented sentence that corresponded to the target picture. At the same time, eye fixations were recorded using an eye-tracking device. The effect of linguistic complexity on language processing time was assessed from eye fixation information by systematically varying linguistic complexity. This was achieved with a sentence corpus containing seven German sentence structures. A novel data analysis method computed the average tendency to fixate the target picture as a function of time during sentence processing. This allowed identification of the point in time at which the participant understood the sentence, referred to as the decision moment. Systematic differences in processing time were observed as a function of linguistic complexity. These differences in processing time may be used to assess the efficiency of cognitive processes involved in resolving linguistic complexity. Thus, the proposed method enables a temporal analysis of the speech comprehension process and has potential applications in speech audiology and psychoacoustics.

  14. An Eye-Tracking Paradigm for Analyzing the Processing Time of Sentences with Different Linguistic Complexities

    PubMed Central

    Wendt, Dorothea; Brand, Thomas; Kollmeier, Birger

    2014-01-01

    An eye-tracking paradigm was developed for use in audiology in order to enable online analysis of the speech comprehension process. This paradigm should be useful in assessing impediments in speech processing. In this paradigm, two scenes, a target picture and a competitor picture, were presented simultaneously with an aurally presented sentence that corresponded to the target picture. At the same time, eye fixations were recorded using an eye-tracking device. The effect of linguistic complexity on language processing time was assessed from eye fixation information by systematically varying linguistic complexity. This was achieved with a sentence corpus containing seven German sentence structures. A novel data analysis method computed the average tendency to fixate the target picture as a function of time during sentence processing. This allowed identification of the point in time at which the participant understood the sentence, referred to as the decision moment. Systematic differences in processing time were observed as a function of linguistic complexity. These differences in processing time may be used to assess the efficiency of cognitive processes involved in resolving linguistic complexity. Thus, the proposed method enables a temporal analysis of the speech comprehension process and has potential applications in speech audiology and psychoacoustics. PMID:24950184
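
    The core analysis, the average tendency to fixate the target picture over time and the resulting "decision moment", can be sketched as below. The sampling rate, threshold, and minimum-duration criterion are assumptions for illustration, not the authors' exact definition.

    import numpy as np

    def target_fixation_curve(fixations, n_samples):
        # fixations: list of per-trial boolean arrays (True = gaze on target), sampled at a fixed rate
        return np.mean(np.stack([f[:n_samples] for f in fixations]), axis=0)

    def decision_moment(curve, fs, threshold=0.75, min_duration_s=0.2):
        # first time the fixation proportion stays above threshold for min_duration_s
        run = int(min_duration_s * fs)
        above = curve >= threshold
        for i in range(len(curve) - run):
            if above[i:i + run].all():
                return i / fs
        return None

    fs = 60                                            # eye-tracker sampling rate in Hz (assumed)
    rng = np.random.default_rng(2)
    trials = [rng.random(300) < np.linspace(0.3, 0.95, 300) for _ in range(40)]  # fake 40 trials
    curve = target_fixation_curve(trials, 300)
    print("decision moment (s):", decision_moment(curve, fs))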

  15. Language/culture/mind/brain. Progress at the margins between disciplines.

    PubMed

    Kuhl, P K; Tsao, F M; Liu, H M; Zhang, Y; De Boer, B

    2001-05-01

    At the forefront of research on language are new data demonstrating infants' strategies in the early acquisition of language. The data show that infants perceptually "map" critical aspects of ambient language in the first year of life before they can speak. Statistical and abstract properties of speech are picked up through exposure to ambient language. Moreover, linguistic experience alters infants' perception of speech, warping perception in a way that enhances native-language speech processing. Infants' strategies are unexpected and unpredicted by historical views. At the same time, research in three additional disciplines is contributing to our understanding of language and its acquisition by children. Cultural anthropologists are demonstrating the universality of adult speech behavior when addressing infants and children across cultures, and this is creating a new view of the role adult speakers play in bringing about language in the child. Neuroscientists, using the techniques of modern brain imaging, are revealing the temporal and structural aspects of language processing by the brain and suggesting new views of the critical period for language. Computer scientists, modeling the computational aspects of children's language acquisition, are meeting success using biologically inspired neural networks. Although a consilient view cannot yet be offered, the cross-disciplinary interaction now seen among scientists pursuing one of humans' greatest achievements, language, is quite promising.

  16. A Visual Cortical Network for Deriving Phonological Information from Intelligible Lip Movements.

    PubMed

    Hauswald, Anne; Lithari, Chrysa; Collignon, Olivier; Leonardelli, Elisa; Weisz, Nathan

    2018-05-07

    Successful lip-reading requires a mapping from visual to phonological information [1]. Recently, visual and motor cortices have been implicated in tracking lip movements (e.g., [2]). It remains unclear, however, whether visuo-phonological mapping already occurs at the level of the visual cortex; that is, whether this structure tracks the acoustic signal in a functionally relevant manner. To elucidate this, we investigated how the cortex tracks (i.e., entrains to) absent acoustic speech signals carried by silent lip movements. Crucially, we contrasted the entrainment to unheard forward (intelligible) and backward (unintelligible) acoustic speech. We observed that the visual cortex exhibited stronger entrainment to the unheard forward acoustic speech envelope compared to the unheard backward acoustic speech envelope. Supporting the notion of a visuo-phonological mapping process, this forward-backward difference of occipital entrainment was not present for actually observed lip movements. Importantly, the respective occipital region received more top-down input, especially from left premotor, primary motor, and somatosensory regions and, to a lesser extent, also from posterior temporal cortex. Strikingly, across participants, the extent of top-down modulation of the visual cortex stemming from these regions partially correlated with the strength of entrainment to the absent acoustic forward speech envelope, but not to present forward lip movements. Our findings demonstrate that a distributed cortical network, including key dorsal stream auditory regions [3-5], influences how the visual cortex shows sensitivity to the intelligibility of speech while tracking silent lip movements. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
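
    Entrainment to an unheard speech envelope is typically quantified as a spectral dependence measure (e.g., coherence) between the source-level brain signal and the acoustic envelope in the syllable-rate band. The sketch below illustrates the idea with surrogate signals; the sampling rate, band edges, and noise level are assumptions, and the study's actual pipeline may differ.

    import numpy as np
    from scipy.signal import hilbert, coherence, butter, filtfilt

    fs = 200                                           # sampling rate after resampling (assumed)
    t = np.arange(0, 60, 1 / fs)
    speech = np.random.randn(t.size)                   # stand-in for the acoustic speech waveform
    envelope = np.abs(hilbert(speech))                 # broadband amplitude envelope

    b, a = butter(3, [1, 7], btype="bandpass", fs=fs)  # syllable-rate band (1-7 Hz), an assumption
    envelope_band = filtfilt(b, a, envelope)

    meg_source = envelope_band + 0.8 * np.random.randn(t.size)   # fake occipital source signal
    f, coh = coherence(meg_source, envelope_band, fs=fs, nperseg=fs * 4)
    band = (f >= 1) & (f <= 7)
    print("mean 1-7 Hz coherence:", coh[band].mean())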

  17. High-frame-rate full-vocal-tract 3D dynamic speech imaging.

    PubMed

    Fu, Maojing; Barlaz, Marissa S; Holtrop, Joseph L; Perry, Jamie L; Kuehn, David P; Shosted, Ryan K; Liang, Zhi-Pei; Sutton, Bradley P

    2017-04-01

    The aim was to achieve a high temporal frame rate, high spatial resolution, and full-vocal-tract coverage for three-dimensional dynamic speech MRI by using low-rank modeling and sparse sampling. Three-dimensional dynamic speech MRI is enabled by integrating a novel data acquisition strategy and an image reconstruction method with the partial separability model: (a) a self-navigated sparse sampling strategy that accelerates data acquisition by collecting high-nominal-frame-rate cone navigators and imaging data within a single repetition time, and (b) a reconstruction method that recovers high-quality speech dynamics from sparse (k,t)-space data by enforcing joint low-rank and spatiotemporal total variation constraints. The proposed method has been evaluated through in vivo experiments. A nominal temporal frame rate of 166 frames per second (defined based on a repetition time of 5.99 ms) was achieved for an imaging volume covering the entire vocal tract with a spatial resolution of 2.2 × 2.2 × 5.0 mm³. Practical utility of the proposed method was demonstrated via both validation experiments and a phonetics investigation. Three-dimensional dynamic speech imaging is possible with full-vocal-tract coverage, high spatial resolution, and high nominal frame rate to provide dynamic speech data useful for phonetic studies. Magn Reson Med 77:1619-1629, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
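
    The partial separability idea can be sketched in a few lines: a low-rank temporal subspace is estimated from the high-frame-rate navigator data, and spatial coefficients are then fitted to the imaging data. The toy example below works in image space with invented dimensions, and it omits the (k,t)-space sampling operator and the joint low-rank plus spatiotemporal total-variation regularization used in the actual reconstruction.

    import numpy as np

    rng = np.random.default_rng(3)
    n_voxels, n_frames, rank = 500, 300, 8

    # Ground-truth low-rank dynamic object (for this toy example only).
    truth = rng.standard_normal((n_voxels, rank)) @ rng.standard_normal((rank, n_frames))

    navigators = truth[:32] + 0.05 * rng.standard_normal((32, n_frames))   # a few voxels, every frame
    _, _, Vt = np.linalg.svd(navigators, full_matrices=False)
    V = Vt[:rank]                                                          # temporal basis (rank x frames)

    data = truth + 0.05 * rng.standard_normal(truth.shape)                 # stand-in for sampled data
    U = data @ np.linalg.pinv(V)                                           # least-squares spatial coefficients
    recon = U @ V
    print("relative error:", np.linalg.norm(recon - truth) / np.linalg.norm(truth))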

  18. Activations in temporal areas using visual and auditory naming stimuli: A language fMRI study in temporal lobe epilepsy.

    PubMed

    Gonzálvez, Gloria G; Trimmel, Karin; Haag, Anja; van Graan, Louis A; Koepp, Matthias J; Thompson, Pamela J; Duncan, John S

    2016-12-01

    Verbal fluency functional MRI (fMRI) is used for predicting language deficits after anterior temporal lobe resection (ATLR) for temporal lobe epilepsy (TLE), but primarily engages frontal lobe areas. In this observational study we investigated fMRI paradigms using visual and auditory stimuli, which predominately involve language areas resected during ATLR. Twenty-three controls and 33 patients (20 left (LTLE), 13 right (RTLE)) were assessed using three fMRI paradigms: verbal fluency, auditory naming with a contrast of auditory reversed speech; picture naming with a contrast of scrambled pictures and blurred faces. Group analysis showed bilateral temporal activations for auditory naming and picture naming. Correcting for auditory and visual input (by subtracting activations resulting from auditory reversed speech and blurred pictures/scrambled faces respectively) resulted in left-lateralised activations for patients and controls, which was more pronounced for LTLE compared to RTLE patients. Individual subject activations at a threshold of T>2.5, extent >10 voxels, showed that verbal fluency activated predominantly the left inferior frontal gyrus (IFG) in 90% of LTLE, 92% of RTLE, and 65% of controls, compared to right IFG activations in only 15% of LTLE and RTLE and 26% of controls. Middle temporal (MTG) or superior temporal gyrus (STG) activations were seen on the left in 30% of LTLE, 23% of RTLE, and 52% of controls, and on the right in 15% of LTLE, 15% of RTLE, and 35% of controls. Auditory naming activated temporal areas more frequently than did verbal fluency (LTLE: 93%/73%; RTLE: 92%/58%; controls: 82%/70% (left/right)). Controlling for auditory input resulted in predominantly left-sided temporal activations. Picture naming resulted in temporal lobe activations less frequently than did auditory naming (LTLE 65%/55%; RTLE 53%/46%; controls 52%/35% (left/right)). Controlling for visual input had left-lateralising effects. Auditory and picture naming activated temporal lobe structures, which are resected during ATLR, more frequently than did verbal fluency. Controlling for auditory and visual input resulted in more left-lateralised activations. We hypothesise that these paradigms may be more predictive of postoperative language decline than verbal fluency fMRI. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Cerebral activation associated with speech sound discrimination during the diotic listening task: an fMRI study.

    PubMed

    Ikeda, Yumiko; Yahata, Noriaki; Takahashi, Hidehiko; Koeda, Michihiko; Asai, Kunihiko; Okubo, Yoshiro; Suzuki, Hidenori

    2010-05-01

    Comprehending conversation in a crowd requires appropriately orienting and sustaining auditory attention to, and discriminating, the target speaker. While a multitude of cognitive functions such as voice perception and language processing work in concert to subserve this ability, it is still unclear which cognitive components critically determine successful discrimination of speech sounds under constantly changing auditory conditions. To investigate this, we present a functional magnetic resonance imaging (fMRI) study of changes in cerebral activities associated with varying challenge levels of speech discrimination. Subjects participated in a diotic listening paradigm that presented them with two news stories read simultaneously but independently by a target speaker and a distracting speaker of incongruent or congruent sex. We found that a distracter voice of congruent (rather than incongruent) sex made listening more challenging, resulting in enhanced activities mainly in the left temporal and frontal gyri. Further, the activities at the left inferior, left anterior superior, and right superior loci in the temporal gyrus were shown to be significantly correlated with accuracy of the discrimination performance. The present results suggest that the subregions of bilateral temporal gyri play a key role in the successful discrimination of speech under constantly changing auditory conditions as encountered in daily life. 2010 Elsevier Ireland Ltd and the Japan Neuroscience Society. All rights reserved.

  20. Perception of environmental sounds by experienced cochlear implant patients.

    PubMed

    Shafiro, Valeriy; Gygi, Brian; Cheng, Min-Yu; Vachhani, Jay; Mulvey, Megan

    2011-01-01

    Environmental sound perception serves an important ecological function by providing listeners with information about objects and events in their immediate environment. Environmental sounds such as car horns, baby cries, or chirping birds can alert listeners to imminent dangers as well as contribute to one's sense of awareness and well-being. Perception of environmental sounds as acoustically and semantically complex stimuli may also involve some factors common to the processing of speech. However, very limited research has investigated the abilities of cochlear implant (CI) patients to identify common environmental sounds, despite patients' general enthusiasm about them. This project (1) investigated the ability of patients with modern-day CIs to perceive environmental sounds, (2) explored associations among speech, environmental sounds, and basic auditory abilities, and (3) examined acoustic factors that might be involved in environmental sound perception. Seventeen experienced postlingually deafened CI patients participated in the study. Environmental sound perception was assessed with a large-item test composed of 40 sound sources, each represented by four different tokens. The relationship between speech and environmental sound perception and the role of working memory and some basic auditory abilities were examined based on patient performance on a battery of speech tests (HINT, CNC, and individual consonant and vowel tests), tests of basic auditory abilities (audiometric thresholds, gap detection, temporal pattern, and temporal order for tones tests), and a backward digit recall test. The results indicated substantially reduced ability to identify common environmental sounds in CI patients (45.3%). Except for vowels, all speech test scores significantly correlated with the environmental sound test scores: r = 0.73 for HINT in quiet, r = 0.69 for HINT in noise, r = 0.70 for CNC, r = 0.64 for consonants, and r = 0.48 for vowels. HINT and CNC scores in quiet moderately correlated with the temporal order for tones. However, the correlation between speech and environmental sounds changed little after partialling out the variance due to other variables. Present findings indicate that environmental sound identification is difficult for CI patients. They further suggest that speech and environmental sounds may overlap considerably in their perceptual processing. Certain spectrotemporal processing abilities are separately associated with speech and environmental sound performance. However, they do not appear to mediate the relationship between speech and environmental sounds in CI patients. Environmental sound rehabilitation may be beneficial to some patients. Environmental sound testing may have potential diagnostic applications, especially with difficult-to-test populations, and might also be predictive of speech performance for prelingually deafened patients with cochlear implants.

  1. Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children

    PubMed Central

    Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2012-01-01

    Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726

  2. Development of Displaced Speech in Early Mother--Child Conversations

    ERIC Educational Resources Information Center

    Adamson, Lauren B.; Bakeman, Roger

    2006-01-01

    This study documents the development of symbolic, spatial, and temporal displacement of toddler's speech. Fifty-six children and their mothers were observed longitudinally 5 times from 18 to 30 months of age during a staged communication play while they engaged in scenes that encouraged interacting, requesting, and commenting and scenes that…

  3. Concurrent Processing of Words and Their Replacements during Speech

    ERIC Educational Resources Information Center

    Hartsuiker, Robert J.; Catchpole, Ciara M.; de Jong, Nivja H.; Pickering, Martin J.

    2008-01-01

    Two picture naming experiments, in which an initial picture was occasionally replaced with another (target) picture, were conducted to study the temporal coordination of abandoning one word and resuming with another word in speech production. In Experiment 1, participants abandoned saying the initial name, and resumed with the name of the target…

  4. Multi-time resolution analysis of speech: evidence from psychophysics

    PubMed Central

    Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David

    2015-01-01

    How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10–40 Hz modulation frequency) and syllable-sized (2–10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations (corresponding to ~250 and ~30 ms, the average durations of syllables and of certain phonetic properties, respectively) were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility, a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing. PMID:26136650
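
    A rough sketch of the band-limited modulation manipulation: extract the amplitude envelope, then band-pass it around the syllable-scale and phoneme-scale modulation rates. The filter design, band edges, and use of white noise as a stand-in signal are assumptions; the study extracted these modulations from spoken sentences rather than noise.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 16000
    t = np.arange(0, 2.0, 1 / fs)
    speech = np.random.randn(t.size)                   # stand-in for a recorded sentence
    envelope = np.abs(hilbert(speech))                 # amplitude envelope of the signal

    def modulation_band(env, lo, hi, fs):
        b, a = butter(3, [lo, hi], btype="bandpass", fs=fs)
        return filtfilt(b, a, env)

    slow_mod = modulation_band(envelope, 2, 10, fs)    # syllable-scale modulations (~4 Hz band)
    fast_mod = modulation_band(envelope, 10, 40, fs)   # phoneme-scale modulations (~33 Hz band)
    print(slow_mod.std(), fast_mod.std())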

  5. Neural correlates of lexicon and grammar: evidence from the production, reading, and judgment of inflection in aphasia.

    PubMed

    Ullman, Michael T; Pancheva, Roumyana; Love, Tracy; Yee, Eiling; Swinney, David; Hickok, Gregory

    2005-05-01

    Are the linguistic forms that are memorized in the mental lexicon and those that are specified by the rules of grammar subserved by distinct neurocognitive systems or by a single computational system with relatively broad anatomic distribution? On a dual-system view, the productive -ed-suffixation of English regular past tense forms (e.g., look-looked) depends upon the mental grammar, whereas irregular forms (e.g., dig-dug) are retrieved from lexical memory. On a single-mechanism view, the computation of both past tense types depends on associative memory. Neurological double dissociations between regulars and irregulars strengthen the dual-system view. The computation of real and novel, regular and irregular past tense forms was investigated in 20 aphasic subjects. Aphasics with non-fluent agrammatic speech and left frontal lesions were consistently more impaired at the production, reading, and judgment of regular than irregular past tenses. Aphasics with fluent speech and word-finding difficulties, and with left temporal/temporo-parietal lesions, showed the opposite pattern. These patterns held even when measures of frequency, phonological complexity, articulatory difficulty, and other factors were held constant. The data support the view that the memorized words of the mental lexicon are subserved by a brain system involving left temporal/temporo-parietal structures, whereas aspects of the mental grammar, in particular the computation of regular morphological forms, are subserved by a distinct system involving left frontal structures.

  6. Speech Sound Processing Deficits and Training-Induced Neural Plasticity in Rats with Dyslexia Gene Knockdown

    PubMed Central

    Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.

    2014-01-01

    In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that assumed reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331

  7. Acoustic Processing of Temporally Modulated Sounds in Infants: Evidence from a Combined Near-Infrared Spectroscopy and EEG Study

    PubMed Central

    Telkemeyer, Silke; Rossi, Sonja; Nierhaus, Till; Steinbrink, Jens; Obrig, Hellmuth; Wartenburger, Isabell

    2010-01-01

    Speech perception requires rapid extraction of the linguistic content from the acoustic signal. The ability to efficiently process rapid changes in auditory information is important for decoding speech and thereby crucial during language acquisition. Investigating functional networks of speech perception in infancy might elucidate neuronal ensembles supporting perceptual abilities that gate language acquisition. Interhemispheric specializations for language have been demonstrated in infants. How these asymmetries are shaped by basic temporal acoustic properties is under debate. We recently provided evidence that newborns process non-linguistic sounds sharing temporal features with language in a differential and lateralized fashion. The present study used the same material while measuring brain responses of 6 and 3 month old infants using simultaneous recordings of electroencephalography (EEG) and near-infrared spectroscopy (NIRS). NIRS reveals that the lateralization observed in newborns remains constant over the first months of life. While fast acoustic modulations elicit bilateral neuronal activations, slow modulations lead to right-lateralized responses. Additionally, auditory-evoked potentials and oscillatory EEG responses show differential responses for fast and slow modulations indicating a sensitivity for temporal acoustic variations. Oscillatory responses reveal an effect of development, that is, 6 but not 3 month old infants show stronger theta-band desynchronization for slowly modulated sounds. Whether this developmental effect is due to increasing fine-grained perception for spectrotemporal sounds in general remains speculative. Our findings support the notion that a more general specialization for acoustic properties can be considered the basis for lateralization of speech perception. The results show that concurrent assessment of vascular based imaging and electrophysiological responses have great potential in the research on language acquisition. PMID:21716574

  8. The use of listening devices to ameliorate auditory deficit in children with autism.

    PubMed

    Rance, Gary; Saunders, Kerryn; Carew, Peter; Johansson, Marlin; Tan, Johanna

    2014-02-01

    To evaluate both monaural and binaural processing skills in a group of children with autism spectrum disorder (ASD) and to determine the degree to which personal frequency modulation (radio transmission) (FM) listening systems could ameliorate their listening difficulties. Auditory temporal processing (amplitude modulation detection), spatial listening (integration of binaural difference cues), and functional hearing (speech perception in background noise) were evaluated in 20 children with ASD. Ten of these subsequently underwent a 6-week device trial in which they wore the FM system for up to 7 hours per day. Auditory temporal processing and spatial listening ability were poorer in subjects with ASD than in matched controls (temporal: P = .014 [95% CI -6.4 to -0.8 dB], spatial: P = .003 [1.0 to 4.4 dB]), and performance on both of these basic processing measures was correlated with speech perception ability (temporal: r = -0.44, P = .022; spatial: r = -0.50, P = .015). The provision of FM listening systems resulted in improved discrimination of speech in noise (P < .001 [11.6% to 21.7%]). Furthermore, both participant and teacher questionnaire data revealed device-related benefits across a range of evaluation categories including Effect of Background Noise (P = .036 [-60.7% to -2.8%]) and Ease of Communication (P = .019 [-40.1% to -5.0%]). Eight of the 10 participants who undertook the 6-week device trial remained consistent FM users at study completion. Sustained use of FM listening devices can enhance speech perception in noise, aid social interaction, and improve educational outcomes in children with ASD. Copyright © 2014 Mosby, Inc. All rights reserved.

  9. Sentence intelligibility during segmental interruption and masking by speech-modulated noise: Effects of age and hearing loss

    PubMed Central

    Fogerty, Daniel; Ahlstrom, Jayne B.; Bologna, William J.; Dubno, Judy R.

    2015-01-01

    This study investigated how single-talker modulated noise impacts consonant and vowel cues to sentence intelligibility. Younger normal-hearing, older normal-hearing, and older hearing-impaired listeners completed speech recognition tests. All listeners received spectrally shaped speech matched to their individual audiometric thresholds to ensure sufficient audibility, with the exception of a second younger listener group who received spectral shaping that matched the mean audiogram of the hearing-impaired listeners. Results demonstrated minimal declines in intelligibility for older listeners with normal hearing and more evident declines for older hearing-impaired listeners, possibly related to impaired temporal processing. A correlational analysis suggests a common underlying ability to process information during vowels that is predictive of speech-in-modulated-noise abilities. In contrast, the ability to use consonant cues appears specific to the particular characteristics of the noise and interruption. Performance declines for older listeners were mostly confined to consonant conditions. Spectral shaping accounted for the primary contributions of audibility. However, comparison with the young spectral controls who received identical spectral shaping suggests that this procedure may reduce wideband temporal modulation cues due to frequency-specific amplification that affected high-frequency consonants more than low-frequency vowels. These spectral changes may impact speech intelligibility in certain modulation masking conditions. PMID:26093436

  10. Status and progress of studies on the nature of speech, instrumentation for its investigation and practical applications

    NASA Astrophysics Data System (ADS)

    Liberman, A. M.

    1983-09-01

    This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The association between comprehension of spoken sentences and early reading ability: The role of phonetic representation; Phonetic coding and order memory in relation to reading proficiency: A comparison of short-term memory for temporal and spatial order information; Exploring the oral and written language errors made by language-disabled children; Perceiving phonetic events; Converging evidence in support of common dynamical principles for speech and movement coordination; Phase transitions and critical behavior in human bimanual coordination; Timing and coarticulation for alveolo-palatals and sequences of alveolar +J in Catalan; V-to-C coarticulation in Catalan VCV sequences: An articulatory and acoustical study; Prosody and the /S/-/c/ distinction; Intersections of tone and intonation in Thai; Simultaneous measurements of vowels produced by a hearing-impaired speaker; Extending formant transitions may not improve aphasics' perception of stop consonant place of articulation; Against a role of chirp identification in duplex perception; Further evidence for the role of relative timing in speech: A reply to Barry; Review (Phonological intervention: Concepts and procedures); and Review (Temporal variables in speech).

  11. Phase effects in masking by harmonic complexes: speech recognition.

    PubMed

    Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

    2013-12-01

    Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
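
    The SP versus RP manipulation is easy to sketch: the two maskers share the same power spectrum and differ only in component phases, which changes the crest factor (envelope peakiness). The sampling rate, duration, and normalization below are illustrative choices, not the study's stimulus parameters.

    import numpy as np

    fs, f0, dur = 44100, 50, 0.5
    t = np.arange(0, dur, 1 / fs)
    n_harmonics = int((fs / 2 - f0) // f0)             # all harmonics below the Nyquist frequency

    def harmonic_complex(phases):
        x = sum(np.sin(2 * np.pi * f0 * (k + 1) * t + phases[k]) for k in range(n_harmonics))
        return x / np.max(np.abs(x))                   # peak-normalize

    sp = harmonic_complex(np.zeros(n_harmonics))                        # all partials in sine phase
    rp = harmonic_complex(np.random.uniform(0, 2 * np.pi, n_harmonics)) # random phases
    crest = lambda x: np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))
    print("crest factor SP:", crest(sp), "RP:", crest(rp))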

  12. Noise on, voicing off: Speech perception deficits in children with specific language impairment.

    PubMed

    Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-11-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in silence, stationary noise, and amplitude-modulated noise. Comparable deficits were obtained for fast, intermediate, and slow modulation rates, and this speaks against the various temporal processing accounts of SLI. Children with SLI exhibited normal "masking release" effects (i.e., better performance in fluctuating noise than in stationary noise), again suggesting relatively spared spectral and temporal auditory resolution. In terms of phonetic categories, voicing was more affected than place, manner, or nasality. The specific nature of this voicing deficit is hard to explain with general processing impairments in attention or memory. Finally, speech perception in noise correlated with an oral language component but not with either a memory or IQ component, and it accounted for unique variance beyond IQ and low-level auditory perception. In sum, poor speech perception seems to be one of the primary deficits in children with SLI that might explain poor phonological development, impaired word production, and poor word comprehension. Copyright © 2011 Elsevier Inc. All rights reserved.

  13. Can a model of overlapping gestures account for scanning speech patterns?

    PubMed

    Tjaden, K

    1999-06-01

    A simple acoustic model of overlapping, sliding gestures was used to evaluate whether coproduction was reduced for neurologic speakers with scanning speech patterns. F2 onset frequency was used as an acoustic measure of coproduction or gesture overlap. The effects of speaking rate (habitual versus fast) and utterance position (initial versus medial) on F2 frequency, and presumably gesture overlap, were examined. Regression analyses also were used to evaluate the extent to which across-repetition temporal variability in F2 trajectories could be explained as variation in coproduction for consonants and vowels. The lower F2 onset frequencies for disordered speakers suggested that gesture overlap was reduced for neurologic individuals with scanning speech. Speaking rate change did not influence F2 onset frequencies, and presumably gesture overlap, for healthy or disordered speakers. F2 onset frequency differences for utterance-initial and -medial repetitions were interpreted to suggest reduced coproduction for the utterance-initial position. The utterance-position effects on F2 onset frequency, however, likely were complicated by position-related differences in articulatory scaling. The results of the regression analysis indicated that gesture sliding accounts, in part, for temporal variability in F2 trajectories. Taken together, the results of this study provide support for the idea that speech production theory for healthy talkers helps to account for disordered speech production.

  14. Listeners modulate temporally selective attention during natural speech processing

    PubMed Central

    Astheimer, Lori B.; Sanders, Lisa D.

    2009-01-01

    Spatially selective attention allows for the preferential processing of relevant stimuli when more information than can be processed in detail is presented simultaneously at distinct locations. Temporally selective attention may serve a similar function during speech perception by allowing listeners to allocate attentional resources to time windows that contain highly relevant acoustic information. To test this hypothesis, event-related potentials were compared in response to attention probes presented in six conditions during a narrative: concurrently with word onsets, beginning 50 and 100 ms before and after word onsets, and at random control intervals. Times for probe presentation were selected such that the acoustic environments of the narrative were matched for all conditions. Linguistic attention probes presented at and immediately following word onsets elicited larger amplitude N1s than control probes over medial and anterior regions. These results indicate that native speakers selectively process sounds presented at specific times during normal speech perception. PMID:18395316

  15. What iconic gesture fragments reveal about gesture-speech integration: when synchrony is lost, memory can help.

    PubMed

    Obermeier, Christian; Holle, Henning; Gunter, Thomas C

    2011-07-01

    The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive speech. In a pretest, the minimal duration of an iconic gesture fragment needed to disambiguate a homonym (i.e., disambiguation point) was therefore identified. In three subsequent ERP experiments, we then investigated whether the gesture information available at the disambiguation point has immediate as well as delayed consequences on the processing of a temporarily ambiguous spoken sentence, and whether these gesture-speech integration processes are susceptible to temporal synchrony. Experiment 1, which used asynchronous stimuli as well as an explicit task, showed clear N400 effects at the homonym as well as at the target word presented further downstream, suggesting that asynchrony does not prevent integration under explicit task conditions. No such effects were found when asynchronous stimuli were presented using a more shallow task (Experiment 2). Finally, when gesture fragment and homonym were synchronous, similar results as in Experiment 1 were found, even under shallow task conditions (Experiment 3). We conclude that when iconic gesture fragments and speech are in synchrony, their interaction is more or less automatic. When they are not, more controlled, active memory processes are necessary to be able to combine the gesture fragment and speech context in such a way that the homonym is disambiguated correctly.

  16. Neural dynamics of speech act comprehension: an MEG study of naming and requesting.

    PubMed

    Egorova, Natalia; Pulvermüller, Friedemann; Shtyrov, Yury

    2014-05-01

    The neurobiological basis and temporal dynamics of communicative language processing pose important yet unresolved questions. It has previously been suggested that comprehension of the communicative function of an utterance, i.e. the so-called speech act, is supported by an ensemble of neural networks, comprising lexico-semantic, action and mirror neuron as well as theory of mind circuits, all activated in concert. It has also been demonstrated that recognition of the speech act type occurs extremely rapidly. These findings, however, were obtained in experiments with insufficient spatio-temporal resolution, thus possibly concealing important facets of the neural dynamics of the speech act comprehension process. Here, we used magnetoencephalography to investigate the comprehension of Naming and Request actions performed with utterances controlled for physical features, psycholinguistic properties and the probability of occurrence in variable contexts. The results show that different communicative actions are underpinned by a dynamic neural network, which differentiates between speech act types very early after the speech act onset. Within 50-90 ms, Requests engaged mirror-neuron action-comprehension systems in sensorimotor cortex, possibly for processing action knowledge and intentions. Still within the first 200 ms of stimulus onset (100-150 ms), Naming activated brain areas involved in referential semantic retrieval. Subsequently (200-300 ms), theory of mind and mentalising circuits were activated in medial prefrontal and temporo-parietal areas, possibly indexing processing of intentions and assumptions of both communication partners. This cascade of stages of processing information about actions and intentions, referential semantics, and theory of mind may underlie dynamic and interactive speech act comprehension.

  17. Speech identification in noise: Contribution of temporal, spectral, and visual speech cues.

    PubMed

    Kim, Jeesun; Davis, Chris; Groot, Christopher

    2009-12-01

    This study investigated the degree to which two types of reduced auditory signals (cochlear implant simulations) and visual speech cues combined for speech identification. The auditory speech stimuli were filtered to have only amplitude envelope cues or both amplitude envelope and spectral cues and were presented with/without visual speech. In Experiment 1, IEEE sentences were presented in quiet and noise. For in-quiet presentation, speech identification was enhanced by the addition of both spectral and visual speech cues. Due to a ceiling effect, the degree to which these effects combined could not be determined. In noise, these facilitation effects were more marked and were additive. Experiment 2 examined consonant and vowel identification in the context of CVC or VCV syllables presented in noise. For consonants, both spectral and visual speech cues facilitated identification and these effects were additive. For vowels, the effect of combined cues was underadditive, with the effect of spectral cues reduced when presented with visual speech cues. Analysis indicated that without visual speech, spectral cues facilitated the transmission of place information and vowel height, whereas with visual speech, they facilitated lip rounding, with little impact on the transmission of place information.

  18. Stereotactic probability and variability of speech arrest and anomia sites during stimulation mapping of the language dominant hemisphere.

    PubMed

    Chang, Edward F; Breshears, Jonathan D; Raygor, Kunal P; Lau, Darryl; Molinaro, Annette M; Berger, Mitchel S

    2017-01-01

    OBJECTIVE Functional mapping using direct cortical stimulation is the gold standard for the prevention of postoperative morbidity during resective surgery in dominant-hemisphere perisylvian regions. Its role is necessitated by the significant interindividual variability that has been observed for essential language sites. The aim in this study was to determine the statistical probability distribution of eliciting aphasic errors for any given stereotactically based cortical position in a patient cohort and to quantify the variability at each cortical site. METHODS Patients undergoing awake craniotomy for dominant-hemisphere primary brain tumor resection between 1999 and 2014 at the authors' institution were included in this study, which included counting and picture-naming tasks during dense speech mapping via cortical stimulation. Positive and negative stimulation sites were collected using an intraoperative frameless stereotactic neuronavigation system and were converted to Montreal Neurological Institute coordinates. Data were iteratively resampled to create mean and standard deviation probability maps for speech arrest and anomia. Patients were divided into groups with a "classic" or an "atypical" location of speech function, based on the resultant probability maps. Patient and clinical factors were then assessed for their association with an atypical location of speech sites by univariate and multivariate analysis. RESULTS Across 102 patients undergoing speech mapping, the overall probabilities of speech arrest and anomia were 0.51 and 0.33, respectively. Speech arrest was most likely to occur with stimulation of the posterior inferior frontal gyrus (maximum probability from individual bin = 0.025), and variance was highest in the dorsal premotor cortex and the posterior superior temporal gyrus. In contrast, stimulation within the posterior perisylvian cortex resulted in the maximum mean probability of anomia (maximum probability = 0.012), with large variance in the regions surrounding the posterior superior temporal gyrus, including the posterior middle temporal, angular, and supramarginal gyri. Patients with atypical speech localization were far more likely to have tumors in canonical Broca's or Wernicke's areas (OR 7.21, 95% CI 1.67-31.09, p < 0.01) or to have multilobar tumors (OR 12.58, 95% CI 2.22-71.42, p < 0.01), than were patients with classic speech localization. CONCLUSIONS This study provides statistical probability distribution maps for aphasic errors during cortical stimulation mapping in a patient cohort. Thus, the authors provide an expected probability of inducing speech arrest and anomia from specific 10-mm² cortical bins in an individual patient. In addition, they highlight key regions of interindividual mapping variability that should be considered preoperatively. They believe these results will aid surgeons in their preoperative planning of eloquent cortex resection.
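
    The iterative resampling step described above can be sketched in a few lines. The code below is a hypothetical illustration of the general idea (the binning scheme, data layout, and resampling unit are assumptions, not the authors' pipeline): stimulation sites are assigned to coarse 10-mm bins in MNI space, patients are resampled with replacement, and each bin's probability of eliciting an error is summarized by its bootstrap mean and standard deviation.

      import numpy as np

      def bin_index(xyz, bin_mm=10.0):
          """Map an MNI coordinate (x, y, z) to a coarse bin label."""
          return tuple(np.floor(np.asarray(xyz, dtype=float) / bin_mm).astype(int))

      def probability_maps(sites, n_boot=1000, seed=0):
          """sites: list of (patient_id, (x, y, z), error_observed) records."""
          rng = np.random.default_rng(seed)
          patients = sorted({p for p, _, _ in sites})
          by_patient = {p: [(bin_index(c), bool(e)) for q, c, e in sites if q == p] for p in patients}
          all_bins = {b for obs in by_patient.values() for b, _ in obs}
          samples = {b: [] for b in all_bins}
          for _ in range(n_boot):
              resampled = rng.choice(len(patients), size=len(patients), replace=True)
              hits = {b: 0 for b in all_bins}
              totals = {b: 0 for b in all_bins}
              for idx in resampled:
                  for b, err in by_patient[patients[idx]]:
                      totals[b] += 1
                      hits[b] += int(err)
              for b in all_bins:
                  if totals[b]:
                      samples[b].append(hits[b] / totals[b])
          return {b: (float(np.mean(v)), float(np.std(v))) for b, v in samples.items() if v}

    Resampling at the patient level (rather than at the site level) keeps each bootstrap draw consistent with the clustered structure of the data, which is one plausible reading of the cohort-level maps described above.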

  19. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus

    PubMed Central

    2017-01-01

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices. PMID:28179553

  20. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    PubMed Central

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework of the multimodal nature of human communication. PMID:28424636

  1. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus.

    PubMed

    Zhu, Lin L; Beauchamp, Michael S

    2017-03-08

    Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices. Copyright © 2017 the authors 0270-6474/17/372697-12$15.00/0.

  2. Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.

    PubMed

    Karipidis, Iliana I.; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia

    2017-02-01

    Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we simultaneously acquired event-related potentials (ERP) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. Correspondingly, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short (<30 min) letter-speech sound training initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.

  3. Neural mechanisms of phonemic restoration for speech comprehension revealed by magnetoencephalography.

    PubMed

    Sunami, Kishiko; Ishii, Akira; Takano, Sakurako; Yamamoto, Hidefumi; Sakashita, Tetsushi; Tanaka, Masaaki; Watanabe, Yasuyoshi; Yamane, Hideo

    2013-11-06

    In daily communication, when spoken words are masked by background noise, we can usually still hear them as if they had not been masked and can comprehend the speech. This phenomenon is known as phonemic restoration. Since little is known about the neural mechanisms underlying phonemic restoration for speech comprehension, we aimed to identify the neural mechanisms using magnetoencephalography (MEG). Twelve healthy male volunteers with normal hearing participated in the study. Participants were requested to carefully listen to and understand recorded spoken Japanese stories, which were either played forward (forward condition) or in reverse (reverse condition), with their eyes closed. Several syllables of spoken words were replaced by 300-ms white-noise stimuli with an inter-stimulus interval of 1.6-20.3 s. We compared MEG responses to white-noise stimuli during the forward condition with those during the reverse condition using time-frequency analyses. Increased 3-5 Hz band power in the forward condition compared with the reverse condition was continuously observed in the left inferior frontal gyrus [Brodmann's areas (BAs) 45, 46, and 47] and decreased 18-22 Hz band powers caused by white-noise stimuli were seen in the left transverse temporal gyrus (BA 42) and superior temporal gyrus (BA 22). These results suggest that the left inferior frontal gyrus and left transverse and superior temporal gyri are involved in phonemic restoration for speech comprehension. Our findings may help clarify the neural mechanisms of phonemic restoration as well as develop innovative treatment methods for individuals suffering from impaired speech comprehension, particularly in noisy environments. © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Long-term temporal tracking of speech rate affects spoken-word recognition.

    PubMed

    Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin

    2014-08-01

    Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour. © The Author(s) 2014.

  5. Using Temporal Modulation Sensitivity to Select Stimulation Sites for Processor MAPs in Cochlear Implant Listeners

    PubMed Central

    Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.

    2013-01-01

    Previous studies in our laboratory showed that temporal acuity as assessed by modulation detection thresholds (MDTs) varied across activation sites and that this site-to-site variability was subject specific. Using two 10-channel MAPs, the previous experiments showed that processor MAPs that had better across-site mean (ASM) MDTs yielded better speech recognition than MAPs with poorer ASM MDTs tested in the same subject. The current study extends our earlier work on developing improved fitting strategies to test the feasibility of using a site-selection approach in the clinical domain. This study examined the hypothesis that revising the clinical speech processor MAP for cochlear implant (CI) recipients by turning off selected sites that have poorer temporal acuity and reallocating frequencies to the remaining electrodes would lead to improved speech recognition. Twelve CI recipients participated in the experiments. We found that a site-selection procedure based on MDTs in the presence of a masker resulted in improved performance on consonant recognition and recognition of sentences in noise. In contrast, vowel recognition was poorer with the experimental MAP than with the clinical MAP, possibly due to reduced spectral resolution when sites were removed from the experimental MAP. Overall, these results suggest a promising path for improving recipient outcomes using personalized processor-fitting strategies based on a psychophysical measure of temporal acuity. PMID:23881208
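
    The site-selection logic can be made concrete with a small sketch. This is an illustrative simplification (the MDT values, the number of retained sites, and the 188-7938 Hz analysis range are assumptions; clinical MAP fitting involves many more parameters): electrodes with the poorest modulation detection thresholds are switched off, and the analysis frequency range is re-divided logarithmically among the remaining sites.

      import numpy as np

      def select_sites(mdts_db, n_keep):
          """Keep the n_keep electrodes with the best (most negative) MDTs, in dB re 100% modulation."""
          return np.sort(np.argsort(mdts_db)[:n_keep])

      def reallocate_bands(kept_sites, f_lo=188.0, f_hi=7938.0):
          """Assign one log-spaced frequency band to each retained electrode."""
          edges = np.geomspace(f_lo, f_hi, len(kept_sites) + 1)
          return {int(site): (float(edges[i]), float(edges[i + 1])) for i, site in enumerate(kept_sites)}

      mdts = np.array([-14.0, -8.0, -16.0, -5.0, -12.0, -15.0, -7.0, -13.0, -11.0, -9.0])  # hypothetical, one per site
      kept = select_sites(mdts, n_keep=7)
      print(reallocate_bands(kept))

    Dropping sites necessarily widens the band assigned to each remaining electrode, which is consistent with the reduced spectral resolution and poorer vowel recognition noted above.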

  6. The Influence of Cochlear Mechanical Dysfunction, Temporal Processing Deficits, and Age on the Intelligibility of Audible Speech in Noise for Hearing-Impaired Listeners

    PubMed Central

    Johannesen, Peter T.; Pérez-González, Patricia; Kalluri, Sridhar; Blanco, José L.

    2016-01-01

    The aim of this study was to assess the relative importance of cochlear mechanical dysfunction, temporal processing deficits, and age on the ability of hearing-impaired listeners to understand speech in noisy backgrounds. Sixty-eight listeners took part in the study. They were provided with linear, frequency-specific amplification to compensate for their audiometric losses, and intelligibility was assessed for speech-shaped noise (SSN) and a time-reversed two-talker masker (R2TM). Behavioral estimates of cochlear gain loss and residual compression were available from a previous study and were used as indicators of cochlear mechanical dysfunction. Temporal processing abilities were assessed using frequency modulation detection thresholds. Age, audiometric thresholds, and the difference between audiometric threshold and cochlear gain loss were also included in the analyses. Stepwise multiple linear regression models were used to assess the relative importance of the various factors for intelligibility. Results showed that (a) cochlear gain loss was unrelated to intelligibility, (b) residual cochlear compression was related to intelligibility in SSN but not in an R2TM, (c) temporal processing was strongly related to intelligibility in an R2TM and much less so in SSN, and (d) age per se impaired intelligibility. In summary, all factors affected intelligibility, but their relative importance varied across maskers. PMID:27604779

  7. Temporally Regular Musical Primes Facilitate Subsequent Syntax Processing in Children with Specific Language Impairment.

    PubMed

    Bedoin, Nathalie; Brisseau, Lucie; Molinier, Pauline; Roch, Didier; Tillmann, Barbara

    2016-01-01

    Children with developmental language disorders have been shown to be also impaired in rhythm and meter perception. Temporal processing and its link to language processing can be understood within the dynamic attending theory. An external stimulus can stimulate internal oscillators, which orient attention over time and drive speech signal segmentation to provide benefits for syntax processing, which is impaired in various patient populations. For children with Specific Language Impairment (SLI) and dyslexia, previous research has shown the influence of an external rhythmic stimulation on subsequent language processing by comparing the influence of a temporally regular musical prime to that of a temporally irregular prime. Here we tested whether the observed rhythmic stimulation effect is indeed due to a benefit provided by the regular musical prime (rather than a cost subsequent to the temporally irregular prime). Sixteen children with SLI and 16 age-matched controls listened to either a regular musical prime sequence or an environmental sound scene (without temporal regularities in event occurrence; i.e., referred to as "baseline condition") followed by grammatically correct and incorrect sentences. They were required to perform grammaticality judgments for each auditorily presented sentence. Results revealed that performance for the grammaticality judgments was better after the regular prime sequences than after the baseline sequences. Our findings are interpreted in the theoretical framework of the dynamic attending theory (Jones, 1976) and the temporal sampling (oscillatory) framework for developmental language disorders (Goswami, 2011). Furthermore, they encourage the use of rhythmic structures (even in non-verbal materials) to boost linguistic structure processing and outline perspectives for rehabilitation.

  8. Children Discover the Spectral Skeletons in Their Native Language before the Amplitude Envelopes

    ERIC Educational Resources Information Center

    Nittrouer, Susan; Lowenstein, Joanna H.; Packer, Robert R.

    2009-01-01

    Much of speech perception research has focused on brief spectro-temporal properties in the signal, but some studies have shown that adults can recover linguistic form when those properties are absent. In this experiment, 7-year-old English-speaking children demonstrated adultlike abilities to understand speech when only sine waves (SWs)…

  9. The Effect of Temporal Gap Identification on Speech Perception by Users of Cochlear Implants

    ERIC Educational Resources Information Center

    Sagi, Elad; Kaiser, Adam R.; Meyer, Ted A.; Svirsky, Mario A.

    2009-01-01

    Purpose: This study examined the ability of listeners using cochlear implants (CIs) and listeners with normal hearing (NH) to identify silent gaps of different duration and the relation of this ability to speech understanding in CI users. Method: Sixteen NH adults and 11 postlingually deafened adults with CIs identified synthetic vowel-like…

  10. Temporal Envelope Changes of Compression and Speech Rate: Combined Effects on Recognition for Older Adults

    ERIC Educational Resources Information Center

    Jenstad, Lorienne M.; Souza, Pamela E.

    2007-01-01

    Purpose: When understanding speech in complex listening situations, older adults with hearing loss face the double challenge of cochlear hearing loss and deficits of the aging auditory system. Wide-dynamic range compression (WDRC) is used in hearing aids as remediation for the loss of audibility associated with hearing loss. WDRC processing has…

  11. Effects of Hearing and Aging on Sentence-Level Time-Gated Word Recognition

    ERIC Educational Resources Information Center

    Molis, Michelle R.; Kampel, Sean D.; McMillan, Garnett P.; Gallun, Frederick J.; Dann, Serena M.; Konrad-Martin, Dawn

    2015-01-01

    Purpose: Aging is known to influence temporal processing, but its relationship to speech perception has not been clearly defined. To examine listeners' use of contextual and phonetic information, the Revised Speech Perception in Noise test (R-SPIN) was used to develop a time-gated word (TGW) task. Method: In Experiment 1, R-SPIN sentence lists…

  12. Native Speakers' Perceptions of Fluency and Accent in L2 Speech

    ERIC Educational Resources Information Center

    Pinget, Anne-France; Bosker, Hans Rutger; Quené, Hugo; de Jong, Nivja H.

    2014-01-01

    Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2. Our aim is to explore the relationship between objectively measured temporal,…

  13. Tongue Motion Averaging from Contour Sequences

    ERIC Educational Resources Information Center

    Li, Min; Kambhamettu, Chandra; Stone, Maureen

    2005-01-01

    In this paper, a method to get the best representation of a speech motion from several repetitions is presented. Each repetition is a representation of the same speech captured at different times by sequence of ultrasound images and is composed of a set of 2D spatio-temporal contours. These 2D contours in different repetitions are time aligned…

  14. Predictive top-down integration of prior knowledge during speech perception.

    PubMed

    Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H

    2012-06-20

    A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding in which sensory activity is compared with top-down predictions and only unexplained activity propagated through the cortical hierarchy.
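
    Noise-vocoding, the degradation used above to control the amount of sensory detail, can be sketched briefly. The sketch assumes generic parameters (band count, filter order, frequency range) rather than the authors' exact processing: the waveform is split into log-spaced bands, each band's amplitude envelope is extracted and used to modulate band-limited noise, and the modulated bands are summed, so fewer bands means less spectral detail.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def noise_vocode(x, fs, n_bands=6, f_lo=100.0, f_hi=8000.0, seed=0):
          """Return a noise-vocoded version of the 1-D waveform x sampled at fs Hz (fs must exceed 2 * f_hi)."""
          rng = np.random.default_rng(seed)
          edges = np.geomspace(f_lo, f_hi, n_bands + 1)
          out = np.zeros_like(x, dtype=float)
          for lo, hi in zip(edges[:-1], edges[1:]):
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              band = sosfiltfilt(sos, x)                                # analysis band
              envelope = np.abs(hilbert(band))                          # amplitude envelope of the band
              carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))   # band-limited noise carrier
              out += envelope * carrier
          return out / np.max(np.abs(out))

    Varying n_bands gives a simple parametric handle on sensory detail of the kind manipulated above: more bands preserve more spectral structure, fewer bands leave mainly temporal-envelope cues.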

  15. Spatial and temporal modifications of multitalker speech can improve speech perception in older adults.

    PubMed

    Gygi, Brian; Shafiro, Valeriy

    2014-04-01

    Speech perception in multitalker environments often requires listeners to divide attention among several concurrent talkers before focusing on one talker with pertinent information. Such attentionally demanding tasks are particularly difficult for older adults due both to age-related hearing loss (presbycusis) and general declines in attentional processing and associated cognitive abilities. This study investigated two signal-processing techniques that have been suggested as a means of improving speech perception accuracy of older adults: time stretching and spatial separation of target talkers. Stimuli in each experiment comprised 2-4 fixed-form utterances in which listeners were asked to consecutively (1) detect concurrently spoken keywords in the beginning of the utterance (divided attention) and (2) identify additional keywords from only one talker at the end of the utterance (selective attention). In Experiment 1, the overall tempo of each utterance was unaltered or slowed down by 25%; in Experiment 2 the concurrent utterances were spatially coincident or separated across a 180-degree hemifield. Both manipulations improved performance for elderly adults with age-appropriate hearing on both tasks. Increasing the divided attention load by attending to more concurrent keywords had a marked negative effect on performance of the selective attention task only when the target talker was identified by a keyword, but not by spatial location. These findings suggest that the temporal and spatial modifications improved perception of multitalker speech primarily by reducing competition among cognitive resources required to perform attentionally demanding tasks. Published by Elsevier B.V.

  16. The temporal representation of speech in a nonlinear model of the guinea pig cochlea

    NASA Astrophysics Data System (ADS)

    Holmes, Stephen D.; Sumner, Christian J.; O'Mard, Lowel P.; Meddis, Ray

    2004-12-01

    The temporal representation of speechlike stimuli in the auditory-nerve output of a guinea pig cochlea model is described. The model consists of a bank of dual resonance nonlinear filters that simulate the vibratory response of the basilar membrane followed by a model of the inner hair cell/auditory nerve complex. The model is evaluated by comparing its output with published physiological auditory nerve data in response to single and double vowels. The evaluation includes analyses of individual fibers, as well as ensemble responses over a wide range of best frequencies. In all cases the model response closely follows the patterns in the physiological data, particularly the tendency for the temporal firing pattern of each fiber to represent the frequency of a nearby formant of the speech sound. In the model this behavior is largely a consequence of filter shapes; nonlinear filtering has only a small contribution at low frequencies. The guinea pig cochlear model produces a useful simulation of the measured physiological response to simple speech sounds and is therefore suitable for use in more advanced applications, including attempts to generalize these principles to the response of the human auditory system, both normal and impaired.

  17. Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study-

    PubMed Central

    Yahata, Izumi; Kanno, Akitake; Sakamoto, Shuichi; Takanashi, Yoshitaka; Takata, Shiho; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

    2016-01-01

    The effects of asynchrony between audio and visual (A/V) stimuli on the N100m responses of magnetoencephalography in the left hemisphere were compared with those on the psychophysical responses in 11 participants. The latency and amplitude of N100m were significantly shortened and reduced in the left hemisphere by the presentation of visual speech as long as the temporal asynchrony between A/V stimuli was within 100 ms, but were not significantly affected with audio lags of -500 and +500 ms. However, some small effects were still preserved on average with audio lags of 500 ms, suggesting similar asymmetry of the temporal window to that observed in psychophysical measurements, which tended to be more robust (wider) for audio lags; i.e., the pattern of visual-speech effects as a function of A/V lag observed in the N100m in the left hemisphere grossly resembled that in psychophysical measurements on average, although the individual responses were somewhat varied. The present results suggest that the basic configuration of the temporal window of visual effects on auditory-speech perception could be observed from the early auditory processing stage. PMID:28030631

  18. Spectral-temporal EEG dynamics of speech discrimination processing in infants during sleep.

    PubMed

    Gilley, Phillip M; Uhler, Kristin; Watson, Kaylee; Yoshinaga-Itano, Christine

    2017-03-22

    Oddball paradigms are frequently used to study auditory discrimination by comparing event-related potential (ERP) responses from a standard, high probability sound and to a deviant, low probability sound. Previous research has established that such paradigms, such as the mismatch response or mismatch negativity, are useful for examining auditory processes in young children and infants across various sleep and attention states. The extent to which oddball ERP responses may reflect subtle discrimination effects, such as speech discrimination, is largely unknown, especially in infants that have not yet acquired speech and language. Mismatch responses for three contrasts (non-speech, vowel, and consonant) were computed as a spectral-temporal probability function in 24 infants, and analyzed at the group level by a modified multidimensional scaling. Immediately following an onset gamma response (30-50 Hz), the emergence of a beta oscillation (12-30 Hz) was temporally coupled with a lower frequency theta oscillation (2-8 Hz). The spectral-temporal probability of this coupling effect relative to a subsequent theta modulation corresponds with discrimination difficulty for non-speech, vowel, and consonant contrast features. The theta modulation effect suggests that unexpected sounds are encoded as a probabilistic measure of surprise. These results support the notion that auditory discrimination is driven by the development of brain networks for predictive processing, and can be measured in infants during sleep. The results presented here have implications for the interpretation of discrimination as a probabilistic process, and may provide a basis for the development of single-subject and single-trial classification in a clinically useful context. An infant's brain is processing information about the environment and performing computations, even during sleep. These computations reflect subtle differences in acoustic feature processing that are necessary for language-learning. Results from this study suggest that brain responses to deviant sounds in an oddball paradigm follow a cascade of oscillatory modulations. This cascade begins with a gamma response that later emerges as a beta synchronization, which is temporally coupled with a theta modulation, and followed by a second, subsequent theta modulation. The difference in frequency and timing of the theta modulations appears to reflect a measure of surprise. These insights into the neurophysiological mechanisms of auditory discrimination provide a basis for exploring the clinical utility of the MMR TF and other auditory oddball responses.

  19. Neural integration of iconic and unrelated coverbal gestures: a functional MRI study.

    PubMed

    Green, Antonia; Straube, Benjamin; Weis, Susanne; Jansen, Andreas; Willmes, Klaus; Konrad, Kerstin; Kircher, Tilo

    2009-10-01

    Gestures are an important part of interpersonal communication, for example by illustrating physical properties of speech contents (e.g., "the ball is round"). The meaning of these so-called iconic gestures is strongly intertwined with speech. We investigated the neural correlates of the semantic integration for verbal and gestural information. Participants watched short videos of five speech and gesture conditions performed by an actor, including variation of language (familiar German vs. unfamiliar Russian), variation of gesture (iconic vs. unrelated), as well as isolated familiar language, while brain activation was measured using functional magnetic resonance imaging. For familiar speech with either of both gesture types contrasted to Russian speech-gesture pairs, activation increases were observed at the left temporo-occipital junction. Apart from this shared location, speech with iconic gestures exclusively engaged left occipital areas, whereas speech with unrelated gestures activated bilateral parietal and posterior temporal regions. Our results demonstrate that the processing of speech with speech-related versus speech-unrelated gestures occurs in two distinct but partly overlapping networks. The distinct processing streams (visual versus linguistic/spatial) are interpreted in terms of "auxiliary systems" allowing the integration of speech and gesture in the left temporo-occipital region.

  20. Engaged listeners: shared neural processing of powerful political speeches

    PubMed Central

    Häcker, Frank E. K.; Honey, Christopher J.; Hasson, Uri

    2015-01-01

    Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners’ brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. PMID:25653012
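
    Inter-subject correlation (ISC), the alignment measure used above, can be sketched as a leave-one-out correlation. This is a simplified illustration (real fMRI analyses add preprocessing, hemodynamic modeling, and permutation statistics): for a given region, each listener's response time course is correlated with the average time course of all other listeners, and these correlations are averaged.

      import numpy as np

      def isc(time_courses):
          """time_courses: array of shape (n_subjects, n_timepoints) for one region."""
          tc = np.asarray(time_courses, dtype=float)
          n = tc.shape[0]
          rs = []
          for i in range(n):
              others = np.delete(tc, i, axis=0).mean(axis=0)   # leave-one-out group average
              rs.append(np.corrcoef(tc[i], others)[0, 1])
          return float(np.mean(rs))

    Comparing this value, region by region, between rhetorically powerful and weak speeches captures the group-level coupling difference described above.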

  1. The interlanguage speech intelligibility benefit for native speakers of Mandarin: Production and perception of English word-final voicing contrasts

    PubMed Central

    Hayes-Harb, Rachel; Smith, Bruce L.; Bent, Tessa; Bradlow, Ann R.

    2009-01-01

    This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast was considered (as in minimal pairs such as 'cub' and 'cup') in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, it was found that the interlanguage speech intelligibility benefit for listeners held only for the low phonological proficiency listeners and low phonological proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in an effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit. PMID:19606271

  2. It's about time: Presentation in honor of Ira Hirsh

    NASA Astrophysics Data System (ADS)

    Grant, Ken

    2002-05-01

    Over his long and illustrious career, Ira Hirsh has returned time and time again to his interest in the temporal aspects of pattern perception. Although Hirsh has studied and published articles and books pertaining to many aspects of the auditory system, such as sound conduction in the ear, cochlear mechanics, masking, auditory localization, psychoacoustic behavior in animals, speech perception, medical and audiological applications, coupling between psychophysics and physiology, and ecological acoustics, it is his work on auditory timing of simple and complex rhythmic patterns, the backbone of speech and music, that are at the heart of his more recent work. Here, we will focus on several aspects of temporal processing of simple and complex signals, both within and across sensory systems. Data will be reviewed on temporal order judgments of simple tones, and simultaneity judgments and intelligibility of unimodal and bimodal complex stimuli where stimulus components are presented either synchronously or asynchronously. Differences in the symmetry and shape of "temporal windows" derived from these data sets will be highlighted.

  3. DETECTION AND IDENTIFICATION OF SPEECH SOUNDS USING CORTICAL ACTIVITY PATTERNS

    PubMed Central

    Centanni, T.M.; Sloan, A.M.; Reed, A.C.; Engineer, C.T.; Rennaker, R.; Kilgard, M.P.

    2014-01-01

    We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/sec), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech processing disorders. PMID:24286757

  4. Attentional Gain Control of Ongoing Cortical Speech Representations in a “Cocktail Party”

    PubMed Central

    Kerlin, Jess R.; Shahin, Antoine J.; Miller, Lee M.

    2010-01-01

    Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multi-talker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4–8 Hz, in auditory cortex. In addition, the difference in alpha power (8–12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual’s attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex. PMID:20071526
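
    The hemispheric alpha-lateralization measure mentioned above can be expressed compactly. The channel grouping and the exact normalization below are assumptions for illustration: parietal EEG from each hemisphere is band-limited to 8-12 Hz, its power is computed, and a normalized right-minus-left difference indexes the direction of spatial attention.

      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def alpha_lateralization(left_parietal, right_parietal, fs):
          """Normalized right-minus-left alpha (8-12 Hz) power for two 1-D parietal EEG signals."""
          sos = butter(4, [8.0, 12.0], btype="bandpass", fs=fs, output="sos")
          p_left = np.mean(sosfiltfilt(sos, left_parietal) ** 2)
          p_right = np.mean(sosfiltfilt(sos, right_parietal) ** 2)
          return (p_right - p_left) / (p_right + p_left)

    An index of this kind, computed per listener, is the sort of quantity that could be correlated with an individual's attentional gain of the cortical speech signal, as reported above.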

  5. The Involvement of Speed-of-Processing in Story Listening in Preschool Children: A Functional and Structural Connectivity Study.

    PubMed

    Horowitz-Kraus, Tzipi; Farah, Rola; DiFrancesco, Mark; Vannest, Jennifer

    2017-02-01

    Story listening in children relies on brain regions supporting speech perception, auditory word recognition, syntax, semantics, and discourse abilities, along with the ability to attend and process information (part of executive functions). Speed-of-processing is an early-developed executive function. We used functional and structural magnetic resonance imaging (MRI) to demonstrate the relationship between story listening and speed-of-processing in preschool-age children. Eighteen participants performed story-listening tasks during MRI scans. Functional and structural connectivity analysis was performed using the speed-of-processing scores as regressors. Activation in the superior frontal gyrus during story listening positively correlated with speed-of-processing scores. This region was functionally connected with the superior temporal gyrus, insula, and hippocampus. Fractional anisotropy in the inferior frontooccipital fasciculus, which connects the superior frontal and temporal gyri, was positively correlated with speed-of-processing scores. Our results suggest that speed-of-processing skills in preschool-age children are reflected in functional activation and connectivity during story listening and may act as a biomarker for future academic abilities. Georg Thieme Verlag KG Stuttgart · New York.

  6. [Memory peculiarities in patients with schizophrenia and their first-degree relatives].

    PubMed

    Savina, T D; Orlova, V A; Shcherbakova, N P; Korsakova, N K; Malova, Iu A; Efanova, N N; Ganisheva, T K; Nikolaev, R A

    2008-01-01

    Eighty-four families with schizophrenia (84 patients, i.e. probands, and 73 of their first-degree unaffected relatives), as well as 37 normal controls and their relatives, were studied using pathopsychological (pictogram) and Luria's neuropsychological tests. The most prominent abnormalities in both patients and relatives concerned global characteristics of auditory-speech memory, predominantly related to left subcortical and left temporal regions. Abnormalities in immediate recall of a short logical story (SLS) were connected with dysfunction of the same brain regions. Less prominent abnormalities in delayed recall of the SLS were revealed only in patients and were connected with left subcortical, left subcortical-frontal, and left subcortical-temporal zones. This abnormality was absent in relatives and age-matched controls. The span of mediated retention was decreased in patients and, to a lesser degree, in relatives. A quantitative psychological analysis demonstrated a disintegration ("schizys") between semantic conception and the image structure of memory in patients and, to a lesser degree, in relatives. The data obtained show primary memory abnormalities in families with schizophrenia related to impaired decoding of information in subcortical structures, with left-sided dysfunction of brain structures being predominantly typical.

  7. Audiovisual Simultaneity Judgment and Rapid Recalibration throughout the Lifespan.

    PubMed

    Noel, Jean-Paul; De Niear, Matthew; Van der Burg, Erik; Wallace, Mark T

    2016-01-01

    Multisensory interactions are well established to convey an array of perceptual and behavioral benefits. One of the key features of multisensory interactions is the temporal structure of the stimuli combined. In an effort to better characterize how temporal factors influence multisensory interactions across the lifespan, we examined audiovisual simultaneity judgment and the degree of rapid recalibration to paired audiovisual stimuli (Flash-Beep and Speech) in a sample of 220 participants ranging from 7 to 86 years of age. Results demonstrate a surprisingly protracted developmental time-course for both audiovisual simultaneity judgment and rapid recalibration, with neither reaching maturity until well into adolescence. Interestingly, correlational analyses revealed that audiovisual simultaneity judgments (i.e., the size of the audiovisual temporal window of simultaneity) and rapid recalibration significantly co-varied as a function of age. Together, our results represent the most complete description of age-related changes in audiovisual simultaneity judgments to date, as well as being the first to describe changes in the degree of rapid recalibration as a function of age. We propose that the developmental time-course of rapid recalibration scaffolds the maturation of more durable audiovisual temporal representations.

  8. Sounds and silence: An optical topography study of language recognition at birth

    NASA Astrophysics Data System (ADS)

    Peña, Marcela; Maki, Atsushi; Kovačić, Damir; Dehaene-Lambertz, Ghislaine; Koizumi, Hideaki; Bouquet, Furio; Mehler, Jacques

    2003-09-01

    Does the neonate's brain have left hemisphere (LH) dominance for speech? Twelve full-term neonates participated in an optical topography study designed to assess whether the neonate brain responds specifically to linguistic stimuli. Participants were tested with normal infant-directed speech, with the same utterances played in reverse and without auditory stimulation. We used a 24-channel optical topography device to assess changes in the concentration of total hemoglobin in response to auditory stimulation in 12 areas of the right hemisphere and 12 areas of the LH. We found that LH temporal areas showed significantly more activation when infants were exposed to normal speech than to backward speech or silence. We conclude that neonates are born with an LH superiority to process specific properties of speech.

  9. Functional relevance of interindividual differences in temporal lobe callosal pathways: a DTI tractography study.

    PubMed

    Westerhausen, René; Grüner, Renate; Specht, Karsten; Hugdahl, Kenneth

    2009-06-01

    The midsagittal corpus callosum is topographically organized; that is, with regard to their cortical origin, several sub-tracts can be distinguished within the corpus callosum that belong to specific functional brain networks. Recent diffusion tensor tractography studies have also revealed remarkable interindividual differences in the size and exact localization of these tracts. To examine the functional relevance of interindividual variability in callosal tracts, 17 right-handed male participants underwent structural and diffusion tensor magnetic resonance imaging. Probabilistic tractography was carried out to identify the callosal subregions that interconnect left and right temporal lobe auditory processing areas, and the midsagittal size of this tract was taken as an indicator of the (anatomical) strength of this connection. Auditory information transfer was assessed with an auditory speech perception task involving dichotic presentations of consonant-vowel syllables (e.g., /ba-ga/). The frequency of correct left-ear reports in this task served as a functional measure of interhemispheric transfer. Statistical analysis showed that a stronger anatomical connection between the superior temporal lobe areas supports better information transfer. This specific structure-function association in the auditory modality supports the general notion that interindividual differences in callosal topography possess functional relevance.

  10. Structural connectivity of right frontal hyperactive areas scales with stuttering severity

    PubMed Central

    Neef, Nicole E; Bütfering, Christoph; Schmidt-Samoa, Carsten; Friederici, Angela D; Paulus, Walter; Sommer, Martin

    2018-01-01

    A neuronal sign of persistent developmental stuttering is the magnified coactivation of right frontal brain regions during speech production. Whether and how stuttering severity relates to the connection strength of these hyperactive right frontal areas to other brain areas is an open question. Scrutinizing such brain–behaviour and structure–function relationships aims at disentangling suspected underlying neuronal mechanisms of stuttering. Here, we acquired diffusion-weighted and functional images from 31 adults who stutter and 34 matched control participants. Using a newly developed structural connectivity measure, we calculated voxel-wise correlations between connection strength and stuttering severity within tract volumes that originated from functionally hyperactive right frontal regions. Correlation analyses revealed that with increasing speech motor deficits the connection strength increased in the right frontal aslant tract, the right anterior thalamic radiation, and in U-shaped projections underneath the right precentral sulcus. In contrast, with decreasing speech motor deficits connection strength increased in the right uncinate fasciculus. Additional group comparisons of whole-brain white matter skeletons replicated the previously reported reduction of fractional anisotropy in the left and right superior longitudinal fasciculus as well as at the junction of right frontal aslant tract and right superior longitudinal fasciculus in adults who stutter compared to control participants. Overall, our investigation suggests that right fronto-temporal networks play a compensatory role as a fluency-enhancing mechanism. In contrast, the increased connection strength within subcortical-cortical pathways may be implicated in an overly active global response suppression mechanism in stuttering. Altogether, this combined functional MRI–diffusion tensor imaging study disentangles different networks involved in the neuronal underpinnings of the speech motor deficit in persistent developmental stuttering. PMID:29228195

  11. Mismatch Negativity with Visual-only and Audiovisual Speech

    PubMed Central

    Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.

    2009-01-01

    The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730

  12. Clustered functional MRI of overt speech production.

    PubMed

    Sörös, Peter; Sokoloff, Lisa Guttman; Bose, Arpita; McIntosh, Anthony R; Graham, Simon J; Stuss, Donald T

    2006-08-01

    To investigate the neural network of overt speech production, event-related fMRI was performed in 9 young healthy adult volunteers. A clustered image acquisition technique was chosen to minimize speech-related movement artifacts. Functional images were acquired during the production of oral movements and of speech of increasing complexity (isolated vowel as well as monosyllabic and trisyllabic utterances). This imaging technique and behavioral task enabled depiction of the articulo-phonologic network of speech production from the supplementary motor area at the cranial end to the red nucleus at the caudal end. Speaking a single vowel and performing simple oral movements involved very similar activation of the cortical and subcortical motor systems. More complex, polysyllabic utterances were associated with additional activation in the bilateral cerebellum, reflecting increased demand on speech motor control, and additional activation in the bilateral temporal cortex, reflecting the stronger involvement of phonologic processing.

  13. Comparison of the Spectral-Temporally Modulated Ripple Test With the Arizona Biomedical Institute Sentence Test in Cochlear Implant Users.

    PubMed

    Lawler, Marshall; Yu, Jeffrey; Aronoff, Justin M

    Although speech perception is the gold standard for measuring cochlear implant (CI) users' performance, speech perception tests often require extensive adaptation to obtain accurate results, particularly after large changes in maps. Spectral ripple tests, which measure spectral resolution, are an alternate measure that has been shown to correlate with speech perception. A modified spectral ripple test, the spectral-temporally modulated ripple test (SMRT) has recently been developed, and the objective of this study was to compare speech perception and performance on the SMRT for a heterogeneous population of unilateral CI users, bilateral CI users, and bimodal users. Twenty-five CI users (eight using unilateral CIs, nine using bilateral CIs, and eight using a CI and a hearing aid) were tested on the Arizona Biomedical Institute Sentence Test (AzBio) with a +8 dB signal to noise ratio, and on the SMRT. All participants were tested with their clinical programs. There was a significant correlation between SMRT and AzBio performance. After a practice block, an improvement of one ripple per octave for SMRT corresponded to an improvement of 12.1% for AzBio. Additionally, there was no significant difference in slope or intercept between any of the CI populations. The results indicate that performance on the SMRT correlates with speech recognition in noise when measured across unilateral, bilateral, and bimodal CI populations. These results suggest that SMRT scores are strongly associated with speech recognition in noise ability in experienced CI users. Further studies should focus on increasing both the size and diversity of the tested participants, and on determining whether the SMRT technique can be used for early predictions of long-term speech scores, or for evaluating differences among different stimulation strategies or parameter settings.

  14. Investigation of habitual pitch during free play activities for preschool-aged children.

    PubMed

    Chen, Yang; Kimelman, Mikael D Z; Micco, Katie

    2009-01-01

    This study was designed to compare habitual pitch measured in two different speech activities (free play and a traditionally used structured speech activity) in normally developing preschool-aged children, to explore the extent to which preschoolers vary their vocal pitch across speech environments. Habitual pitch measurements were conducted for 10 normally developing children (2 boys, 8 girls) between 31 and 71 months of age during two activities: (1) free play and (2) structured speech. In both activities, speech samples were recorded using a throat microphone connected to a wireless transmitter. Habitual pitch (in Hz) was measured for all collected speech samples using voice analysis software (Real-Time Pitch). Habitual pitch was significantly higher during free play than during structured speech activities, and no significant difference in habitual pitch was found across the various structured speech activities. The findings suggest that preschoolers' vocal use is more effortful during free play than during structured activities. It is recommended that a comprehensive voice evaluation for young children be based on speech/voice samples collected from both free play and structured activities.
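
    The record above measures habitual pitch with dedicated voice-analysis software; as a hedged illustration of the underlying idea, the sketch below estimates a median frame-wise F0 from a recording using a rudimentary autocorrelation method. The frame length, F0 search range, and absence of voicing detection are assumptions of this sketch, not properties of the Real-Time Pitch analysis used in the study.

```python
import numpy as np

def frame_f0_autocorr(frame, fs, fmin=75.0, fmax=500.0):
    """Rudimentary autocorrelation F0 estimate (Hz) for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]   # lags 0..n-1
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])                  # strongest periodicity
    return fs / lag

def habitual_pitch(speech, fs, frame_ms=40.0):
    """Median frame-wise F0 as a crude stand-in for a habitual-pitch measure.

    A real analysis would add voicing detection and use dedicated software;
    frame length and F0 search range are illustrative assumptions.
    """
    n = int(fs * frame_ms / 1000.0)
    f0s = [frame_f0_autocorr(speech[i:i + n], fs)
           for i in range(0, len(speech) - n, n)]
    return float(np.median(f0s))

# Toy usage: a 200 Hz complex tone should yield a habitual pitch near 200 Hz
fs = 16000
t = np.arange(2 * fs) / fs
toy = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(habitual_pitch(toy, fs))
```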

  15. Atypical neural synchronization to speech envelope modulations in dyslexia.

    PubMed

    De Vos, Astrid; Vanvooren, Sophie; Vanderauwera, Jolijn; Ghesquière, Pol; Wouters, Jan

    2017-01-01

    A fundamental deficit in the synchronization of neural oscillations to temporal information in speech could underlie phonological processing problems in dyslexia. In this study, the hypothesis of a neural synchronization impairment is investigated more specifically as a function of different neural oscillatory bands and temporal information rates in speech. Auditory steady-state responses to 4, 10, 20, and 40 Hz modulations were recorded in normal-reading and dyslexic adolescents to measure neural synchronization of theta, alpha, beta, and low-gamma oscillations to syllabic and phonemic rate information. In comparison to normal readers, dyslexic readers showed reduced non-synchronized theta activity, reduced synchronized alpha activity, and enhanced synchronized beta activity. Positive correlations between alpha synchronization and phonological skills were found in normal readers, but were absent in dyslexic readers. In contrast, dyslexic readers exhibited positive correlations between beta synchronization and phonological skills. Together, these results suggest that auditory neural synchronization of alpha and beta oscillations is atypical in dyslexia, indicating deviant neural processing of both syllabic and phonemic rate information. Impaired synchronization of alpha oscillations in particular was the most prominent neural anomaly, possibly hampering speech and phonological processing in dyslexic readers. Copyright © 2016 Elsevier Inc. All rights reserved.
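
    The auditory steady-state responses described above quantify how strongly neural activity synchronizes to a given modulation rate. One common way to express such synchronization, offered here purely as an illustration (it is not necessarily the measure used in the study), is inter-trial phase coherence at the modulation frequency:

```python
import numpy as np

def phase_locking_at_rate(epochs, fs, mod_freq):
    """Inter-trial phase coherence at a given modulation frequency.

    epochs : array (n_trials, n_samples) of EEG epochs locked to the
             amplitude-modulated stimulus.
    Returns a value in [0, 1]: 1 means perfectly consistent phase across trials
    (strong synchronization), 0 means none.  This is one common ASSR metric,
    given only as an illustration.
    """
    n_samples = epochs.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    k = np.argmin(np.abs(freqs - mod_freq))          # FFT bin nearest the AM rate
    spectrum = np.fft.rfft(epochs, axis=1)[:, k]     # one complex value per trial
    return np.abs(np.mean(spectrum / np.abs(spectrum)))

# Toy usage: 50 trials of a 4 Hz-locked response buried in noise, 1 s at 500 Hz
rng = np.random.default_rng(1)
fs, n_trials, n_samples = 500, 50, 500
t = np.arange(n_samples) / fs
epochs = 0.5 * np.sin(2 * np.pi * 4 * t) + rng.standard_normal((n_trials, n_samples))
print(phase_locking_at_rate(epochs, fs, mod_freq=4.0))
```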

  16. Role of contextual cues on the perception of spectrally reduced interrupted speech.

    PubMed

    Patro, Chhayakanta; Mendel, Lisa Lucks

    2016-08-01

    Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal-hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs, making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal-hearing adults listened to speech that was either spectrally reduced, or both spectrally reduced and interrupted, in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced, interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest that top-down processing facilitates speech perception up to a point, but fails to support speech understanding when the signals are severely degraded.

  17. Direct cortical stimulation of inferior frontal cortex disrupts both speech and music production in highly trained musicians.

    PubMed

    Leonard, Matthew K; Desai, Maansi; Hungate, Dylan; Cai, Ruofan; Singhal, Nilika S; Knowlton, Robert C; Chang, Edward F

    2018-05-22

    Music and speech are human-specific behaviours that share numerous properties, including the fine motor skills required to produce them. Given these similarities, previous work has suggested that music and speech may at least partially share neural substrates. To date, much of this work has focused on perception, and has not investigated the neural basis of production, particularly in trained musicians. Here, we report two rare cases of musicians undergoing neurosurgical procedures, where it was possible to directly stimulate the left hemisphere cortex during speech and piano/guitar music production tasks. We found that stimulation to left inferior frontal cortex, including pars opercularis and ventral pre-central gyrus, caused slowing and arrest for both speech and music, and note sequence errors for music. Stimulation to posterior superior temporal cortex only caused production errors during speech. These results demonstrate partially dissociable networks underlying speech and music production, with a shared substrate in frontal regions.

  18. Evaluation of the comprehension of noncontinuous sped-up vocoded speech - A strategy for coping with fading HF channels

    NASA Astrophysics Data System (ADS)

    Lynch, John T.

    1987-02-01

    The present technique for coping with fading and burst noise on HF channels used in digital voice communications transmits digital voice only during high S/N time intervals, and speeds up the speech when necessary to avoid conversation-hindering delays. On the basis of informal listening tests, four test conditions were selected in order to characterize those conditions of speech interruption which would render it comprehensible or incomprehensible. One of the test conditions, 2 secs on and 1/2-sec off, yielded test scores comparable to the reference continuous speech case and is a reasonable match to the temporal variations of a disturbed ionosphere.
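
    A minimal sketch of the interruption pattern from the best-scoring test condition (2 s on, 0.5 s off) is given below; it gates the signal as if it were transmitted only during high-S/N intervals of a fading channel. The companion speed-up step and the vocoding are omitted, and the sampling rate in the usage example is an arbitrary assumption.

```python
import numpy as np

def interrupt_speech(speech, fs, on_s=2.0, off_s=0.5):
    """Pass the signal during 'on' windows and silence it during 'off' windows,
    mimicking transmission only during high-S/N intervals of a fading HF channel.
    Default durations follow the 2 s on / 0.5 s off test condition above."""
    period = int((on_s + off_s) * fs)
    on_len = int(on_s * fs)
    gate = np.zeros(len(speech))
    for start in range(0, len(speech), period):
        gate[start:start + on_len] = 1.0      # keep the 'on' portion of each cycle
    return speech * gate

# Toy usage: 10 s of noise standing in for speech, sampled at 8 kHz
rng = np.random.default_rng(6)
gated = interrupt_speech(rng.standard_normal(10 * 8000), fs=8000)
```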

  19. Left-Dominant Temporal-Frontal Hypercoupling in Schizophrenia Patients With Hallucinations During Speech Perception

    PubMed Central

    Lavigne, Katie M.; Rapin, Lucile A.; Metzak, Paul D.; Whitman, Jennifer C.; Jung, Kwanghee; Dohen, Marion; Lœvenbruck, Hélène; Woodward, Todd S.

    2015-01-01

    Background: Task-based functional neuroimaging studies of schizophrenia have not yet replicated the increased coordinated hyperactivity in speech-related brain regions that is reported with symptom-capture and resting-state studies of hallucinations. This may be due to suboptimal selection of cognitive tasks. Methods: In the current study, we used a task that allowed experimental manipulation of control over verbal material and compared brain activity between 23 schizophrenia patients (10 hallucinators, 13 nonhallucinators), 22 psychiatric controls (bipolar), and 27 healthy controls. Two conditions were presented, one involving inner verbal thought (in which control over verbal material was required) and another involving speech perception (SP; in which control over verbal material was not required). Results: A functional connectivity analysis resulted in a left-dominant temporal-frontal network that included speech-related auditory and motor regions and showed hypercoupling in past-week hallucinating schizophrenia patients (relative to nonhallucinating patients) during SP only. Conclusions: These findings replicate our previous work showing generalized speech-related functional network hypercoupling in schizophrenia during inner verbal thought and SP, but extend them by suggesting that hypercoupling is related to past-week hallucination severity scores during SP only, when control over verbal material is not required. This result opens the possibility that practicing control over inner verbal thought processes may decrease the likelihood or severity of hallucinations. PMID:24553150

  20. Conflict monitoring in speech processing: An fMRI study of error detection in speech production and perception.

    PubMed

    Gauvin, Hanna S; De Baene, Wouter; Brass, Marcel; Hartsuiker, Robert J

    2016-02-01

    To minimize the number of errors in speech, and thereby facilitate communication, speech is monitored before articulation. It is, however, unclear at which level during speech production monitoring takes place, and what mechanisms are used to detect and correct errors. The present study investigated whether internal verbal monitoring takes place through the speech perception system, as proposed by perception-based theories of speech monitoring, or whether mechanisms independent of perception are applied, as proposed by production-based theories of speech monitoring. With the use of fMRI during a tongue twister task we observed that error detection in internal speech during noise-masked overt speech production and error detection in speech perception both recruit the same neural network, which includes pre-supplementary motor area (pre-SMA), dorsal anterior cingulate cortex (dACC), anterior insula (AI), and inferior frontal gyrus (IFG). Although production and perception recruit similar areas, as proposed by perception-based accounts, we did not find activation in superior temporal areas (which are typically associated with speech perception) during internal speech monitoring in speech production as hypothesized by these accounts. On the contrary, results are highly compatible with a domain general approach to speech monitoring, by which internal speech monitoring takes place through detection of conflict between response options, which is subsequently resolved by a domain general executive center (e.g., the ACC). Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Production and Perception of Temporal Patterns in Native and Non-Native Speech

    PubMed Central

    Bent, Tessa; Bradlow, Ann R.; Smith, Bruce L.

    2012-01-01

    Two experiments examined production and perception of English temporal patterns by native and non-native participants. Experiment 1 indicated that native and non-native (L1 = Chinese) talkers differed significantly in their production of one English duration pattern (i.e., vowel lengthening before voiced versus voiceless consonants) but not another (i.e., tense versus lax vowels). Experiment 2 tested native and non-native listener identification of words that differed in voicing of the final consonant by the native and non-native talkers whose productions were substantially different in experiment 1. Results indicated that differences in native and non-native intelligibility may be partially explained by temporal pattern differences in vowel duration although other cues such as presence of stop releases and burst duration may also contribute. Additionally, speech intelligibility depends on shared phonetic knowledge between talkers and listeners rather than only on accuracy relative to idealized production norms. PMID:18679042

  2. The right hemisphere's contribution to discourse processing: A study in temporal lobe epilepsy.

    PubMed

    Lomlomdjian, Carolina; Múnera, Claudia P; Low, Daniel M; Terpiluk, Verónica; Solís, Patricia; Abusamra, Valeria; Kochen, Silvia

    2017-08-01

    Discourse skills - in which the right hemisphere has an important role - enable verbal communication by selecting contextually relevant information and integrating it coherently to infer the correct meaning. However, language research in epilepsy has focused on single-word analysis related mainly to left hemisphere processing. The purpose of this study was to investigate discourse abilities in patients with right lateralized medial temporal lobe epilepsy (RTLE) by comparing their performance to that of patients with left temporal lobe epilepsy (LTLE). Seventy-four pharmacoresistant temporal lobe epilepsy (TLE) patients were evaluated: 34 with RTLE and 40 with LTLE. Subjects underwent a battery of tests measuring comprehension and production of conversational and narrative discourse. Disease-related variables and general neuropsychological data were also evaluated. The RTLE group presented deficits in interictal conversational and narrative discourse, with disintegrated speech, a lack of categorization, and misinterpretation of social meaning. The LTLE group, on the other hand, showed a tendency toward lower performance in logical-temporal sequencing. RTLE patients thus showed discourse deficits of the kind described in patients with right hemisphere damage of other etiologies. Medial and anterior temporal lobe structures appear to link semantic, world knowledge, and social cognition associated areas to construct a contextually related, coherent meaning. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment

    PubMed Central

    Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J.

    2014-01-01

    Speech perception involves the integration of auditory and visual articulatory information and, thus, requires the perception of temporal synchrony between this information. There is evidence that children with Specific Language Impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component followed the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception. PMID:22874648

  4. Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

    PubMed

    Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

    2013-06-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.

  5. Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences

    PubMed Central

    Brand, Thomas

    2018-01-01

    In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect “binaural sluggishness.” In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization–cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism. PMID:29338577

  6. Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences.

    PubMed

    Hauth, Christopher F; Brand, Thomas

    2018-01-01

    In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect "binaural sluggishness." In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization-cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism.
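
    As an illustration of the masker manipulation described in this record (and its duplicate above), the sketch below generates noise whose interaural phase difference (IPD) is modulated sinusoidally by rotating the Hilbert analytic signal of one channel. The approximation holds only for modulation rates well below the noise bandwidth, and the modulation depth, rate, and sampling rate are illustrative assumptions rather than the study's parameters.

```python
import numpy as np
from scipy.signal import hilbert

def ipd_modulated_noise(duration, fs, mod_freq, max_ipd=np.pi / 2, seed=0):
    """Two-channel noise whose IPD is modulated sinusoidally at `mod_freq` Hz.

    The right channel is obtained by rotating the analytic (Hilbert) signal of
    the left channel by a time-varying phase phi(t); for modulation rates well
    below the noise bandwidth this approximately imposes IPD = phi(t) at all
    frequencies.  Parameter values are illustrative, not those of the study.
    """
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    left = rng.standard_normal(n)
    t = np.arange(n) / fs
    phi = max_ipd * np.sin(2 * np.pi * mod_freq * t)
    right = np.real(hilbert(left) * np.exp(1j * phi))
    return np.stack([left, right], axis=0)

stereo = ipd_modulated_noise(duration=2.0, fs=16000, mod_freq=4.0)
print(stereo.shape)   # (2, 32000)
```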

  7. Scaling and universality in the human voice.

    PubMed

    Luque, Jordi; Luque, Bartolo; Lacasa, Lucas

    2015-04-06

    Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work, we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech, the energy is unevenly released and power-law distributed, reporting a robust, universal Gutenberg-Richter-like law in speech. We further show that such 'earthquakes in speech' show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech processing, which point towards low dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact on future speech synthesis technologies. Results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
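
    As a hedged illustration of how a power-law exponent of the kind reported above can be estimated, the sketch below applies the standard continuous maximum-likelihood (Hill-type) estimator to inter-event times; the threshold xmin and the synthetic data are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def powerlaw_exponent_mle(x, xmin):
    """Maximum-likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= xmin
    (continuous Hill/Clauset estimator).

    Applied to inter-event waiting times or released energies, this is one
    standard way to quantify Gutenberg-Richter-like statistics; xmin is a
    user-supplied assumption rather than a fitted value here.
    """
    x = np.asarray(x, dtype=float)
    tail = x[x >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))

# Toy usage: samples drawn from a true power law with alpha = 2.5
rng = np.random.default_rng(2)
alpha_true, xmin = 2.5, 1.0
u = rng.random(100_000)
samples = xmin * (1.0 - u) ** (-1.0 / (alpha_true - 1.0))   # inverse-CDF sampling
print(powerlaw_exponent_mle(samples, xmin))                 # close to 2.5
```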

  8. Infants’ brain responses to speech suggest Analysis by Synthesis

    PubMed Central

    Kuhl, Patricia K.; Ramírez, Rey R.; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-01-01

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding. PMID:25024207

  9. Infants' brain responses to speech suggest analysis by synthesis.

    PubMed

    Kuhl, Patricia K; Ramírez, Rey R; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

    2014-08-05

    Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.

  10. Engaged listeners: shared neural processing of powerful political speeches.

    PubMed

    Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri

    2015-08-01

    Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
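
    The audience-wide coupling described above is typically quantified with inter-subject correlation (ISC). The sketch below shows one common leave-one-out variant, offered only as an illustration; it is not necessarily the exact ISC formulation used in the study.

```python
import numpy as np

def intersubject_correlation(timecourses):
    """Leave-one-out inter-subject correlation (ISC) for one brain region.

    timecourses : array (n_subjects, n_timepoints) of responses to the same
    speech.  Each subject's time course is correlated with the average of all
    other subjects, and the mean of these correlations is returned.
    """
    tc = np.asarray(timecourses, dtype=float)
    r = []
    for i in range(tc.shape[0]):
        others = np.delete(tc, i, axis=0).mean(axis=0)
        r.append(np.corrcoef(tc[i], others)[0, 1])
    return float(np.mean(r))

# Toy usage: 20 listeners sharing a common stimulus-driven component
rng = np.random.default_rng(3)
shared = rng.standard_normal(300)
listeners = shared + 1.5 * rng.standard_normal((20, 300))
print(intersubject_correlation(listeners))
```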

  11. Giving speech a hand: gesture modulates activity in auditory cortex during speech perception.

    PubMed

    Hubbard, Amy L; Wilson, Stephen M; Callan, Daniel E; Dapretto, Mirella

    2009-03-01

    Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture-a fundamental type of hand gesture that marks speech prosody-might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.

  12. Giving Speech a Hand: Gesture Modulates Activity in Auditory Cortex During Speech Perception

    PubMed Central

    Hubbard, Amy L.; Wilson, Stephen M.; Callan, Daniel E.; Dapretto, Mirella

    2008-01-01

    Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture – a fundamental type of hand gesture that marks speech prosody – might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions. PMID:18412134

  13. Atypical right hemisphere response to slow temporal modulations in children with developmental dyslexia.

    PubMed

    Cutini, Simone; Szűcs, Dénes; Mead, Natasha; Huss, Martina; Goswami, Usha

    2016-12-01

    Phase entrainment of neuronal oscillations is thought to play a central role in encoding speech. Children with developmental dyslexia show impaired phonological processing of speech, proposed theoretically to be related to atypical phase entrainment to slower temporal modulations in speech (<10 Hz). While studies of children with dyslexia have found atypical phase entrainment in the delta band (~2 Hz), some studies of adults with developmental dyslexia have shown impaired entrainment in the low gamma band (~35-50 Hz). Meanwhile, studies of neurotypical adults suggest asymmetric temporal sensitivity in auditory cortex, with preferential processing of slower modulations by right auditory cortex, and faster modulations processed bilaterally. Here we compared neural entrainment to slow (2 Hz) versus faster (40 Hz) amplitude-modulated noise using fNIRS to study possible hemispheric asymmetry effects in children with developmental dyslexia. We predicted atypical right hemisphere responding to 2 Hz modulations for the children with dyslexia in comparison to control children, but equivalent responding to 40 Hz modulations in both hemispheres. Analyses of HbO concentration revealed a right-lateralised region focused on the supra-marginal gyrus that was more active in children with dyslexia than in control children for 2 Hz stimulation. We discuss possible links to linguistic prosodic processing, and interpret the data with respect to a neural 'temporal sampling' framework for conceptualizing the phonological deficits that characterise children with developmental dyslexia across languages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
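
    As an illustration of the stimulus class used above, the sketch below generates noise that is amplitude-modulated at a slow (2 Hz) or fast (40 Hz) rate. The white-noise carrier, full modulation depth, and sampling rate are assumptions of this sketch rather than the study's exact stimuli.

```python
import numpy as np

def amplitude_modulated_noise(duration, fs, mod_freq, depth=1.0, seed=0):
    """White noise with sinusoidal amplitude modulation at `mod_freq` Hz.

    Stimuli of this general kind (e.g., 2 Hz vs. 40 Hz AM noise) probe slow
    versus fast temporal-modulation processing; the carrier and modulation
    depth here are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    t = np.arange(n) / fs
    carrier = rng.standard_normal(n)
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_freq * t)
    return envelope * carrier

slow = amplitude_modulated_noise(duration=2.0, fs=16000, mod_freq=2.0)
fast = amplitude_modulated_noise(duration=2.0, fs=16000, mod_freq=40.0)
```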

  14. Cerebral bases of subliminal speech priming.

    PubMed

    Kouider, Sid; de Gardelle, Vincent; Dehaene, Stanislas; Dupoux, Emmanuel; Pallier, Christophe

    2010-01-01

    While the neural correlates of unconscious perception and subliminal priming have been largely studied for visual stimuli, little is known about their counterparts in the auditory modality. Here we used a subliminal speech priming method in combination with fMRI to investigate which regions of the cerebral network for language can respond in the absence of awareness. Participants performed a lexical decision task on target items preceded by subliminal primes, which were either phonetically identical or different from the target. Moreover, the prime and target could be spoken by the same speaker or by two different speakers. Word repetition reduced the activity in the insula and in the left superior temporal gyrus. Although the priming effect on reaction times was independent of voice manipulation, neural repetition suppression was modulated by speaker change in the superior temporal gyrus while the insula showed voice-independent priming. These results provide neuroimaging evidence of subliminal priming for spoken words and inform us on the first, unconscious stages of speech perception.

  15. Auditory temporal processing in healthy aging: a magnetoencephalographic study

    PubMed Central

    Sörös, Peter; Teismann, Inga K; Manemann, Elisabeth; Lütkenhöner, Bernd

    2009-01-01

    Background Impaired speech perception is one of the major sequelae of aging. In addition to peripheral hearing loss, central deficits of auditory processing are supposed to contribute to the deterioration of speech perception in older individuals. To test the hypothesis that auditory temporal processing is compromised in aging, auditory evoked magnetic fields were recorded during stimulation with sequences of 4 rapidly recurring speech sounds in 28 healthy individuals aged 20 – 78 years. Results The decrement of the N1m amplitude during rapid auditory stimulation was not significantly different between older and younger adults. The amplitudes of the middle-latency P1m wave and of the long-latency N1m, however, were significantly larger in older than in younger participants. Conclusion The results of the present study do not provide evidence for the hypothesis that auditory temporal processing, as measured by the decrement (short-term habituation) of the major auditory evoked component, the N1m wave, is impaired in aging. The differences between these magnetoencephalographic findings and previously published behavioral data might be explained by differences in the experimental setting between the present study and previous behavioral studies, in terms of speech rate, attention, and masking noise. Significantly larger amplitudes of the P1m and N1m waves suggest that the cortical processing of individual sounds differs between younger and older individuals. This result adds to the growing evidence that brain functions, such as sensory processing, motor control and cognitive processing, can change during healthy aging, presumably due to experience-dependent neuroplastic mechanisms. PMID:19351410

  16. Effect of Energy Equalization on the Intelligibility of Speech in Fluctuating Background Interference for Listeners With Hearing Impairment

    PubMed Central

    D’Aquila, Laura A.; Desloge, Joseph G.; Braida, Louis D.

    2017-01-01

    The masking release (MR; i.e., better speech recognition in fluctuating compared with continuous noise backgrounds) that is evident for listeners with normal hearing (NH) is generally reduced or absent for listeners with sensorineural hearing impairment (HI). In this study, a real-time signal-processing technique was developed to improve MR in listeners with HI and offer insight into the mechanisms influencing the size of MR. This technique compares short-term and long-term estimates of energy, increases the level of short-term segments whose energy is below the average energy, and normalizes the overall energy of the processed signal to be equivalent to that of the original long-term estimate. This signal-processing algorithm was used to create two types of energy-equalized (EEQ) signals: EEQ1, which operated on the wideband speech plus noise signal, and EEQ4, which operated independently on each of four bands with equal logarithmic width. Consonant identification was tested in backgrounds of continuous and various types of fluctuating speech-shaped Gaussian noise including those with both regularly and irregularly spaced temporal fluctuations. Listeners with HI achieved similar scores for EEQ and the original (unprocessed) stimuli in continuous-noise backgrounds, while superior performance was obtained for the EEQ signals in fluctuating background noises that had regular temporal gaps but not for those with irregularly spaced fluctuations. Thus, in noise backgrounds with regularly spaced temporal fluctuations, the energy-normalized signals led to larger values of MR and higher intelligibility than obtained with unprocessed signals. PMID:28602128
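
    The record above spells out the EEQ processing steps in words; the sketch below is a rough wideband (EEQ1-style) rendering of that idea. The frame length, gain rule, and gain cap are assumptions of this sketch, not the parameters of the published real-time algorithm.

```python
import numpy as np

def energy_equalize(signal, fs, frame_ms=20.0, gain_limit_db=20.0):
    """Rough sketch of the energy-equalization (EEQ) idea described above.

    Short-term frames whose energy falls below the long-term average are
    amplified toward that average (capped at `gain_limit_db`), and the
    processed signal is then rescaled so its overall energy matches the
    original.  Frame length, the gain cap, and the exact gain rule are
    assumptions of this sketch.
    """
    frame = int(fs * frame_ms / 1000.0)
    out = signal.astype(float).copy()
    long_term = np.mean(out ** 2)                            # long-term energy estimate
    max_gain = 10.0 ** (gain_limit_db / 20.0)
    for start in range(0, len(out) - frame + 1, frame):
        seg = out[start:start + frame]
        short_term = np.mean(seg ** 2)
        if 0.0 < short_term < long_term:
            gain = min(np.sqrt(long_term / short_term), max_gain)
            out[start:start + frame] = seg * gain            # raise low-energy frames
    out *= np.sqrt(long_term / np.mean(out ** 2))            # restore overall energy
    return out

# Toy usage on 1 s of noise standing in for noisy speech
rng = np.random.default_rng(7)
processed = energy_equalize(rng.standard_normal(16000), fs=16000)
```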

  17. Inferior colliculus contributions to phase encoding of stop consonants in an animal model

    PubMed Central

    Warrier, Catherine M; Abrams, Daniel A; Nicol, Trent G; Kraus, Nina

    2011-01-01

    The human auditory brainstem is known to be exquisitely sensitive to fine-grained spectro-temporal differences between speech sound contrasts, and the ability of the brainstem to discriminate between these contrasts is important for speech perception. Recent work has described a novel method for translating brainstem timing differences in response to speech contrasts into frequency-specific phase differentials. Results from this method have shown that the human brainstem response is surprisingly sensitive to phase-differences inherent to the stimuli across a wide extent of the spectrum. Here we use an animal model of the auditory brainstem to examine whether the stimulus-specific phase signatures measured in human brainstem responses represent an epiphenomenon associated with far field (i.e., scalp-recorded) measurement of neural activity, or alternatively whether these specific activity patterns are also evident in auditory nuclei that contribute to the scalp-recorded response, thereby representing a more fundamental temporal processing phenomenon. Responses in anaesthetized guinea pigs to three minimally-contrasting consonant-vowel stimuli were collected simultaneously from the cortical surface vertex and directly from central nucleus of the inferior colliculus (ICc), measuring volume conducted neural activity and multiunit, near-field activity, respectively. Guinea pig surface responses were similar to human scalp-recorded responses to identical stimuli in gross morphology as well as phase characteristics. Moreover, surface recorded potentials shared many phase characteristics with near-field ICc activity. Response phase differences were prominent during formant transition periods, reflecting spectro-temporal differences between syllables, and showed more subtle differences during the identical steady-state periods. ICc encoded stimulus distinctions over a broader frequency range, with differences apparent in the highest frequency ranges analyzed, up to 3000 Hz. Based on the similarity of phase encoding across sites, and the consistency and sensitivity of response phase measured within ICc, results suggest that a general property of the auditory system is a high degree of sensitivity to fine-grained phase information inherent to complex acoustical stimuli. Furthermore, results suggest that temporal encoding in ICc contributes to temporal features measured in speech-evoked scalp-recorded responses. PMID:21945200
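
    As a hedged illustration of how timing differences between responses to contrasting syllables can be expressed as frequency-specific phase differentials, the sketch below takes the phase of the cross-spectrum of two averaged responses over a single analysis window (the published method is time-resolved; this shows only the core idea).

```python
import numpy as np

def response_phase_difference(resp_a, resp_b, fs):
    """Frequency-specific phase difference between two averaged evoked responses
    (e.g., to contrasting consonant-vowel syllables).

    The phase of the cross-spectrum resp_a * conj(resp_b) expresses a timing
    difference as a phase lag at each frequency; positive values mean resp_a
    leads resp_b.  Windowing and single-window analysis are simplifications.
    """
    n = min(len(resp_a), len(resp_b))
    a = np.fft.rfft(np.asarray(resp_a[:n], float) * np.hanning(n))
    b = np.fft.rfft(np.asarray(resp_b[:n], float) * np.hanning(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.angle(a * np.conj(b))

# Toy usage: two 700 Hz "responses" offset by 0.5 ms
fs = 20000
t = np.arange(int(0.05 * fs)) / fs
resp_a = np.sin(2 * np.pi * 700 * t)
resp_b = np.sin(2 * np.pi * 700 * (t - 0.0005))
freqs, dphi = response_phase_difference(resp_a, resp_b, fs)
print(dphi[np.argmin(np.abs(freqs - 700))])   # about 2*pi*700*0.0005, i.e. ~2.2 rad
```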

  18. Pathological speech signal analysis and classification using empirical mode decomposition.

    PubMed

    Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar

    2013-07-01

    Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7 % is obtained, thus demonstrating the effectiveness of the methodology.
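
    A minimal sketch of the classification idea is given below: simple temporal and spectral features are computed from intrinsic mode functions (assumed to come from any EMD implementation, e.g., the PyEMD package) and fed to a linear classifier. The three features per IMF and the synthetic two-class data are illustrative stand-ins, not the six features or the database of the published method.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def imf_features(imfs, fs):
    """Energy, zero-crossing rate, and spectral centroid per IMF (rows of `imfs`)."""
    feats = []
    for imf in imfs:
        energy = np.mean(imf ** 2)
        zcr = np.mean(np.abs(np.diff(np.signbit(imf).astype(int))))
        spec = np.abs(np.fft.rfft(imf)) ** 2
        freqs = np.fft.rfftfreq(len(imf), d=1.0 / fs)
        centroid = np.sum(freqs * spec) / np.sum(spec)
        feats.extend([energy, zcr, centroid])
    return np.array(feats)

# Toy usage: classify two synthetic "speaker groups" from 2-IMF feature vectors
rng = np.random.default_rng(4)
fs, n_per_class = 8000, 40
X, y = [], []
for label, f0 in [(0, 120.0), (1, 180.0)]:       # crude stand-in for two classes
    for _ in range(n_per_class):
        t = np.arange(fs) / fs
        imf1 = np.sin(2 * np.pi * (f0 + rng.normal(0, 5)) * t)
        imf2 = 0.3 * rng.standard_normal(fs)
        X.append(imf_features(np.vstack([imf1, imf2]), fs))
        y.append(label)
print(cross_val_score(LinearDiscriminantAnalysis(), np.array(X), np.array(y), cv=5).mean())
```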

  19. Deconvolution of magnetic acoustic change complex (mACC).

    PubMed

    Bardy, Fabrice; McMahon, Catherine M; Yau, Shu Hui; Johnson, Blake W

    2014-11-01

    The aim of this study was to design a novel experimental approach to investigate the morphological characteristics of auditory cortical responses elicited by rapidly changing synthesized speech sounds. Six sound-evoked magnetoencephalographic (MEG) responses were measured to a synthesized train of speech sounds using the vowels /e/ and /u/ in 17 normal-hearing young adults. Responses were measured to: (i) the onset of the speech train; (ii) an F0 increment; (iii) an F0 decrement; (iv) an F2 decrement; (v) an F2 increment; and (vi) the offset of the speech train, using short (jittered around 135 ms) and long (1500 ms) stimulus onset asynchronies (SOAs). The least squares (LS) deconvolution technique was used to disentangle the overlapping MEG responses in the short SOA condition only. Comparison between the morphology of the recovered cortical responses in the short and long SOA conditions showed high similarity, suggesting that the LS deconvolution technique was successful in disentangling the MEG waveforms. Waveform latencies and amplitudes were different for the two SOA conditions and were influenced by the spectro-temporal properties of the sound sequence. The magnetic acoustic change complex (mACC) for the short SOA condition showed significantly lower amplitudes and shorter latencies compared to the long SOA condition. The F0 transition showed a larger reduction in amplitude from long to short SOA compared to the F2 transition. Lateralization of the cortical responses was observed under some stimulus conditions and appeared to be associated with the spectro-temporal properties of the acoustic stimulus. The LS deconvolution technique provides a new tool to study the properties of the auditory cortical response to rapidly changing sound stimuli. The presence of the cortical auditory evoked responses for rapid transitions of synthesized speech stimuli suggests that the temporal code is preserved at the level of the auditory cortex. Further, the reduced amplitudes and shorter latencies might reflect intrinsic properties of the cortical neurons to rapidly presented sounds. This is the first demonstration of the separation of overlapping cortical responses to rapidly changing speech sounds and offers a potential new biomarker of discrimination of rapid transition of sound. Crown Copyright © 2014. Published by Elsevier Ireland Ltd. All rights reserved.
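
    Least-squares deconvolution of overlapping responses can be cast as an ordinary linear regression against a lagged stimulus-onset design matrix. The sketch below shows that generic formulation recovering a known waveform from heavily overlapping, jittered presentations; it is not the authors' exact implementation, and the jitter range and noise level are illustrative assumptions.

```python
import numpy as np

def ls_deconvolve(continuous_rec, onsets, kernel_len):
    """Least-squares deconvolution of overlapping evoked responses.

    Builds a design matrix with one column per post-stimulus lag from the
    stimulus-onset impulse train and solves for the single response waveform
    that best explains the continuous recording.
    """
    n = len(continuous_rec)
    X = np.zeros((n, kernel_len))
    for onset in onsets:
        for lag in range(kernel_len):
            if onset + lag < n:
                X[onset + lag, lag] += 1.0
    kernel, *_ = np.linalg.lstsq(X, continuous_rec, rcond=None)
    return kernel

# Toy usage: recover a known response from overlapping, jittered presentations
rng = np.random.default_rng(5)
fs, kernel_len = 1000, 300
true_resp = np.sin(2 * np.pi * 7 * np.arange(kernel_len) / fs) * np.hanning(kernel_len)
onsets = np.cumsum(rng.integers(120, 150, size=200))      # SOAs of 120-150 ms, heavy overlap
rec = np.zeros(onsets[-1] + kernel_len)
for o in onsets:
    rec[o:o + kernel_len] += true_resp
rec += 0.5 * rng.standard_normal(rec.size)
recovered = ls_deconvolve(rec, onsets, kernel_len)
print(np.corrcoef(recovered, true_resp)[0, 1])            # close to 1
```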

  20. Cerebellar contributions to motor timing: a PET study of auditory and visual rhythm reproduction.

    PubMed

    Penhune, V B; Zatorre, R J; Evans, A C

    1998-11-01

    The perception and production of temporal patterns, or rhythms, is important for both music and speech. However, the way in which the human brain achieves accurate timing of perceptual input and motor output is as yet little understood. Central control of both motor timing and perceptual timing across modalities has been linked to both the cerebellum and the basal ganglia (BG). The present study was designed to test the hypothesized central control of temporal processing and to examine the roles of the cerebellum, BG, and sensory association areas. In this positron emission tomography (PET) activation paradigm, subjects reproduced rhythms of increasing temporal complexity that were presented separately in the auditory and visual modalities. The results provide support for a supramodal contribution of the lateral cerebellar cortex and cerebellar vermis to the production of a timed motor response, particularly when it is complex and/or novel. The results also give partial support to the involvement of BG structures in motor timing, although this may be more directly related to implementation of the motor response than to timing per se. Finally, sensory association areas and the ventrolateral frontal cortex were found to be involved in modality-specific encoding and retrieval of the temporal stimuli. Taken together, these results point to the participation of a number of neural structures in the production of a timed motor response from an external stimulus. The role of the cerebellum in timing is conceptualized not as a clock or counter but simply as the structure that provides the necessary circuitry for the sensory system to extract temporal information and for the motor system to learn to produce a precisely timed response.
