Holt, Lori L.; Lotto, Andrew J.
Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition. PMID:20601702
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level auditory cognitive functions, such as attention. Indeed, despite the few non-musicians who performed as well as musicians, on a group level, there was a strong musician benefit for speech perception in a speech masker. This benefit does not seem to result from better voice processing and could instead be related to better stream segregation or enhanced cognitive functions.
Preston, Jonathan L.; Irwin, Julia R.; Turcios, Jacqueline
Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System (Rvachew, 1994), which has been effectively used to assess preschoolers’ ability to perform goodness judgments, is explored for school-age children with residual speech errors (RSE). However, data suggest that this particular task may not be sensitive to perceptual differences in school-age children. The need for the development of clinical tools for assessment of speech perception in school-age children with RSE is highlighted, and clinical suggestions are provided. PMID:26458198
McGowan, Kevin B
Listeners' use of social information during speech perception was investigated by measuring transcription accuracy of Chinese-accented speech in noise while listeners were presented with a congruent Chinese face, an incongruent Caucasian face, or an uninformative silhouette. When listeners were presented with a Chinese face they transcribed more accurately than when presented with the Caucasian face. This difference existed both for listeners with a relatively high level of experience and for listeners with a relatively low level of experience with Chinese-accented English. Overall, these results are inconsistent with a model of social speech perception in which listener bias reduces attention to the acoustic signal. These results are generally consistent with exemplar models of socially indexed speech perception predicting that activation of a social category will raise base activation levels of socially appropriate episodic traces, but the similar performance of more and less experienced listeners suggests the need for a more nuanced view with a role for both detailed experience and listener stereotypes.
Gick, Bryan; Derrick, Donald
Visual information from a speaker’s face can enhance [1] or interfere with [2] accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies [3,4], and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities [5]. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration. However, previous studies have found an influence of tactile input on speech perception only under limited circumstances, either where perceivers were aware of the task [6,7] or where they had received training to establish a cross-modal mapping [8–10]. Here we show that perceivers integrate naturalistic tactile information during auditory speech perception without previous training. Drawing on the observation that some speech sounds produce tiny bursts of aspiration (such as English ‘p’) [11], we applied slight, inaudible air puffs on participants’ skin at one of two locations: the right hand or the neck. Syllables heard simultaneously with cutaneous air puffs were more likely to be heard as aspirated (for example, causing participants to mishear ‘b’ as ‘p’). These results demonstrate that perceivers integrate event-relevant tactile information in auditory perception in much the same way as they do visual information. PMID:19940925
McQueen, James M.; Norris, Dennis; Cutler, Anne
The speech perception system must be flexible in responding to the variability in speech sounds caused by differences among speakers and by language change over the lifespan of the listener. Indeed, listeners use lexical knowledge to retune perception of novel speech (Norris, McQueen, & Cutler, 2003). In that study, Dutch listeners made…
Vouloumanos, Athena; Gelfand, Hanna M.
The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…
Elman, J. L., & McClelland, J. L. Speech perception as a cognitive process: The interactive activation model of speech perception. In... attempts to provide a machine solution to the problem of speech perception. A second kind of model, growing out of Cognitive Psychology, attempts to... architectures to cognitive and perceptual problems. We also owe a debt to what we might call the computational connectionists: those who have applied highly
Strömbergsson, Sofia; Wengelin, Asa; House, David
We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.
Zeng, Fan-Gang; Liu, Sheng
Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…
Bradlow, Ann R.
When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. In addition, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.
The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration.
Investigates the degree to which native speech perception is superior to non-native speech perception. Shows that language specific speech perception is a linguistic rather than an acoustic phenomenon. Discusses results in terms of early speech perception abilities, experience with oral communication, cognitive ability, alphabetic versus…
Carbonell, Kathy M.
One of the lasting concerns in audiology is the presence of unexplained individual differences in speech perception performance, even among individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions. The third is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.
Bernstein, Lynne E.; Liebenthal, Einat
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611
Heald, Shannon L M; Nusbaum, Howard C
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. Doing so may provide new insights into ways in which hearing disorders and loss may be treated, either through augmentation or therapy.
Investigates the silent pauses in continuous speech in three genres (political speeches, political interviews, and casual interviews) in order to see how the semantic-syntactic information of the message, the duration of silent pauses, and the acoustic environment of these pauses interact to produce the listener's perception of pauses.
Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro
Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…
Bruderer, Alison G.; Danielson, D. Kyle; Kandhadai, Padmapriya; Werker, Janet F.
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception–production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants’ speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants’ tongues. With a looking-time procedure, we found that temporarily restraining infants’ articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral–motor movements influence speech sound discrimination. Moreover, an experimentally induced “impairment” in articulator movement can compromise speech perception performance, raising the question of whether long-term oral–motor impairments may impact perceptual development. PMID:26460030
Serniclaes, Willy; Van Heghe, Sandra; Mousty, Philippe; Carre, Rene; Sprenger-Charolles, Liliane
Perceptual discrimination between speech sounds belonging to different phoneme categories is better than that between sounds falling within the same category. This property, known as "categorical perception," is weaker in children affected by dyslexia. Categorical perception develops from the predispositions of newborns for discriminating all…
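To make the labeling/discrimination link concrete, here is a minimal sketch of the classic covert-labeling prediction for ABX discrimination (the Haskins model), P(correct) = 0.5 * [1 + (pA - pB)^2], where pA and pB are the probabilities of labeling stimuli A and B as category 1. This textbook model is chosen for illustration and is not the analysis used in this study; the function name and the 7-step identification values are invented.

```python
def predicted_abx(p_a, p_b):
    """Predicted ABX proportion correct if listeners can only compare covert
    category labels (Haskins labeling model).

    p_a, p_b: probability that stimulus A / B is labeled as category 1.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("label probabilities must lie in [0, 1]")
    # Chance (0.5) when A and B are labeled alike; the squared difference in
    # labeling probability supplies the above-chance part.
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


# Invented identification function (proportion "category 1"), 7-step continuum.
ident = [0.98, 0.95, 0.90, 0.50, 0.10, 0.05, 0.02]

# Two-step pairs: predicted discrimination peaks where a pair straddles the boundary.
for i in range(len(ident) - 2):
    print(f"steps {i + 1}-{i + 3}: {predicted_abx(ident[i], ident[i + 2]):.3f}")
```

Under this model, a shallower identification function (weaker categorical perception) flattens the predicted between-category discrimination peak.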
Lynch, Michael P.; And Others
Experiments using portable tactile aids in speech perception are reviewed, focusing on training studies, additive benefit studies, and device comparison studies (including the "Tactaid II," "Tactaid V," "Tacticon 1600," and "Tickle Talker"). The potential of tactual information in perception of the overall…
Postma-Nilsenová, Marie; Postma, Eric
In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal, which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent.
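The missing-fundamental stimulus behind this test can be sketched numerically: a complex built only from harmonics 2 to 5 of 200 Hz has no spectral energy at 200 Hz, yet its waveform still repeats every 5 ms, so a periodicity estimator recovers 200 Hz. The autocorrelation estimator is a standard illustration swapped in here, not the authors' perceptual test; the function names, sampling rate, and search range are assumptions.

```python
import numpy as np


def harmonic_complex(f0, harmonics, fs=16000, dur=0.1):
    """Equal-amplitude sum of sine harmonics k * f0 for k in `harmonics`."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in harmonics)


def autocorr_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Periodicity estimate (Hz) from the largest autocorrelation value
    at lags between 1/fmax and 1/fmin seconds."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag


fs = 16000
x = harmonic_complex(200, [2, 3, 4, 5], fs=fs)  # 400-1000 Hz, nothing at 200 Hz
spectrum = np.abs(np.fft.rfft(x))               # bin spacing = fs / len(x) = 10 Hz
print("magnitude at 200 Hz:", spectrum[200 // 10])
print("periodicity estimate:", autocorr_pitch(x, fs), "Hz")
```

High-pass filtering a voice above 300 Hz, as in the shadowing condition, removes the F0 component in much the same way while leaving this periodicity cue in the higher harmonics.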
Anderson, Karen L.; Goldstein, Howard
Children typically learn in classroom environments that have background noise and reverberation that interfere with accurate speech perception. Amplification technology can enhance the speech perception of students who are hard of hearing. Purpose: This study used a single-subject alternating treatments design to compare the speech recognition…
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration.
Holt, Lori L.
Despite a long and rich history of categorization research in cognitive psychology, very little work has addressed the issue of complex auditory category formation. This is especially unfortunate because the general underlying cognitive and perceptual mechanisms that guide auditory category formation are of great importance to understanding speech perception. I will discuss a new methodological approach to examining complex auditory category formation that specifically addresses issues relevant to speech perception. This approach utilizes novel nonspeech sound stimuli to gain full experimental control over listeners' history of experience. As such, the course of learning is readily measurable. Results from this methodology indicate that the structure and formation of auditory categories are a function of the statistical input distributions of sound that listeners hear, aspects of the operating characteristics of the auditory system, and characteristics of the perceptual categorization system. These results have important implications for phonetic acquisition and speech perception.
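One way to make "categories as a function of input distributions" concrete: a learner summarizes each category by the mean and variance of the exemplars it hears, then assigns a new sound to the category under which it is most likely. The Gaussian summary, the single cue dimension, and every name below are assumptions for illustration; the approach described here is an experimental method with novel nonspeech sounds, not this model.

```python
import math
import random
from statistics import fmean, pvariance


def learn_category(samples):
    """Summarize a category by the mean and variance of its exemplars."""
    return fmean(samples), pvariance(samples)


def log_likelihood(x, mean, var):
    """Log density of x under a normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def categorize(x, categories):
    """Label whose learned distribution gives x the highest likelihood."""
    return max(categories, key=lambda label: log_likelihood(x, *categories[label]))


# Simulated exposure on a hypothetical nonspeech cue (arbitrary units);
# as in the method described, the input distributions are fully controlled.
random.seed(1)
learned = {
    "A": learn_category([random.gauss(1000, 100) for _ in range(200)]),
    "B": learn_category([random.gauss(1400, 100) for _ in range(200)]),
}
print(categorize(1050, learned), categorize(1350, learned))
```

Changing the exposure distributions (for example, widening one category) moves the learned boundary, which is the kind of dependence on input statistics the abstract reports.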
Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian
Speech perception deficits in developmental dyslexia were investigated in quiet and various noise conditions. Dyslexics exhibited clear speech perception deficits in noise but not in silence. "Place-of-articulation" was more affected than "voicing" or "manner-of-articulation." Speech-perception-in-noise deficits persisted when performance of…
Walshe, Margaret; Miller, Nick; Leahy, Margaret; Murray, Aisling
Background: Many factors influence listener perception of dysarthric speech. Final consensus on the role of gender and listener experience is still to be reached. The speaker's perception of his/her speech has largely been ignored. Aims: (1) To compare speaker and listener perception of the intelligibility of dysarthric speech; (2) to explore the…
speech perception? For one thing, TRACE blurs the distinction between perception and other aspects of cognitive processing. There is really no...
Geers, Ann; Brenner, Chris
This paper describes changes in speech perception performance of deaf children using cochlear implants, tactile aids, or conventional hearing aids over a three-year period. Eleven of the 13 children with cochlear implants were able to identify words on the basis of auditory consonant cues. Significant lipreading enhancement was also achieved with…
Key, Michael Parrish
This dissertation investigates how knowledge of phonological generalizations influences speech perception, with a particular focus on evidence that phonological processing is autonomous from (rather than interactive with) auditory processing. A model is proposed in which auditory cue constraints and markedness constraints interact to determine a…
Galantucci, Bruno; Fowler, Carol A.; Turvey, M. T.
More than 50 years after the appearance of the motor theory of speech perception, it is timely to evaluate its three main claims that (1) speech processing is special, (2) perceiving speech is perceiving gestures, and (3) the motor system is recruited for perceiving speech. We argue that to the extent that it can be evaluated, the first claim is likely false. As for the second claim, we review findings that support it and argue that although each of these findings may be explained by alternative accounts, the claim provides a single coherent account. As for the third claim, we review findings in the literature that support it at different levels of generality and argue that the claim anticipated a theme that has become widespread in cognitive science. PMID:17048719
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z
The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Blomert, Leo; Mitterer, Holger
A number of studies reported that developmental dyslexics are impaired in speech perception, especially for speech signals consisting of rapid auditory transitions. These studies mostly made use of a categorical-perception task with synthetic-speech samples. In this study, we show that deficits in the perception of synthetic speech do not generalise to the perception of more naturally sounding speech, even if the same experimental paradigm is used. This contrasts with the assumption that dyslexics are impaired in the perception of rapid auditory transitions.
Turkeltaub, Peter E; Coslett, H. Branch
Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception. Based on foci reported in 23 fMRI experiments, we identified significant activation likelihoods in left and right superior temporal cortex and the left posterior middle frontal gyrus. Subanalyses examining phonetic and phonological processes revealed only left mid-posterior superior temporal sulcus activation likelihood. A lateralization analysis demonstrated temporal lobe left lateralization in terms of magnitude, extent, and consistency of activity. Experiments requiring explicit attention to phonology drove this lateralization. An ALE analysis of eight fMRI studies on categorical phoneme perception revealed significant activation likelihood in the left supramarginal gyrus and angular gyrus. These results are consistent with a speech processing network in which the bilateral superior temporal cortices perform acoustic analysis of speech and nonspeech auditory stimuli, the left mid-posterior superior temporal sulcus performs phonetic and phonological analysis, and the left inferior parietal lobule is involved in detection of differences between phoneme categories. These results modify current speech perception models in three ways: 1) specifying the most likely locations of dorsal stream processing units, 2) clarifying that phonetic and phonological superior temporal sulcus processing is left lateralized and localized to the mid-posterior portion, and 3) suggesting that both the supramarginal gyrus and angular gyrus may be involved in phoneme discrimination. PMID:20413149
Clement, Bart Richard
Although speech perception has been considered a predominantly auditory phenomenon, large benefits from vision in degraded acoustic conditions suggest integration of audition and vision. More direct evidence of this comes from studies of audiovisual disparity that demonstrate vision can bias and even dominate perception (McGurk & MacDonald, 1976). It has been observed that hearing-impaired listeners demonstrate more visual biasing than normally hearing listeners (Walden et al., 1990). It is argued here that stimulus audibility must be equated across groups before true differences can be established. In the present investigation, effects of visual biasing on perception were examined as audibility was degraded for 12 young normally hearing listeners. Biasing was determined by quantifying the degree to which listener identification functions for a single synthetic auditory /ba-da-ga/ continuum changed across two conditions: (1) an auditory-only listening condition; and (2) an auditory-visual condition in which every item of the continuum was synchronized with visual articulations of the consonant-vowel (CV) tokens /ba/ and /ga/, as spoken by each of two talkers. Audibility was altered by presenting the conditions in quiet and in noise at each of three signal-to-noise (S/N) ratios. For the visual-/ba/ context, large effects of audibility were found. As audibility decreased, visual biasing increased. A large talker effect also was found, with one talker eliciting more biasing than the other. An independent lipreading measure demonstrated that this talker was more visually intelligible than the other. For the visual-/ga/ context, audibility and talker effects were less robust, possibly obscured by strong listener effects, which were characterized by marked differences in perceptual processing patterns among participants. Some demonstrated substantial biasing whereas others demonstrated little, indicating a strong reliance on audition even in severely degraded acoustic…
Viviani, Paolo; Figliozzi, Francesca; Lacquaniti, Francesco
Four experiments investigated the perception of visible speech. Experiment 1 addressed the perception of speech rate. Observers were shown video-clips of the lower face of actors speaking at their spontaneous rate. Then, they were shown muted versions of the video-clips, which were either accelerated or decelerated. The task (scaling) was to compare visually the speech rate of the stimulus to the spontaneous rate of the actor being shown. Rate estimates were accurate when the video-clips were shown in the normal direction (forward mode). In contrast, speech rate was underestimated when the video-clips were shown in reverse (backward mode). Experiments 2-4 (2AFC) investigated how accurately one discriminates forward and backward speech movements. Unlike in Experiment 1, observers were never exposed to the sound track of the video-clips. Performance was well above chance when playback mode was crossed with rate modulation, and the number of repetitions of the stimuli allowed some amount of speechreading to take place in forward mode (Experiment 2). In Experiment 3, speechreading was made much more difficult by using a different and larger set of muted video-clips. Yet, accuracy decreased only slightly with respect to Experiment 2. Thus, kinematic rather than speechreading cues are most important for discriminating movement direction. Performance worsened, but remained above chance level when the same stimuli of Experiment 3 were rotated upside down (Experiment 4). We argue that the results are in keeping with the hypothesis that visual perception taps into implicit motor competence. Thus, lawful instances of biological movements (forward stimuli) are processed differently from backward stimuli representing movements that the observer cannot perform.
What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information into coherent representations of syllables and words? How does the brain extract invariant properties of variable-rate speech? This talk describes a neural model that suggests answers to these questions, while quantitatively simulating speech and word recognition data. The conscious speech and word recognition code is suggested to be a resonant wave, and a percept of silence a temporal discontinuity in the rate at which resonance evolves. A resonant wave emerges when sequential activation and storage of phonemic items in working memory provides bottom-up input to list chunks that group together sequences of items of variable length. The list chunks compete and winning chunks activate top-down expectations that amplify and focus attention on consistent working memory items, while suppressing inconsistent ones. The ensuing resonance boosts activation levels of selected items and chunks. Because resonance occurs after working memory activation, it can incorporate information presented after intervening silence intervals, so future sounds can influence how we hear past sounds. The model suggests that resonant dynamics enable the brain to learn quickly without suffering catastrophic forgetting, as described within Adaptive Resonance Theory.
Bilodeau-Mercure, Mylène; Lortie, Catherine L; Sato, Marc; Guitton, Matthieu J; Tremblay, Pascale
Speech perception difficulties are common among older adults, yet the underlying neural mechanisms are still poorly understood. New empirical evidence suggesting that brain senescence may be an important contributor to these difficulties has challenged the traditional view that peripheral hearing loss was the main factor in the etiology of these difficulties. Here, we investigated the relationship between structural and functional brain senescence and speech perception skills in aging. Following audiometric evaluations, participants underwent MRI while performing a speech perception task at different intelligibility levels. As expected, speech perception declined with age, even after controlling for hearing sensitivity using an audiological measure (pure tone averages) and a bioacoustical measure (DPOAE recordings). Our results reveal that the core speech network, centered on the supratemporal cortex and ventral motor areas bilaterally, decreased in spatial extent in older adults. Importantly, our results also show that speech skills in aging are affected by changes in cortical thickness and in brain functioning. Age-independent intelligibility effects were found in several motor and premotor areas, including the left ventral premotor cortex and the right supplementary motor area (SMA). Age-dependent intelligibility effects were also found, mainly in sensorimotor cortical areas, and in the left dorsal anterior insula. In this region, changes in BOLD signal modulated the relationship between age and speech perception skills, suggesting a role for this region in maintaining speech perception at older ages. These results provide important new insights into the neurobiology of speech perception in aging.
Functional lesion studies have yielded new information about the cortical organization of speech perception in the human brain. We will review a number of recent findings, focusing on studies of speech perception that use the techniques of electrocortical mapping by cortical stimulation and hemispheric anesthetization by intracarotid amobarbital.…
Tierney, Joseph; Mack, Molly
Stimuli used in research on the perception of the speech signal have often been obtained from simple filtering and distortion of the speech waveform, sometimes accompanied by noise. However, for more complex stimulus generation, the parameters of speech can be manipulated, after analysis and before synthesis, using various types of algorithms to…
Martínez-Montes, Eduardo; Hernández-Pérez, Heivet; Chobert, Julie; Morgado-Rodríguez, Lisbet; Suárez-Murias, Carlos; Valdés-Sosa, Pedro A; Besson, Mireille
The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.
Menard, Lucie; Schwartz, Jean-Luc; Boe, Louis-Jean; Aubin, Jerome
It has been shown that nonuniform growth of the supraglottal cavities, motor control development, and perceptual refinement shape the vowel systems during speech development. In this talk, we propose to investigate the role of perceptual constraints as a guide to the speaker's task from birth to adulthood. Simulations with an articulatory-to-acoustic model, acoustic analyses of natural vowels, and results of perceptual tests provide evidence that the production-perception relationships evolve with age. At the perceptual level, results show that (i) linear combinations of spectral peaks are good predictors of vowel targets, and (ii) focalization, defined as an acoustic pattern with close neighboring formants [J.-L. Schwartz, L.-J. Boe, N. Vallee, and C. Abry, J. Phonetics 25, 255-286 (1997)], is part of the speech task. At the production level, we propose that (i) frequently produced vowels in the baby's early sound inventory can in part be explained by perceptual templates, and (ii) the achievement of these perceptual templates may require adaptive articulatory strategies for the child, compared with adults, to cope with morphological differences. Results are discussed in the light of a perception for action control theory. [Work supported by the Social Sciences and Humanities Research Council of Canada.]
Bertoncini, J; Cabrera, L
The development of speech perception relies upon early auditory capacities (i.e. discrimination, segmentation and representation). Infants are able to discriminate most of the phonetic contrasts occurring in natural languages, and at the end of the first year, this universal ability starts to narrow down to the contrasts used in the ambient language. During the second year, this specialization is characterized by the development of comprehension, lexical organization and word production. This process now appears to be the result of multiple interactions between perceptual, cognitive and social developing abilities. Distinct factors like word acquisition, sensitivity to the statistical properties of the input, or even the nature of the social interactions, might play a role at one time or another during the acquisition of phonological patterns. Experience with the native language is necessary for phonetic segments to be functional units of perception and for speech sound representations (words, syllables) to be more specified and phonetically organized. This evolution goes on beyond 24 months of age in a learning context characterized from the early stages by the interaction with other developing (linguistic and non-linguistic) capacities.
Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara
Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech…
Anderson, Samira; Skoe, Erika; Chandrasekaran, Bharath; Zecker, Steven; Kraus, Nina
Children often have difficulty understanding speech in challenging listening environments. In the absence of peripheral hearing loss, these speech perception difficulties may arise from dysfunction at more central levels in the auditory system, including subcortical structures. We examined brainstem encoding of pitch in a speech syllable in 38 school-age children. In children with poor speech-in-noise perception, we find impaired encoding of the fundamental frequency and the second harmonic, two important cues for pitch perception. Pitch, an important factor in speaker identification, aids the listener in tracking a specific voice from a background of voices. These results suggest that the robustness of subcortical neural encoding of pitch features in time-varying signals is an important factor in determining success with speech perception in noise. PMID:20708671
Geiser, Eveline; Shattuck-Hufnagel, Stefanie
Speech rhythm has been proposed to be of crucial importance for correct speech perception and language learning. This study investigated the influence of speech rhythm in second language processing. German pseudo-sentences were presented to participants in two conditions: 'naturally regular speech rhythm' and an 'emphasized regular rhythm'. Nine expert English speakers with 3.5±1.6 years of German training repeated each sentence after hearing it once over headphones. Responses were transcribed using the International Phonetic Alphabet and analyzed for the number of correct, false and missing consonants as well as for consonant additions. The overall number of correct reproductions of consonants did not differ between the two experimental conditions. However, speech rhythmicization significantly affected the serial position curve of correctly reproduced syllables. The results of this pilot study are consistent with the view that speech rhythm is important for speech perception.
Lu, Chunming; Long, Yuhang; Zheng, Lifen; Shi, Guang; Liu, Li; Ding, Guosheng; Howell, Peter
Speech production difficulties are apparent in people who stutter (PWS). PWS also have difficulties in speech perception compared to controls. It is unclear whether the speech perception difficulties in PWS are independent of, or related to, their speech production difficulties. To investigate this issue, functional MRI data were collected on 13 PWS and 13 controls whilst the participants performed a speech production task and a speech perception task. PWS performed more poorly than controls in the perception task, and the poorer performance was associated with a functional activity difference in the left anterior insula (part of the speech motor area) compared to controls. PWS also showed a functional activity difference in this and the surrounding area [left inferior frontal cortex (IFC)/anterior insula] in the production task compared to controls. Conjunction analysis showed that the functional activity differences between PWS and controls in the left IFC/anterior insula coincided across the perception and production tasks. Furthermore, Granger Causality Analysis on the resting-state fMRI data of the participants showed that the causal connection from the left IFC/anterior insula to an area in the left primary auditory cortex (Heschl’s gyrus) differed significantly between PWS and controls. The strength of this connection correlated significantly with performance in the perception task. These results suggest that speech perception difficulties in PWS are associated with anomalous functional activity in the speech motor area, and the altered functional connectivity from this area to the auditory area plays a role in the speech perception difficulties of PWS. PMID:27242487
Hubbard, Amy L; Wilson, Stephen M; Callan, Daniel E; Dapretto, Mirella
Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture (a fundamental type of hand gesture that marks speech prosody) might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.
Mitterer, Holger; Mattys, Sven L
Two experiments investigated the conditions under which cognitive load exerts an effect on the acuity of speech perception. These experiments extend earlier research by using a different speech perception task (four-interval oddity task) and by implementing cognitive load through a task often thought to be modular, namely, face processing. In the cognitive-load conditions, participants were required to remember two faces presented before the speech stimuli. In Experiment 1, performance in the speech-perception task under cognitive load was not impaired in comparison to a no-load baseline condition. In Experiment 2, we modified the load condition minimally such that it required encoding of the two faces simultaneously with the speech stimuli. As a reference condition, we also used a visual search task that in earlier experiments had led to poorer speech perception. Both concurrent tasks led to decrements in the speech task. The results suggest that speech perception is affected even by loads thought to be processed modularly, and that, critically, encoding in working memory might be the locus of interference.
Crew, Joseph D; Galvin, John J; Fu, Qian-Jie
Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. PMID:27837051
Boets, Bart; Wouters, Jan; van Wieringen, Astrid; De Smedt, Bert; Ghesquiere, Pol
The general magnocellular theory postulates that dyslexia is the consequence of a multimodal deficit in the processing of transient and dynamic stimuli. In the auditory modality, this deficit has been hypothesized to interfere with accurate speech perception, and subsequently disrupt the development of phonological and later reading and spelling…
Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
Biau, Emmanuel; Soto-Faraco, Salvador
Spontaneous beat gestures are an integral part of the paralinguistic context during face-to-face conversations. Here we investigated the time course of beat-speech integration in speech perception by measuring ERPs evoked by words pronounced with or without an accompanying beat gesture, while participants watched a spoken discourse. Words…
Conboy, Barbara T.; Sommerville, Jessica A.; Kuhl, Patricia K.
The development of speech perception during the 1st year reflects increasing attunement to native language features, but the mechanisms underlying this development are not completely understood. One previous study linked reductions in nonnative speech discrimination to performance on nonlinguistic tasks, whereas other studies have shown…
Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.
Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
Pisoni, David B.; And Others
Summarizing research activities in 1986, this is the twelfth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 23 articles: "Comprehension of Digitally Encoded Natural Speech…
Pisoni, David B.; And Others
Summarizing research activities in 1987, this is the thirteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information on…
Pisoni, David B.
Summarizing research activities in 1989, this is the fifteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 21 articles: "Perceptual Learning of Nonnative Speech…
Pisoni, David B.; And Others
Summarizing research activities in 1988, this is the fourteenth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report contains…
Theoretical accounts of both speech perception and of short term memory must consider the extent to which perceptual representations of speech sounds might survive in relatively unprocessed form. This paper describes a novel version of the serial recall task that can be used to explore this area of shared interest. In immediate recall of digit…
Szenkovits, Gayaneh; Peelle, Jonathan E.; Norris, Dennis; Davis, Matthew H.
Although activity in premotor and motor cortices is commonly observed in neuroimaging studies of spoken language processing, the degree to which this activity is an obligatory part of everyday speech comprehension remains unclear. We hypothesised that rather than being a unitary phenomenon, the neural response to speech perception in motor regions…
…one another in COHORT, the nodes for sell, your, light, and cellulite will all be in active competition with one another. The system will have no way… Speech Perception as a Cognitive Process: The Interactive Activation Model. Technical report AD-A128 787, University of California, San Diego, La Jolla, Institute for Cognitive…
Black, Alan; Eskenazi, Maxine; Simmons, Reid
An aging population still needs to access information, such as bus schedules. It is evident that they will be doing so using computers and especially interfaces using speech input and output. This is a preliminary study of the use of synthetic speech for the elderly. In it, twenty persons between the ages of 60 and 80 were asked to listen to speech emitted by a robot (CMU's VIKIA) and to write down what they heard. All of the speech was natural prerecorded speech (not synthetic) read by one female speaker. There were four listening conditions: (a) only speech emitted, (b) robot moves before emitting speech, (c) face has lip movement during speech, (d) both (b) and (c). There were very few errors for conditions (b), (c), and (d), but errors existed for condition (a). The presentation will discuss experimental conditions, show actual figures and try to draw conclusions for speech communication between computers and the elderly.
Lalonde, Kaylah; Holt, Rachael Frush
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318
Leybaert, Jacqueline; LaSasso, Carol J.
Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has a strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We argue that exposure to Cued Speech before or after implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357
Lametti, Daniel R; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M; Ostry, David J
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds.
Bicevskis, Katie; Derrick, Donald; Gick, Bryan
Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with asymmetrical preference for puffs following the video signal, consistent with the relative speeds of visual and air puff signals. The results demonstrate that visual-tactile integration of speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal that is supplemented by information from other modes, but rather that primitives of speech perception are, in principle, modality neutral.
Darwin, Chris; Rivenez, Marie
This talk will give an overview of experimental work on auditory grouping in speech perception including the use of grouping cues in the extraction of source-specific auditory information, and the tracking of sound sources across time. Work on the perception of unattended speech sounds will be briefly reviewed and some recent experiments described demonstrating the importance of pitch differences in allowing lexical processing of speech on the unattended ear. The relationship between auditory grouping and auditory continuity will also be discussed together with recent experiments on the role of grouping in the perceptual continuity of complex sounds.
Gauvin, Hanna S; De Baene, Wouter; Brass, Marcel; Hartsuiker, Robert J
To minimize the number of errors in speech, and thereby facilitate communication, speech is monitored before articulation. It is, however, unclear at which level during speech production monitoring takes place, and what mechanisms are used to detect and correct errors. The present study investigated whether internal verbal monitoring takes place through the speech perception system, as proposed by perception-based theories of speech monitoring, or whether mechanisms independent of perception are applied, as proposed by production-based theories of speech monitoring. With the use of fMRI during a tongue twister task we observed that error detection in internal speech during noise-masked overt speech production and error detection in speech perception both recruit the same neural network, which includes pre-supplementary motor area (pre-SMA), dorsal anterior cingulate cortex (dACC), anterior insula (AI), and inferior frontal gyrus (IFG). Although production and perception recruit similar areas, as proposed by perception-based accounts, we did not find activation in superior temporal areas (which are typically associated with speech perception) during internal speech monitoring in speech production as hypothesized by these accounts. On the contrary, results are highly compatible with a domain general approach to speech monitoring, by which internal speech monitoring takes place through detection of conflict between response options, which is subsequently resolved by a domain general executive center (e.g., the ACC).
Aubanel, Vincent; Davis, Chris; Kim, Jeesun
A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise. PMID:27630552
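The retiming step described above can be sketched as follows, assuming anchor times (syllable onsets, envelope peaks, or approximate P-centers) have already been extracted in seconds. The function names are hypothetical. The sketch only computes where each anchor should move on an isochronous grid and how much each inter-anchor interval must be stretched; the signal manipulation itself would need a time-scale modification method (e.g. WSOLA), which is not shown and is not necessarily what the authors used.

```python
def isochronous_targets(anchors, rate_hz):
    """Place one target per anchor on a regular grid of period 1/rate_hz,
    starting at the first anchor time (seconds)."""
    period = 1.0 / rate_hz
    return [anchors[0] + i * period for i in range(len(anchors))]


def stretch_factors(anchors, targets):
    """Target duration divided by original duration for each interval between
    consecutive anchors: >1 means lengthen the interval, <1 means shorten it."""
    factors = []
    for i in range(len(anchors) - 1):
        original = anchors[i + 1] - anchors[i]
        target = targets[i + 1] - targets[i]
        factors.append(target / original)
    return factors


# Hypothetical anchor times (s); 4 Hz corresponds to the "fast" time scale.
anchors = [0.10, 0.32, 0.61, 0.83]
targets = isochronous_targets(anchors, rate_hz=4.0)
print(targets)
print(stretch_factors(anchors, targets))
```

At 4 Hz the grid period is 250 ms and at 2.5 Hz it is 400 ms; a matched anisochronous control could be built by applying the same set of stretch factors in a shuffled order, though that construction is an assumption here.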
Bell-Berti, F; Raphael, L J; Pisoni, D B; Sawusch, J R
EMG studies of the American English vowel pairs /i-ɪ/ and /e-ɛ/ reveal two different production strategies: some speakers appear to differentiate the members of each pair primarily on the basis of tongue height; for others the basis of differentiation appears to be tongue tension. There was no obvious reflection of these differences in the speech waveforms or formant patterns of the two groups. To determine if these differences in production might correspond to differences in perception, two vowel identification tests were given to the EMG subjects. Subjects were asked to label the members of a seven-step vowel continuum, /i/ through /ɪ/. In one condition each item had an equal probability of occurrence. The other condition was an anchoring test; the first stimulus, /i/, was heard four times as often as any other stimulus. Compared with the equal-probability test labelling boundary, the boundary in the anchoring test was displaced toward the more frequently occurring stimulus. The magnitude of the shift of the labelling boundary was greater for subjects using a production strategy based on tongue height than for subjects using tongue tension to differentiate these vowels, suggesting that the stimuli represent adjacent categories in the speakers' phonetic space for the former, but not for the latter, group.
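One simple way to locate a labelling boundary, and to measure how far it shifts between conditions, is to find where the proportion of first-category (/i/) responses crosses 0.5 along the continuum. The abstract does not say how the boundary was estimated; linear interpolation between adjacent steps is an assumption here (probit or logistic fits are common alternatives), and the function names are hypothetical.

```python
def boundary_50(steps, p_first):
    """Continuum position where the proportion of first-category responses
    falls through 0.5, by linear interpolation between adjacent steps.
    Returns None if no such crossover occurs in the tested range."""
    for i in range(len(steps) - 1):
        p0, p1 = p_first[i], p_first[i + 1]
        if p0 >= 0.5 > p1:
            s0, s1 = steps[i], steps[i + 1]
            return s0 + (p0 - 0.5) * (s1 - s0) / (p0 - p1)
    return None


def boundary_shift(steps, p_equal, p_anchor):
    """Signed boundary displacement (anchoring minus equal-probability);
    a negative value means the boundary moved toward step 1."""
    b_eq = boundary_50(steps, p_equal)
    b_anc = boundary_50(steps, p_anchor)
    if b_eq is None or b_anc is None:
        return None
    return b_anc - b_eq
```

With step 1 as /i/, a boundary displaced toward the frequently presented /i/ end shows up as a negative shift under this sign convention, and the magnitude of that value is what would be compared across the two production-strategy groups.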
Guediche, Sara; Holt, Lori L; Laurent, Patryk; Lim, Sung-Joo; Fiez, Julie A
Human speech perception rapidly adapts to maintain comprehension under adverse listening conditions. For example, with exposure listeners can adapt to heavily accented speech produced by a non-native speaker. Outside the domain of speech perception, adaptive changes in sensory and motor processing have been attributed to cerebellar functions. The present functional magnetic resonance imaging study investigates whether adaptation in speech perception also involves the cerebellum. Acoustic stimuli were distorted using a vocoding plus spectral-shift manipulation and presented in a word recognition task. Regions in the cerebellum that showed differences before versus after adaptation were identified, and the relationship between activity during adaptation and subsequent behavioral improvements was examined. These analyses implicated the right Crus I region of the cerebellum in adaptive changes in speech perception. A functional correlation analysis with the right Crus I as a seed region probed for cerebral cortical regions with covarying hemodynamic responses during the adaptation period. The results provided evidence of a functional network between the cerebellum and language-related regions in the temporal and parietal lobes of the cerebral cortex. Consistent with known cerebellar contributions to sensorimotor adaptation, cerebro-cerebellar interactions may support supervised learning mechanisms that rely on sensory prediction error signals in speech perception.
Recent literature in second language (L2) perceived fluency has focused on English as a second language, with a primary reliance on impressions from native-speaker judges, leaving learners' self-perceptions of speech production unexplored. This study investigates the relationship between learners' and judges' perceptions of French fluency under…
Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina
From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3-5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ∼12 months), we followed a cohort of 59 preschoolers, ages 3.0-4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children.
van der Leij, Aryan; Blok, Henk; de Jong, Peter F.
This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children. RD children were slower than chronological age (CA) controls in recognizing identical sounds, suggesting less distinct phonemic categories. In addition, after controlling for phonetic similarity, Tallal’s (Brain Lang 9:182–198, 1980) fast transitions account of RD children’s speech perception problems was contrasted with Studdert-Kennedy’s (Read Writ Interdiscip J 15:5–14, 2002) similarity explanation. Results showed no specific RD deficit in perceiving fast transitions. Both phonetic similarity and fast transitions influenced accurate speech perception for RD children as well as CA controls. PMID:20652455
Kröger, Bernd J.; Crawford, Eric; Bekolay, Trevor; Eliasmith, Chris
Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop which continuously compares the word representations induced by production to those induced by perception at various cognitive levels (e.g., conceptual, word, or phonological levels). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment we induce simulated speech errors semantically and phonologically. In the second experiment, we simulate a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation of phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions in production and perception at phonological and semantic levels. PMID:27303287
Poeppel, David; Idsardi, William J; van Wassenhove, Virginie
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception system enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
Clayards, Meghan; Tanenhaus, Michael K.; Aslin, Richard N.; Jacobs, Robert A.
Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial…
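The notion of optimal sensitivity to a probabilistic cue can be sketched with two Gaussian VOT categories: Bayes' rule gives the posterior probability of /p/ for any VOT value, and with a shared standard deviation the log-odds are linear in VOT with slope (mean_p - mean_b) / sd². Wider category distributions therefore predict shallower identification functions. The category means, SDs, equal priors, and function names below are placeholder assumptions, not the study's stimulus parameters or model code.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_p(vot_ms, mean_b, mean_p, sd, prior_p=0.5):
    """P(/p/ | VOT) for two Gaussian VOT categories sharing one SD."""
    like_p = prior_p * gaussian_pdf(vot_ms, mean_p, sd)
    like_b = (1.0 - prior_p) * gaussian_pdf(vot_ms, mean_b, sd)
    return like_p / (like_p + like_b)


# Placeholder category parameters (ms): narrow vs. wide distributions.
for sd in (8.0, 16.0):
    curve = [round(posterior_p(v, 0.0, 40.0, sd), 3) for v in range(0, 45, 5)]
    print(f"sd={sd}: {curve}")
```

Both curves cross 0.5 at the midpoint between the category means (20 ms here), but the wide-SD curve rises more gradually around it, which is the pattern an ideal listener would show when categories are more variable.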
Anderson, Melinda C; Arehart, Kathryn H; Kates, James M
Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech.
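Presenting speech in babble at a fixed signal-to-noise ratio amounts to scaling the noise so that the RMS level ratio matches the target in decibels, SNR = 20 log10(rms_speech / rms_noise). A minimal sketch follows. Defining SNR over the whole-token RMS of equal-length sample lists is an assumption (the study's level calibration may differ), the helper names are hypothetical, and a tone and Gaussian noise stand in for real speech and babble.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(speech) / rms(scaled noise)) equals
    snr_db, then add it to `speech`. Returns (mixture, noise_gain)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Stand-ins: a 200 Hz tone for "speech" and Gaussian noise for "babble".
fs = 16000
speech = [0.1 * math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
for snr in (18.0, 12.0):
    _, g = mix_at_snr(speech, noise, snr)
    print(snr, round(20 * math.log10(rms(speech) / rms([g * n for n in noise])), 6))
```

Keeping the speech level fixed and scaling only the noise preserves the speech token's level across conditions; scaling the mixture to a fixed overall level afterwards is a common alternative design.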
D'Ausilio, Alessandro; Bartoli, Eleonora; Maffongelli, Laura; Berry, Jeffrey James; Fadiga, Luciano
Audiovisual speech perception is likely based on the association between auditory and visual information into stable audiovisual maps. Conflicting audiovisual inputs generate perceptual illusions such as the McGurk effect. Audiovisual mismatch effects could be driven either by the detection of violations in the standard audiovisual statistics or by the sensorimotor reconstruction of the distal articulatory event that generated the audiovisual ambiguity. In order to disambiguate between the two hypotheses, we exploit the fact that the tongue is hidden from view. For this reason, tongue movement encoding can be learned only through speech production, not through the perception of others' speech alone. Here we asked participants to identify speech sounds while being shown visual representations of tongue movements that either matched or mismatched those sounds. Vision of congruent tongue movements facilitated auditory speech identification with respect to incongruent trials. This result suggests that direct visual experience of an articulator movement is not necessary for the generation of audiovisual mismatch effects. Furthermore, we suggest that audiovisual integration in speech may benefit from speech production learning.
Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J
Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians, processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.
Magnotti, John F.
Audiovisual speech integration combines information from auditory speech (talker’s voice) and visual speech (talker’s mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory “ba” + visual “ga” (AbaVga), that are integrated to produce a fused percept (“da”). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others. PMID:28207734
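The causal-inference step can be illustrated with the standard Bayesian formulation of Körding et al. (2007): compare how likely the auditory and visual measurements are under one shared source versus two independent sources, then weight the fused and auditory-only estimates by the resulting posterior. This is a generic one-dimensional sketch, not the CIMS model's own equations. Placing both syllables on a single hypothetical axis (for example, place of articulation), the Gaussian noise and prior, all parameter values, and the function name are illustrative assumptions.

```python
import math


def causal_inference(xa, xv, var_a, var_v, var_prior, mean_prior=0.0, p_common=0.5):
    """Bayesian causal inference for one auditory (xa) and one visual (xv)
    measurement on a shared 1-D axis, with Gaussian sensory noise and a
    Gaussian prior over source positions.

    Returns (P(common cause | xa, xv), auditory estimate averaged over models).
    """
    a, v = xa - mean_prior, xv - mean_prior
    # Likelihood of both measurements if they share one source (C = 1).
    d = var_a * var_v + var_a * var_prior + var_v * var_prior
    quad_c1 = ((xa - xv) ** 2 * var_prior + a * a * var_v + v * v * var_a) / d
    like_c1 = math.exp(-0.5 * quad_c1) / (2 * math.pi * math.sqrt(d))
    # Likelihood if each measurement comes from its own independent source (C = 2).
    sa, sv = var_a + var_prior, var_v + var_prior
    like_c2 = math.exp(-0.5 * (a * a / sa + v * v / sv)) / (2 * math.pi * math.sqrt(sa * sv))
    post_c1 = p_common * like_c1 / (p_common * like_c1 + (1 - p_common) * like_c2)
    # Reliability-weighted estimates under each causal structure.
    fused = (xa / var_a + xv / var_v + mean_prior / var_prior) / (
        1 / var_a + 1 / var_v + 1 / var_prior)
    aud_only = (xa / var_a + mean_prior / var_prior) / (1 / var_a + 1 / var_prior)
    # Model averaging: weight each estimate by the posterior of its structure.
    return post_c1, post_c1 * fused + (1 - post_c1) * aud_only


print(causal_inference(0.0, 1.0, 1.0, 1.0, 100.0))  # small discrepancy
print(causal_inference(0.0, 5.0, 1.0, 1.0, 100.0))  # large discrepancy
```

Small audiovisual discrepancies yield a high common-cause posterior and a percept pulled toward the visual cue, while large discrepancies yield a low posterior and a percept close to the auditory cue; a model without the causal-inference step would always return the fused estimate.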
Messaoud-Galusi, Souhila; Hazan, Valerie; Rosen, Stuart
Purpose: The claim that speech perception abilities are impaired in dyslexia was investigated in a group of 62 children with dyslexia and 51 average readers matched in age. Method: To test whether there was robust evidence of speech perception deficits in children with dyslexia, speech perception in noise and quiet was measured using 8 different…
Sato, Marc; Basirat, Anahita; Schwartz, Jean-Luc
The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.
Dodd, Barbara; McIntosh, Beth; Erdener, Dogu; Burnham, Denis
An example of the auditory-visual illusion in speech perception, first described by McGurk and MacDonald, is the perception of [ta] when listeners hear [pa] in synchrony with the lip movements for [ka]. One account of the illusion is that lip-read and heard speech are combined in an articulatory code since people who mispronounce words respond differently from controls on lip-reading tasks. A same-different judgment task assessing perception of the illusion showed no difference in performance between controls and children with speech difficulties. Another experiment compared children with delayed and disordered speech on perception of the illusion. While neither group perceived many illusions, a significant interaction indicated that children with disordered phonology were strongly biased to the auditory component while the delayed group's response was more evenly split between the auditory and visual components of the illusion. These findings suggest that phonological processing, rather than articulation, supports lip-reading ability.
Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
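The abstract describes Gaussian mixture models (GMMs) that learn categories and cue weights from distributional statistics. Below is a minimal sketch of that idea, assuming one-dimensional cues, two categories per cue, and a d'-like separation score as the cue weight. The weighting rule, the quantile initialization, and the synthetic "auditory" and "visual" data are illustrative assumptions, not the paper's model or data.

```python
import math
import random

def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with expectation-maximization (EM)."""
    s = sorted(xs)
    n = len(xs)
    mus = [s[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization (assumed)
    mean = sum(xs) / n
    vars_ = [sum((x - mean) ** 2 for x in xs) / n] * k
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [p * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for p, m, v in zip(pis, mus, vars_)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            pis[j] = nj / n
    return mus, vars_, pis

def cue_reliability(mus, vars_):
    """d'-like separation of two fitted categories (assumed weighting metric)."""
    return abs(mus[0] - mus[1]) / math.sqrt((vars_[0] + vars_[1]) / 2)

rng = random.Random(1)
# Hypothetical training data: 200 tokens per category on each cue.
auditory = [rng.gauss(0, 5) for _ in range(200)] + [rng.gauss(50, 5) for _ in range(200)]
visual = [rng.gauss(0, 5) for _ in range(200)] + [rng.gauss(10, 5) for _ in range(200)]

d_aud = cue_reliability(*fit_gmm_1d(auditory)[:2])
d_vis = cue_reliability(*fit_gmm_1d(visual)[:2])
# Normalize so the two cue weights sum to 1.
w_aud = d_aud / (d_aud + d_vis)
w_vis = d_vis / (d_aud + d_vis)
print(f"auditory weight={w_aud:.2f}, visual weight={w_vis:.2f}")
```

In this sketch the cue whose learned categories are better separated relative to their spread receives the larger weight. That is one simple way distributional statistics alone can yield cue weights without any explicit feedback.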
Youse, Kathleen M; Cienkowski, Kathleen M; Coelho, Carl A
The evaluation of auditory-visual speech perception is not typically undertaken in the assessment of aphasia; however, treatment approaches utilise bimodal presentations. Research demonstrates that auditory and visual information are integrated for speech perception. The strongest evidence of this cross-modal integration is the McGurk effect. This indirect measure of integration shows that presentation of conflicting tokens may change perception (e.g. auditory /bi/ + visual /gi/ = /di/). The purpose of this study was to investigate the ability of a person with mild aphasia to identify tokens presented in auditory-only, visual-only and auditory-visual conditions. It was hypothesized that performance would be best in the bimodal condition and that presence of the McGurk effect would demonstrate integration of speech information. Findings did not support the hypotheses. It is suspected that successful integration of AV speech information was limited by a perseverative response pattern. This case study suggests the use of bisensory speech information may be impaired in adults with aphasia.
Scott, Sophie K; McGettigan, Carolyn
It is not unusual to find it stated as a fact that the left hemisphere is specialized for the processing of rapid, or temporal aspects of sound, and that the dominance of the left hemisphere in the perception of speech can be a consequence of this specialization. In this review we explore the history of this claim and assess the weight of this assumption. We will demonstrate that instead of a supposed sensitivity of the left temporal lobe for the acoustic properties of speech, it is the right temporal lobe which shows a marked preference for certain properties of sounds, for example longer durations, or variations in pitch. We finish by outlining some alternative factors that contribute to the left lateralization of speech perception. PMID:24125574
Iarocci, Grace; Rombough, Adrienne; Yager, Jodi; Weeks, Daniel J; Chua, Romeo
The bimodal perception of speech sounds was examined in children with autism as compared to mental age-matched typically developing (TD) children. A computer task was employed wherein only the mouth region of the face was displayed and children reported what they heard or saw when presented with consonant-vowel sounds in unimodal auditory condition, unimodal visual condition, and a bimodal condition. Children with autism showed less visual influence and more auditory influence on their bimodal speech perception as compared to their TD peers, largely due to significantly worse performance in the unimodal visual condition (lip reading). Children with autism may not benefit to the same extent as TD children from visual cues such as lip reading that typically support the processing of speech sounds. The disadvantage in lip reading may be detrimental when auditory input is degraded, for example in school settings, where speakers frequently communicate in noisy environments.
Lewkowicz, David J.
Three experiments investigated perception of audio-visual (A-V) speech synchrony in 4- to 10-month-old infants. Experiments 1 and 2 used a convergent-operations approach by habituating infants to an audiovisually synchronous syllable (Experiment 1) and then testing for detection of increasing degrees of A-V asynchrony (366, 500, and 666 ms) or by…
Nicholls, Michael E. R.; Searle, Dara A.
This study explored asymmetries for movement, expression and perception of visual speech. Sixteen dextral models were videoed as they articulated: "bat," "cat," "fat," and "sat." Measurements revealed that the right side of the mouth was opened wider and for a longer period than the left. The asymmetry was accentuated at the beginning and ends of…
Zhang, Juan; McBride-Chang, Catherine
While the importance of phonological sensitivity for understanding reading acquisition and impairment across orthographies is well documented, what underlies deficits in phonological sensitivity is not well understood. Some researchers have argued that speech perception underlies variability in phonological representations. Others have…
Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…
Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.
This study examined unisensory and multisensory speech perception in 8-17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual only, auditory only, matched audiovisual, and mismatched audiovisual ("McGurk")…
Rance, Gary; Fava, Rosanne; Baldock, Heath; Chong, April; Barker, Elizabeth; Corben, Louise; Delatycki
The aim of this study was to investigate auditory pathway function and speech perception ability in individuals with Friedreich ataxia (FRDA). Ten subjects confirmed by genetic testing as being homozygous for a GAA expansion in intron 1 of the FXN gene were included. While each of the subjects demonstrated normal, or near normal sound detection, 3…
Phonetic variation has been considered a barrier that listeners must overcome in speech perception, but has been proved beneficial in category learning. In this paper, I show that listeners use within-speaker variation to accommodate gross categorical variation. Within the perceptual learning paradigm, listeners are exposed to p-initial words in…
Boatman, Dana F.
Recent brain mapping studies have provided new insights into the cortical systems that mediate human speech perception. Electrocortical stimulation mapping (ESM) is a brain mapping method that is used clinically to localize cortical functions in neurosurgical patients. Recent ESM studies have yielded new insights into the cortical systems that…
Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio
Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performance when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data).
Von Tiling, Johannes
This study examined listener perceptions of different ways of speaking often produced by people who stutter. Each of 115 independent listeners made quantitative and qualitative judgments upon watching one of four randomly assigned speech samples. Each of the four video clips showed the same everyday conversation between three young men, but…
Samuel, Arthur G.; Lieblich, Jerrold
The speech signal is often badly articulated, and heard under difficult listening conditions. To deal with these problems, listeners make use of various types of context. In the current study, we examine a type of context that in previous work has been shown to affect how listeners report what they hear: visual speech (i.e., the visible movements of the speaker’s articulators). Despite the clear utility of this type of context under certain conditions, prior studies have shown that visually-driven phonetic percepts (via the “McGurk” effect) are not “real” enough to affect perception of later-occurring speech; such percepts have not produced selective adaptation effects. This failure contrasts with successful adaptation by sounds that are generated by lexical context – the word that a sound occurs within. We demonstrate here that this dissociation is robust, leading to the conclusion that visual and lexical contexts operate differently. We suggest that the dissociation reflects the dual nature of speech as both a perceptual object and a linguistic object. Visual speech seems to contribute directly to the computations of the perceptual object, but not the linguistic one, while lexical context is used in both types of computations. PMID:24749935
Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael
We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments, has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms.
Dornan, Dimity; Hickson, Louise; Murdoch, Bruce; Houston, Todd
This study examined the speech perception, speech, and language developmental progress of 25 children with hearing loss (mean Pure-Tone Average [PTA] 79.37 dB HL) in an auditory verbal therapy program. Children were tested initially and then 21 months later on a battery of assessments. The speech and language results over time were compared with…
Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…
Massaro, Dominic W.; Light, Joanna
The main goal of this study was to implement a computer-animated talking head, Baldi, as a language tutor for speech perception and production for individuals with hearing loss. Baldi can speak slowly; illustrate articulation by making the skin transparent to reveal the tongue, teeth, and palate; and show supplementary articulatory features, such…
Baum, Sarah H.; Martin, Randi C.; Hamilton, A. Cris; Beauchamp, Michael S.
Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the right STS BOLD response to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. PMID:22634292
Perception and action are coupled via bidirectional relationships between sensory and motor systems. Motor systems influence sensory areas by imparting a feedforward influence on sensory processing termed "motor efference copy" (MEC). MEC is suggested to occur in humans because speech preparation and production modulate neural measures of auditory cortical activity. However, it is not known if MEC can affect auditory perception. We tested the hypothesis that during speech preparation auditory thresholds will increase relative to a control condition, and that the increase would be most evident for frequencies that match the upcoming vocal response. Participants performed trials in a speech condition that contained a visual cue indicating a vocal response to prepare (one of two frequencies), followed by a go signal to speak. To determine threshold shifts, voice-matched or -mismatched pure tones were presented at one of three time points between the cue and target. The control condition was the same except the visual cues did not specify a response and subjects did not speak. For each participant, we measured f0 thresholds in isolation from the task in order to establish baselines. Results indicated that auditory thresholds were highest during speech preparation, relative to baselines and a non-speech control condition, especially at suprathreshold levels. Thresholds for tones that matched the frequency of planned responses gradually increased over time, but sharply declined for the mismatched tones shortly before targets. Findings support the hypothesis that MEC influences auditory perception by modulating thresholds during speech preparation, with some specificity relative to the planned response. The threshold increase in tasks vs. baseline may reflect attentional demands of the tasks.
Lexington, Massachusetts. Abstract: ...A speech analysis system based on a combination of physiological ... (contents include "Auditory Model Based on Physiological Results" and "2.3 A Simplified Auditory Model Incorporating ...") ... physiological studies of the auditory system are applied, it may be possible to design improved ASR machines. When applying auditory system results to the…
Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador
Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.
Knowland, Victoria CP; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael SC
Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development. PMID:24176002
Farhood, Zachary; Nguyen, Shaun A; Miller, Stephen C; Holcomb, Meredith A; Meyer, Ted A; Rizk, Habib G
Objective: (1) To analyze reported speech perception outcomes in patients with inner ear malformations who undergo cochlear implantation, (2) to review the surgical complications and findings, and (3) to compare the 2 classification systems of Jackler and Sennaroglu. Data Sources: PubMed, Scopus (including Embase), Medline, and CINAHL Plus. Review Methods: Fifty-nine articles were included that contained speech perception and/or intraoperative data. Cases were differentiated depending on whether the Jackler or Sennaroglu malformation classification was used. A meta-analysis of proportions examined incidences of complete insertion, gusher, and facial nerve aberrancy. For speech perception data, weighted means and standard deviations were calculated for all malformations for short-, medium-, and long-term follow-up. Speech tests were grouped into 3 categories (closed-set words, open-set words, and open-set sentences) and then compared through a comparison-of-means t test. Results: Complete insertion was seen in 81.8% of all inner ear malformations (95% CI: 72.6-89.5); gusher was reported in 39.1% of cases (95% CI: 30.3-48.2); and facial nerve anomalies were encountered in 34.4% (95% CI: 20.1-50.3). Significant improvements in average performance were seen for closed- and open-set tests across all malformation types at 12 months postoperatively. Conclusions: Cochlear implantation outcomes are favorable for those with inner ear malformations from a surgical and speech outcome standpoint. Accurate classification of anatomic malformations, as well as standardization of postimplantation speech outcomes, is necessary to improve understanding of the impact of implantation in this difficult patient population.
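The review reports weighted means and standard deviations pooled across studies. The abstract does not give the exact formulas. A minimal sketch, assuming each study contributes a summary (n, mean, SD), the mean is weighted by sample size, and the pooled SD is the SD of the concatenated samples (within-study variance plus the spread of study means):

```python
import math

def pool_groups(groups):
    """Combine (n, mean, sd) summaries into one sample-size-weighted mean and SD.

    Assumption: the pooled SD equals the sample SD of all observations combined,
    i.e. within-group variance plus between-group spread of the means.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return grand_mean, math.sqrt((within + between) / (total_n - 1))

# Hypothetical check: summaries of [1, 2, 3] and [4, 5, 6, 7].
mean, sd = pool_groups([(3, 2.0, 1.0), (4, 5.5, math.sqrt(5 / 3))])
print(round(mean, 3), round(sd, 3))
```

Because the pooled SD reproduces the SD of the raw combined data, the check above gives the same mean (4.0) and SD as computing them directly on 1 through 7. An inverse-variance weighting would be a different, equally common choice in meta-analysis.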
Liu, Fang; Jiang, Cunmei; Wang, Bei; Xu, Yi; Patel, Aniruddh D
This study investigated the underlying link between speech and music by examining whether and to what extent congenital amusia, a musical disorder characterized by degraded pitch processing, would impact spoken sentence comprehension for speakers of Mandarin, a tone language. Sixteen Mandarin-speaking amusics and 16 matched controls were tested on the intelligibility of news-like Mandarin sentences with natural and flat fundamental frequency (F0) contours (created via speech resynthesis) under four signal-to-noise ratio (SNR) conditions (no noise, +5, 0, and -5 dB SNR). While speech intelligibility in quiet and extremely noisy conditions (SNR = -5 dB) was not significantly compromised by flattened F0, both amusic and control groups achieved better performance with natural-F0 sentences than flat-F0 sentences under moderately noisy conditions (SNR = +5 and 0 dB). Relative to normal listeners, amusics demonstrated reduced speech intelligibility in both quiet and noise, regardless of whether the F0 contours of the sentences were natural or flattened. This deficit in speech intelligibility was not associated with impaired pitch perception in amusia. These findings provide evidence for impaired speech comprehension in congenital amusia, suggesting that the deficit of amusics extends beyond pitch processing and includes segmental processing.
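Presenting sentences at +5, 0, and -5 dB SNR requires scaling a masker relative to the speech. A minimal sketch, assuming SNR is defined on RMS level over the whole token (the study's exact calibration is not stated in the abstract). The 220 Hz tone and Gaussian noise below are hypothetical stand-ins for a sentence and a masker.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the RMS-based SNR equals `snr_db`, then add it to `signal`."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    # 20*log10(rms(signal) / (gain * rms(noise))) == snr_db  =>  solve for gain.
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise

fs = 16000
sentence = [math.sin(2 * math.pi * 220 * t / fs) for t in range(fs // 10)]
rng = random.Random(0)
masker = [rng.gauss(0, 1) for _ in range(fs // 10)]

for snr in (5, 0, -5):
    mixture, scaled = mix_at_snr(sentence, masker, snr)
    achieved = 20 * math.log10(rms(sentence) / rms(scaled))
    print(f"target {snr:+d} dB -> achieved {achieved:+.2f} dB")
```

Scaling the masker rather than the speech keeps the target level fixed across conditions. Experiments sometimes rescale the final mixture as well to control overall presentation level.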
Scott, Sophie K; McGettigan, Carolyn; Eisner, Frank
The motor theory of speech perception assumes that activation of the motor system is essential in the perception of speech. However, deficits in speech perception and comprehension do not arise from damage that is restricted to the motor cortex, few functional imaging studies reveal activity in motor cortex during speech perception, and the motor cortex is strongly activated by many different sound categories. Here, we evaluate alternative roles for the motor cortex in spoken communication and suggest a specific role in sensorimotor processing in conversation. We argue that motor-cortex activation is essential in joint speech, particularly for the timing of turn-taking. PMID:19277052
Stilp, Christian E
Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
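The abstract describes two ways of shaping the context sentence: a fixed boost (+20 or +5 dB) in a low-F1 (100-400 Hz) or high-F1 (550-850 Hz) region, or a filter equal to 100% or 25% of the difference between the /ɪ/ and /ɛ/ spectral envelopes. A minimal sketch of the per-frequency gains, assuming gains are specified in dB on a coarse frequency grid. The envelope values are hypothetical, and the actual filter design used in the study is not described in the abstract.

```python
def band_boost_db(freqs_hz, lo_hz, hi_hz, gain_db):
    """Per-frequency gain in dB: `gain_db` inside [lo_hz, hi_hz], 0 dB elsewhere."""
    return [gain_db if lo_hz <= f <= hi_hz else 0.0 for f in freqs_hz]

def envelope_difference_db(env_a_db, env_b_db, proportion):
    """Per-frequency gain in dB: `proportion` (e.g. 1.0 or 0.25) of the envelope difference."""
    return [proportion * (a - b) for a, b in zip(env_a_db, env_b_db)]

def db_to_amplitude(gains_db):
    """Convert dB gains to linear amplitude multipliers."""
    return [10 ** (g / 20) for g in gains_db]

freqs = [100, 250, 400, 550, 700, 850]
# Fixed-boost scheme: +20 dB over the low-F1 region (the abstract's "eh"-encouraging context).
low_f1_boost = band_boost_db(freqs, 100, 400, 20.0)

# Hypothetical envelope levels (dB) at `freqs`; real envelopes would be measured from the endpoint vowels.
env_ih = [30.0, 42.0, 45.0, 36.0, 30.0, 28.0]
env_eh = [28.0, 34.0, 38.0, 44.0, 43.0, 35.0]
full_diff = envelope_difference_db(env_ih, env_eh, 1.0)     # 100% of the difference
quarter_diff = envelope_difference_db(env_ih, env_eh, 0.25)  # 25% of the difference

print(db_to_amplitude(low_f1_boost))
print(quarter_diff)
```

Scaling the envelope difference, rather than applying a flat boost, keeps the filter's spectral shape tied to the two vowels while varying how large the peaks in the preceding sentence are.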
Wayland, Ratree P.; Eckhouse, Erin; Lombardino, Linda; Roberts, Rosalyn
This study investigated the relationship between speech perception, phonological processing and reading skills among school-aged children classified as "skilled" and "less skilled" readers based on their ability to read words, decode non-words, and comprehend short passages. Three speech perception tasks involving categorization of speech continua…
The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. PMID:21743809
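The "re-packaging" manipulation inserts silent gaps between successive intervals of time-compressed speech. A minimal sketch of that step only, assuming the signal has already been time-compressed by some other means (compression is not implemented here) and using hypothetical 40 ms intervals and 80 ms gaps rather than the values from the cited experiments.

```python
def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples at rate `fs`."""
    return round(ms * fs / 1000)

def insert_gaps(samples, chunk_len, gap_len):
    """Re-package a (time-compressed) signal: cut it into consecutive chunks of
    `chunk_len` samples and place `gap_len` samples of silence between chunks."""
    out = []
    for start in range(0, len(samples), chunk_len):
        if start:  # silence goes between chunks, not before the first one
            out.extend([0.0] * gap_len)
        out.extend(samples[start:start + chunk_len])
    return out

fs = 16000
chunk = ms_to_samples(40, fs)  # hypothetical 40 ms interval -> 640 samples
gap = ms_to_samples(80, fs)    # hypothetical 80 ms gap -> 1280 samples
compressed = [1.0] * (3 * chunk)  # stand-in for three intervals of compressed speech
repackaged = insert_gaps(compressed, chunk, gap)
print(len(repackaged))  # 3 * 640 + 2 * 1280 = 4480
```

With a compression factor of 3, choosing chunk + gap = 3 × chunk (as in these hypothetical values) makes each compressed interval occupy the same time slot as the uncompressed speech it came from. The speech inside each interval stays fast, but the rate at which intervals arrive matches the original.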
Gygi, Brian; Shafiro, Valeriy
Speech perception in multitalker environments often requires listeners to divide attention among several concurrent talkers before focusing on one talker with pertinent information. Such attentionally demanding tasks are particularly difficult for older adults, due both to age-related hearing loss (presbycusis) and to general declines in attentional processing and associated cognitive abilities. This study investigated two signal-processing techniques that have been suggested as a means of improving the speech perception accuracy of older adults: time stretching and spatial separation of target talkers. Stimuli in each experiment comprised 2-4 fixed-form utterances in which listeners were asked to consecutively (1) detect concurrently spoken keywords at the beginning of the utterance (divided attention) and (2) identify additional keywords from only one talker at the end of the utterance (selective attention). In Experiment 1, the overall tempo of each utterance was either unaltered or slowed down by 25%; in Experiment 2, the concurrent utterances were either spatially coincident or separated across a 180-degree hemifield. Both manipulations improved performance on both tasks for elderly adults with age-appropriate hearing. Increasing the divided-attention load by attending to more concurrent keywords had a marked negative effect on selective-attention performance only when the target talker was identified by a keyword, not when it was identified by spatial location. These findings suggest that the temporal and spatial modifications improved perception of multitalker speech primarily by reducing competition among the cognitive resources required to perform attentionally demanding tasks.
Lev-Ari, Shiri; Peperkamp, Sharon
Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by shaping their expectations of the language. Using online experiments with participants from across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of whether foreign languages have an alveolar trill or a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, while individuals' expectation that a foreign language has a trill occasionally leads them to misperceive a tap in that language as a trill, a higher proportion of speakers of non-trill languages in one's community decreases this likelihood. These results show that individuals' environment can influence their perception by shaping their linguistic expectations.
Schellenberg, E Glenn
Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular. Such claims assume that music training causes the associations even though children who take music lessons are likely to differ from other children in music aptitude, which is associated with many aspects of speech perception. Music training in childhood is also associated with cognitive, personality, and demographic variables, and it is well established that IQ and personality are determined largely by genetics. Recent evidence also indicates that the role of genetics in music aptitude and music achievement is much larger than previously thought. In short, music training is an ideal model for the study of gene-environment interactions but far less appropriate as a model for the study of plasticity. Children seek out environments, including those with music lessons, that are consistent with their predispositions; such environments exaggerate preexisting individual differences.
Boets, Bart; Wouters, Jan; van Wieringen, Astrid; De Smedt, Bert; Ghesquière, Pol
The general magnocellular theory postulates that dyslexia is the consequence of a multimodal deficit in the processing of transient and dynamic stimuli. In the auditory modality, this deficit has been hypothesized to interfere with accurate speech perception, and subsequently disrupt the development of phonological and later reading and spelling skills. In the visual modality, an analogous problem might interfere with literacy development by affecting orthographic skills. In this prospective longitudinal study, we tested dynamic auditory and visual processing, speech-in-noise perception, phonological ability and orthographic ability in 62 five-year-old preschool children. Predictive relations towards first grade reading and spelling measures were explored and the validity of the global magnocellular model was evaluated using causal path analysis. In particular, we demonstrated that dynamic auditory processing was related to speech perception, which itself was related to phonological awareness. Similarly, dynamic visual processing was related to orthographic ability. Subsequently, phonological awareness, orthographic ability and verbal short-term memory were unique predictors of reading and spelling development.
Remez, Robert E.
A varied psychological vocabulary now describes the cognitive and social conditions of language production, the ultimate result of which is the mechanical action of vocal musculature in spoken expression. Following the logic of the speech chain, descriptions of production have often exhibited a clear analogy to accounts of perception. This reciprocality is especially evident in explanations that rely on reafference to control production, on articulation to inform perception, and on strict parity between produced and perceived form to provide invariance in the relation between abstract linguistic objects and observed expression. However, a causal account of production and perception cannot derive solely from this hopeful analogy. Despite sharing of abstract linguistic representations, the control functions in production and perception as well as the constraints on their use stand in fundamental disanalogy. This is readily seen in the different adaptive challenges to production — to speak in a single voice — and perception — to resolve familiar linguistic properties in any voice. This acknowledgment sets descriptive and theoretical challenges that break the symmetry of production and perception. As a consequence, this recognition dislodges an old impasse between the psychoacoustic and motoric accounts in the regulation of production and perception. PMID:25642428
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, only partly explains this variability, and cognition has emerged as another key factor. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility across various tests. Perceptual accuracy is just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet) to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that…
Won, Jong Ho; Moon, Il Joon; Jin, Sunhwa; Park, Heesung; Woo, Jihwan; Cho, Yang-Sun; Chung, Won-Ho; Hong, Sung Hwa
Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus whose spectral modulation pattern changes over time. To examine STM detection performance under different modulation conditions, two temporal modulation rates (5 and 10 Hz) and three spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of six STM stimulus conditions. To explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed high levels of STM detection performance, comparable to that of listeners with acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. To understand the relative contributions of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection were measured separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation, rather than slow temporal modulation, may be important for determining speech perception capabilities for CI users. Lastly, test–retest reliability for STM detection was good, with no learning effect. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information. PMID:26485715
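A common way to build an STM ("moving ripple") stimulus is to sum many log-spaced tones whose levels follow a sinusoid across log frequency that drifts over time. The sketch below assumes that construction. The bandwidth (250 Hz over 5 octaves), number of tones, modulation depth, sign convention, and function names are illustrative assumptions, not the study's parameters. Setting the depth to 0 dB yields the unmodulated reference noise.

```python
import math
import random


def ripple_level_db(octaves, t_s, density_cpo, rate_hz, depth_db, phase=0.0):
    """Level offset (dB) of a tone `octaves` above the lowest frequency at time t_s."""
    return 0.5 * depth_db * math.sin(
        2.0 * math.pi * (density_cpo * octaves + rate_hz * t_s) + phase)


def stm_noise(duration_s, fs_hz, density_cpo, rate_hz, depth_db,
              f_lo_hz=250.0, n_octaves=5.0, n_tones=100, seed=0):
    """Sum of log-spaced, random-phase tones with a moving ripple envelope."""
    rng = random.Random(seed)
    octs = [n_octaves * k / (n_tones - 1) for k in range(n_tones)]
    freqs = [f_lo_hz * 2.0 ** o for o in octs]
    if freqs[-1] >= fs_hz / 2.0:
        raise ValueError("highest tone must lie below the Nyquist frequency")
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_tones)]
    n_samples = int(round(duration_s * fs_hz))
    signal = []
    for i in range(n_samples):
        t = i / fs_hz
        sample = 0.0
        for o, f, p in zip(octs, freqs, phases):
            # Ripple level in dB converted to a linear amplitude for this tone.
            amp = 10.0 ** (ripple_level_db(o, t, density_cpo, rate_hz, depth_db) / 20.0)
            sample += amp * math.sin(2.0 * math.pi * f * t + p)
        signal.append(sample)
    return signal


if __name__ == "__main__":
    # Six conditions as in the abstract: 2 rates x 3 densities (depth is assumed).
    for rate in (5.0, 10.0):
        for density in (0.5, 1.0, 2.0):
            x = stm_noise(0.02, 20000, density, rate, depth_db=20.0)
            print(rate, density, len(x))
```

Pure Python keeps the sketch dependency-free but is slow; a real stimulus generator would typically vectorize the inner loop.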
Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y
A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. Studying how normal-hearing (NH) listeners understand speech signals degraded by a vocoder may clarify which parameters best simulate electric hearing and which factors contribute to the NH-CI performance difference. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
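The phase manipulation can be illustrated by generating equal-amplitude harmonics of a 100 Hz fundamental whose starting phases are drawn from a range that widens with a dispersion parameter. The sketch then compares crest factors (peak-to-RMS ratios) as a rough index of how peaky the carrier's intrinsic temporal envelope is. The uniform phase distribution, the `dispersion` parameter, the number of harmonics, and the function names are assumptions for illustration. The study's band-pass filtering, envelope modulation, and auditory nerve modeling are not reproduced.

```python
import math
import random


def harmonic_carrier(f0_hz, n_harmonics, fs_hz, duration_s, dispersion, seed=0):
    """Equal-amplitude cosine harmonics of f0 with starting phases drawn
    uniformly from [0, 2*pi*dispersion]; dispersion=0 gives zero (cosine) phase."""
    if not 0.0 <= dispersion <= 1.0:
        raise ValueError("dispersion must lie in [0, 1]")
    if n_harmonics * f0_hz >= fs_hz / 2.0:
        raise ValueError("highest harmonic must lie below the Nyquist frequency")
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi * dispersion) for _ in range(n_harmonics)]
    n_samples = int(round(duration_s * fs_hz))
    return [
        sum(math.cos(2.0 * math.pi * f0_hz * (k + 1) * i / fs_hz + phases[k])
            for k in range(n_harmonics))
        for i in range(n_samples)
    ]


def crest_factor(x):
    """Peak absolute value divided by RMS; higher means a peakier waveform."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return peak / rms


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        carrier = harmonic_carrier(100.0, 40, 16000, 0.01, d)  # one 10 ms period
        print(d, round(crest_factor(carrier), 2))
```

With zero phase, all harmonics peak together once per period, which gives a crest factor of sqrt(2N) for N harmonics over a full period. Randomly dispersed phases spread that energy out and yield a flatter waveform.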
Abram, Samantha V; Karpouzian, Tatiana M; Reilly, James L; Derntl, Birgit; Habel, Ute; Smith, Matthew J
Several studies suggest that facial affect perception (FAP) deficits in schizophrenia are linked to poorer social functioning. However, whether reduced functioning is associated with inaccurate perception of specific emotional valences or with a global FAP impairment remains unclear. The present study examined whether impairment in the perception of specific emotional valences (positive, negative) and of neutrality was uniquely associated with social functioning, using a multimodal social functioning battery. A sample of 59 individuals with schizophrenia and 41 controls completed a computerized FAP task and measures of functional capacity, social competence, and social attainment. Participants also underwent neuropsychological testing and symptom assessment. Regression analyses revealed that only accurately perceiving negative emotions explained significant variance (7.9%) in functional capacity after accounting for neurocognitive function and symptoms. Partial correlations indicated that accurately perceiving anger, in particular, was positively correlated with functional capacity. FAP for positive, negative, or neutral emotions was not related to social competence or social attainment. Our findings were consistent with prior literature suggesting negative emotions are related to functional capacity in schizophrenia. Furthermore, the observed relationship between perceiving anger and performance of everyday living skills is novel and warrants further exploration.
Kilman, Lisa; Zekveld, Adriana; Hällgren, Mathias; Rönnberg, Jerker
This study evaluated how hearing-impaired listeners perceive native (Swedish) and nonnative (English) speech in the presence of noise and speech maskers. Speech reception thresholds were measured for four different masker types for each target language. The maskers consisted of stationary and fluctuating noise and two-talker babble in Swedish and English. Twenty-three hearing-impaired native Swedish listeners aged between 28 and 65 years participated. The participants also performed cognitive tests of working memory capacity in Swedish and English, nonverbal reasoning, and an English proficiency test. Results indicated that the speech maskers were more interfering than the noise maskers in both target languages. The greater need for phonetic and semantic cues in a nonnative language makes a stationary masker relatively more challenging than a fluctuating-noise masker. Better hearing acuity (pure tone average) was associated with better perception of the target speech in Swedish, and better English proficiency was associated with better speech perception in English. Larger working memory capacity and better pure tone averages were related to better perception of speech masked with fluctuating noise in the nonnative language. This suggests that both are relevant in highly taxing conditions. Large variability in performance across listeners was observed, especially for speech perception in the nonnative language.
Dagenais, Paul A; Stallworth, Jamequa A
The purpose of this study was to determine the influence of dialect upon the perception of dysarthric speech. Speakers and listeners self-identified as either Caucasian American or African American. Three speakers were Caucasian American and three were African American. Four speakers had experienced a cerebrovascular accident (CVA) and were dysarthric. Listeners were age-matched and equally divided by gender. Readers recorded 14 word sentences from the Assessment of Intelligibility of Dysarthric Speech. Listeners provided ratings of intelligibility, comprehensibility, and acceptability. Own-race biases were found for all measures; however, significant effects emerged for intelligibility and comprehensibility, in that Caucasian American listeners gave significantly higher scores to Caucasian American speakers. Clinical implications are discussed.
Ocklenburg, Sebastian; Arning, Larissa; Gerding, Wanda M; Epplen, Jörg T; Güntürkün, Onur; Beste, Christian
Left-hemispheric language dominance is a well-known characteristic of the human language system, but the molecular mechanisms underlying this crucial feature of vocal communication are still far from being understood. The forkhead box P2 gene FOXP2, which has been related to speech development, constitutes an interesting candidate gene in this regard. Therefore, the present study was aimed at investigating effects of variation in FOXP2 on individual language dominance. To this end, we used a dichotic listening and a visual half-field task in a sample of 456 healthy adults. The FOXP2 SNPs rs2396753 and rs12533005 were found to be significantly associated with the distribution of correct answers on the dichotic listening task. These results show that variation in FOXP2 may contribute to the inter-individual variability in hemispheric asymmetries for speech perception.
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.
Good speech perception and communication skills in everyday life are crucial for participation and well-being, and are therefore an overarching aim of auditory rehabilitation. Both behavioral and self-report measures can be used to assess these skills. However, correlations between behavioral and self-report speech perception measures are often low. One possible explanation is that there is a mismatch between the specific situations used in the assessment of these skills in each method, and a more careful matching across situations might improve consistency of results. The role that cognition plays in specific speech situations may also be important for understanding communication, as speech perception tests vary in their cognitive demands. In this study, the role of executive function, working memory (WM) and attention in behavioral and self-report measures of speech perception was investigated. Thirty existing hearing aid users with mild-to-moderate hearing loss aged between 50 and 74 years completed a behavioral test battery with speech perception tests ranging from phoneme discrimination in modulated noise (easy) to words in multi-talker babble (medium) and keyword perception in a carrier sentence against a distractor voice (difficult). In addition, a self-report measure of aided communication, residual disability from the Glasgow Hearing Aid Benefit Profile, was obtained. Correlations between speech perception tests and self-report measures were higher when specific speech situations across both were matched. Cognition correlated with behavioral speech perception test results but not with self-report. Only the most difficult speech perception test, keyword perception in a carrier sentence with a competing distractor voice, engaged executive functions in addition to WM. In conclusion, any relationship between behavioral and self-report speech perception is not mediated by a shared correlation with cognition. PMID:27242564
Cason, Nia; Astésano, Corine; Schön, Daniele
Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing (the processing of the building blocks of speech) and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching or mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data were collected from two groups: one that received audio-motor training and one that did not. We hypothesised that 1) phonological processing would be enhanced in matching conditions, and 2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources.
Bidelman, Gavin M; Moreno, Sylvain; Alain, Claude
Speech perception requires the effortless mapping from smooth, seemingly continuous changes in sound features into discrete perceptual units, a conversion exemplified in the phenomenon of categorical perception. Explaining how/when the human brain performs this acoustic-phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material as well as dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex analogous to an acoustic-phonetic mapping necessary to generate categorical speech percepts. Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (>150-200 ms), thereby describing a plausible mechanism by which the…
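The truncated abstract does not say which nonlinearity was used. A logistic (sigmoid) function is a common choice for describing how a graded acoustic continuum maps onto a two-category identification function, and the sketch below assumes it. It also estimates a category boundary as the 50% crossing of observed identification proportions, using linear interpolation. The function names, the five-step continuum, and the example proportions are hypothetical.

```python
import math


def logistic_identification(x, boundary, slope):
    """Probability of labeling continuum step x as the second category (e.g., /a/)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))


def category_boundary(steps, proportions):
    """Continuum location where identification first crosses 0.5,
    found by linear interpolation between adjacent steps (None if no crossing)."""
    pairs = list(zip(steps, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


if __name__ == "__main__":
    steps = [1, 2, 3, 4, 5]  # hypothetical five-step /u/-/a/ continuum
    # A linear (graded) input becomes step-like after the logistic nonlinearity.
    curve = [logistic_identification(x, boundary=3.0, slope=3.0) for x in steps]
    print([round(p, 2) for p in curve])
    observed = [0.02, 0.10, 0.45, 0.90, 0.98]  # made-up identification proportions
    print(round(category_boundary(steps, observed), 2))
```

A steeper `slope` gives a more abrupt, more "categorical" identification function around the same boundary.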
Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.
Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…
Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo
Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…
Gerrits, Ellen; de Bree, Elise
Speech perception and speech production were examined in 3-year-old Dutch children at familial risk of developing dyslexia. Their performance in speech sound categorisation and their production of words was compared to that of age-matched children with specific language impairment (SLI) and typically developing controls. We found that speech…
Möttönen, Riikka; Watkins, Kate E
Listening to speech modulates activity in human motor cortex. It is unclear, however, whether the motor cortex has an essential role in speech perception. Here, we aimed to determine whether the motor representations of articulators contribute to categorical perception of speech sounds. Categorization of continuously variable acoustic signals into discrete phonemes is a fundamental feature of speech communication. We used repetitive transcranial magnetic stimulation (rTMS) to temporarily disrupt the lip representation in the left primary motor cortex. This disruption impaired categorical perception of artificial acoustic continua ranging between two speech sounds that differed in place of articulation, in that the vocal tract is rapidly opened and closed either with the lips or with the tip of the tongue (/ba/-/da/ and /pa/-/ta/). In contrast, it did not impair categorical perception of continua ranging between speech sounds that do not involve the lips in their articulation (/ka/-/ga/ and /da/-/ga/). Furthermore, an rTMS-induced disruption of the hand representation had no effect on categorical perception of either of the tested continua (/ba/-/da/ and /ka/-/ga/). These findings indicate that motor circuits controlling production of speech sounds also contribute to their perception. Mapping acoustically highly variable speech sounds onto less variable motor representations may facilitate their phonemic categorization and be important for robust speech perception.
Goehring, Jenny L.; Hughes, Michelle L.; Baudhuin, Jacquelyn L.; Valente, Daniel L.; McCreery, Ryan W.; Diaz, Gina R.; Sanford, Todd; Harpster, Roger
Purpose: In this study, the authors evaluated the effect of remote system and acoustic environment on speech perception via telehealth with cochlear implant recipients. Method: Speech perception was measured in quiet and in noise. Systems evaluated were Polycom visual concert (PVC) and a hybrid presentation system (HPS). Each system was evaluated…
Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian
Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…
Mealings, Kiri T.; Demuth, Katherine; Buchholz, Jörg; Dillon, Harvey
Purpose: Open-plan classroom styles are increasingly being adopted in Australia despite evidence that their high intrusive noise levels adversely affect learning. The aim of this study was to develop a new Australian speech perception task (the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test) and use it in an open-plan…
Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko
Audiovisual speech perception was studied in adults with Asperger syndrome (AS) using the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age-, sex-, and IQ-matched controls. When a voice saying /p/ was presented with a face…
Cheung, Him; Chung, Kevin Kien Hoa; Wong, Simpson Wai Lap; McBride-Chang, Catherine; Penney, Trevor Bruce; Ho, Connie Suk-Han
In this study, we examined the intercorrelations among speech perception, metalinguistic (i.e., phonological and morphological) awareness, word reading, and vocabulary in a 1st language (L1) and a 2nd language (L2). Results from 3 age groups of Chinese-English bilingual children showed that speech perception was more predictive of reading and…
Hickok, Gregory; Costanzo, Maddalena; Capasso, Rita; Miceli, Gabriele
Motor theories of speech perception have been revitalized as a consequence of the discovery of mirror neurons. Some authors have even promoted a strong version of the motor theory, arguing that the motor speech system is critical for perception. Part of the evidence that is cited in favor of this claim is the observation from the early 1980s that…
Inoue, Tomohiro; Higashibara, Fumiko; Okazaki, Shinji; Maekawa, Hisao
We examined the effects of presentation rate on speech perception in noise and its relation to reading in 117 typically developing (TD) children and 10 children with reading difficulties (RD) in Japan. Responses in a speech perception task were measured for speed, accuracy, and stability in two conditions that varied stimulus presentation rate:…
Jain, Chandni; Mohamed, Hijas; Kumar, Ajith U.
The aim of the study was to assess the effect of short-term musical training on speech perception in noise. In the present study, speech perception in noise was measured before and after short-term musical training. The musical training involved auditory perceptual training for raga identification of two Carnatic ragas. The training was given over eight sessions. A total of 18 normal-hearing adults aged 18-25 years participated in the study: group 1 consisted of ten individuals who underwent musical training, and group 2 consisted of eight individuals who did not undergo any training. Results revealed that after training, speech perception in noise improved significantly in group 1, whereas group 2 did not show any change in speech perception scores. Thus, short-term musical training enhances speech perception in the presence of noise. However, generalization and long-term maintenance of these benefits need to be evaluated. PMID:26557359
Burgaleta, Miguel; Baus, Cristina; Díaz, Begoña; Sebastián-Gallés, Núria
Morphology of the human brain predicts the speed at which individuals learn to distinguish novel foreign speech sounds after laboratory training. However, little is known about the neuroanatomical basis of individual differences in speech perception when a second language (L2) has been learned in natural environments for extended periods of time. In the present study, two samples of highly proficient bilinguals were selected according to their ability to distinguish between very similar L2 sounds, either isolated (prelexical) or within words (lexical). Structural MRI was acquired and processed to estimate vertex-wise indices of cortical thickness (CT) and surface area (CSA), and the association between cortical morphology and behavioral performance was examined. Results revealed that performance in the lexical task was negatively associated with the thickness of the left temporal cortex and angular gyrus, as well as with the surface area of the left precuneus. Our findings, consistent with previous fMRI studies, demonstrate that the morphology of the reported areas is relevant for word recognition based on phonological information. Further, we discuss the possibility that the increased CT and CSA in sound-to-meaning mapping regions found in poor perceivers of non-native speech sounds may have arisen through plasticity after extended periods of increased functional activity during L2 exposure.
Smith, David R. R.; Patterson, Roy D.; Turner, Richard; Kawahara, Hideki; Irino, Toshio
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
Hakvoort, Britt; de Bree, Elise; van der Leij, Aryan; Maassen, Ben; van Setten, Ellie; Maurits, Natasha; van Zuijen, Titia L.
Purpose: This study assessed whether a categorical speech perception (CP) deficit is associated with dyslexia or familial risk for dyslexia, by exploring a possible cascading relation from speech perception to phonology to reading and by identifying whether speech perception distinguishes familial risk (FR) children with dyslexia (FRD) from those…
Mendel, Lisa Lucks; Roberts, Richard A; Walton, Julie H
The effects of sound field FM amplification (SFA) on speech perception performance were investigated in this 2-year study. Kindergarten children with normal hearing were randomly assigned to a treatment group, which comprised 7 classrooms that had SFA systems installed in them, and to a control group, which comprised another 7 classrooms that did not have any amplification available. The children were followed from the beginning of kindergarten through the end of first grade. Improvements in speech perception performance were measured for both groups, with the treatment group demonstrating progress much sooner than the control group. However, this difference was not apparent by the end of the study. The only significant difference measured between the treatment and control groups was that the treatment group performed significantly better than the control group when the stimuli were presented with SFA for the treatment group and without SFA for the control group. The teachers who used SFA enjoyed using amplification in their classrooms and felt that their students enjoyed using it as well.
Alexandrou, Anna Maria; Saarinen, Timo; Mäkelä, Sasu; Kujala, Jan; Salmelin, Riitta
Current understanding of the cortical mechanisms of speech perception and production stems mostly from studies that focus on single words or sentences. However, it has been suggested that processing of real-life connected speech may rely on additional cortical mechanisms. In the present study, we examined the neural substrates of natural speech production and perception with magnetoencephalography by modulating three central features related to speech: amount of linguistic content, speaking rate and social relevance. The amount of linguistic content was modulated by contrasting natural speech production and perception to speech-like non-linguistic tasks. Meaningful speech was produced and perceived at three speaking rates: normal, slow and fast. Social relevance was probed by having participants attend to speech produced by themselves and an unknown person. These speech-related features were each associated with distinct spatiospectral modulation patterns that involved cortical regions in both hemispheres. Natural speech processing markedly engaged the right hemisphere in addition to the left. In particular, the right temporo-parietal junction, previously linked to attentional processes and social cognition, was highlighted in the task modulations. The present findings suggest that its functional role extends to active generation and perception of meaningful, socially relevant speech.
Larsson, Matz; Ekström, Seth Reino; Ranjbar, Parivash
Human locomotion typically creates noise, a possible consequence of which is the masking of sound signals originating in the surroundings. When walking side by side, people often subconsciously synchronize their steps. The neurophysiological and evolutionary background of this behavior is unclear. The present study investigated the potential of sound created by walking to mask perception of speech and compared the masking produced by walking in step with that produced by unsynchronized walking. The masking sound (footsteps on gravel) and the target sound (speech) were presented through the same speaker to 15 normal-hearing subjects. The original recorded walking sound was modified to mimic the sound of two individuals walking in pace or walking out of synchrony. The participants were instructed to adjust the sound level of the target sound until they could just comprehend the speech signal (“just follow conversation” or JFC level) when presented simultaneously with synchronized or unsynchronized walking sound at 40 dBA, 50 dBA, 60 dBA, or 70 dBA. Synchronized walking sounds produced slightly less masking of speech than did unsynchronized sound. The median JFC threshold in the synchronized condition was 38.5 dBA, while the corresponding value for the unsynchronized condition was 41.2 dBA. Combined results at all sound pressure levels showed an improvement in the signal-to-noise ratio (SNR) for synchronized footsteps; the median difference was 2.7 dB and the mean difference was 1.2 dB [P < 0.001, repeated-measures analysis of variance (RM-ANOVA)]. The difference was significant for masker levels of 50 dBA and 60 dBA, but not for 40 dBA or 70 dBA. This study provides evidence that synchronized walking may reduce the masking potential of footsteps. PMID:26168953
Pons, Ferran; Lewkowicz, David J
We investigated the effects of linguistic experience and language familiarity on the perception of audiovisual (A-V) synchrony in fluent speech. In Experiment 1, we tested a group of monolingual Spanish- and Catalan-learning 8-month-old infants with a video clip of a person speaking Spanish. Following habituation to the audiovisually synchronous video, infants saw and heard desynchronized clips of the same video in which the audio stream preceded the video stream by 366, 500, or 666 ms. In Experiment 2, monolingual Catalan and Spanish infants were tested with a video clip of a person speaking English. Results indicated that in both experiments, infants detected the 666 and 500 ms asynchronies. That is, their responsiveness to A-V synchrony was the same regardless of their specific linguistic experience or familiarity with the tested language. Compared to previous results from infant studies with isolated audiovisual syllables, these results show that infants are more sensitive to A-V temporal relations inherent in fluent speech. Furthermore, the absence of a language familiarity effect on the detection of A-V speech asynchrony at eight months of age is consistent with the broad perceptual tuning usually observed in infant responses to linguistic input at this age.
Yamamoto, Kosuke; Kawabata, Hideaki
We ordinarily speak fluently, even though our perception of our own voice is disrupted by various environmental acoustic properties. The underlying speech mechanism is thought to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by the reduction in speech fluency when a speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback, a widely used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. They then judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment; that is, participants' speech gained fluency. Although a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution of simultaneity judgments, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception adapt to temporal lag but are processed in distinct ways.
Arsenault, Jessica S; Buchsbaum, Bradley R
The motor theory of speech perception has experienced a recent revival due to a number of studies implicating the motor system during speech perception. In a key study, Pulvermüller et al. (2006) showed that premotor/motor cortex differentially responds to the passive auditory perception of lip and tongue speech sounds. However, no study has yet attempted to replicate this important finding from nearly a decade ago. The objective of the current study was to replicate the principal finding of Pulvermüller et al. (2006) and generalize it to a larger set of speech tokens while applying a more powerful statistical approach using multivariate pattern analysis (MVPA). Participants performed an articulatory localizer as well as a speech perception task where they passively listened to a set of eight syllables while undergoing fMRI. Both univariate and multivariate analyses failed to find evidence for somatotopic coding in motor or premotor cortex during speech perception. Positive evidence for the null hypothesis was further confirmed by Bayesian analyses. Results consistently show that while the lip and tongue areas of the motor cortex are sensitive to movements of the articulators, they do not appear to preferentially respond to labial and alveolar speech sounds during passive speech perception.
Reed, C M; Durlach, N I; Braida, L D; Schultz, M C
In the Tadoma method of communication, deaf-blind individuals receive speech by placing a hand on the face and neck of the talker and monitoring actions associated with speech production. Previous research has documented the speech perception, speech production, and linguistic abilities of highly experienced users of the Tadoma method. The current study was performed to gain further insight into the cues involved in the perception of speech segments through Tadoma. Small-set segmental identification experiments were conducted in which the subjects' access to various types of articulatory information was systematically varied by imposing limitations on the contact of the hand with the face. Results obtained on 3 deaf-blind, highly experienced users of Tadoma were examined in terms of percent-correct scores, information transfer, and reception of speech features for each of sixteen experimental conditions. The results were generally consistent with expectations based on the speech cues assumed to be available in the various hand positions.
Schwartz, Richard G.; Scheffler, Frances L. V.; Lopez, Karece
Using an identification task, we examined lexical effects on the perception of vowel duration as a cue to final consonant voicing in 12 children with specific language impairment (SLI) and 13 age-matched (6;6–9;6) peers with typical language development (TLD). Naturally recorded CV/t/ sets [word–word (WW), nonword–nonword (NN), word–nonword (WN) and nonword–word (NW)] were edited to create four 12-step continua. Both groups used duration as an identification cue, but it was a weaker cue for children with SLI. For NN, WN and NW continua, children with SLI demonstrated certainty at shorter vowel durations than their TLD peers. Except for the WN continuum, children with SLI demonstrated category boundaries at shorter vowel durations. Both groups exhibited lexical effects, but they were stronger in the SLI group. Performance on the WW continuum indicated adequate perception of fine-grained duration differences. Strong lexical effects indicated reliance on familiar words in speech perception. PMID:23635335
Leone, Dorothy; Levy, Erika S.
Purpose: Much of a child's day is spent listening to speech in the presence of background noise. Although accurate vowel perception is important for listeners' accurate speech perception and comprehension, little is known about children's vowel perception in noise. "Clear speech" is a speech style frequently used by talkers in the…
Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa
Introduction: The objective of evaluating auditory perception in cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. Objective: To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy used. Methods: This is a prospective, cross-sectional, descriptive cohort study. Ten cochlear implant users were selected and characterized by hearing threshold, speech perception tests, and the Hearing Handicap Inventory for Adults. Results: There was no significant difference when comparing the variables subject age, age at acquisition of hearing loss, etiology, time of hearing deprivation, time of cochlear implant use, and mean hearing threshold with the cochlear implant across the change in speech coding strategy. There was no relationship between lack of handicap perception and improvement in speech perception with either speech coding strategy. Conclusion: There was no significant difference between the strategies evaluated, and no relation was observed between them and the variables studied. PMID:27413409
Uhler, Kristin M.; Baca, Rosalinda; Dudas, Emily; Fredrickson, Tammy
Background: Speech perception measures have long been considered an integral part of the audiological assessment battery. Currently, a prelinguistic, standardized measure of speech perception is missing from the clinical assessment battery for infants and young toddlers. Such a measure would allow systematic assessment of infants' speech perception abilities, as well as the potential to investigate the impact of early identification of hearing loss and early fitting of amplification on the auditory pathways. Purpose: To investigate the impact of sensation level (SL) on the ability of infants with normal hearing (NH) to discriminate /a-i/ and /ba-da/, and to determine whether performance on the two contrasts differs significantly in predicting the discrimination criterion. Research Design: The design was based on a survival analysis model for event occurrence and a repeated-measures logistic model for binary outcomes. The outcome for the survival analysis was the minimum SL for criterion, and the outcome for the logistic regression model was the presence/absence of achieving the criterion. Criterion achievement was designated when an infant's proportion-correct score was ≥0.75 on the discrimination performance task. Study Sample: Twenty-two infants with NH sensitivity participated in this study (9 males and 13 females, aged 6–14 months). Data Collection and Analysis: Testing took place over two to three sessions. The first session consisted of a hearing test, threshold assessment of the two speech sounds (/a/ and /i/), and, if time and attention allowed, Visual Reinforcement Infant Speech Discrimination (VRISD). The second session consisted of VRISD assessment for the two test contrasts (/a-i/ and /ba-da/). The presentation level started at 50 dBA. If the infant was unable to achieve criterion (≥0.75) at 50 dBA, the presentation level was increased to 70 dBA, followed by 60 dBA. Data examination included an event analysis, which provided the probability of
Nagaraj, Naveen K; Knapp, Andrea N
Understanding interrupted speech requires top-down linguistic and cognitive restoration mechanisms. To investigate the relation between working memory (WM) and perception of interrupted speech, 20 young adults were asked to recognize sentences interrupted at 2 Hz, 8 Hz, and a combination of 2 and 8 Hz. WM was measured using automated reading and operation span tasks. Interestingly, the results presented here revealed no statistical relation between any of the interrupted speech recognition scores and WM scores. This finding is in agreement with previous findings that suggest greater reliance on linguistic factors relative to cognitive factors during perception of interrupted speech.
Humphries, Colin; Sabri, Merav; Lewis, Kimberly; Liebenthal, Einat
Human speech consists of a variety of articulated sounds that vary dynamically in spectral composition. We investigated the neural activity associated with the perception of two types of speech segments: (a) the period of rapid spectral transition occurring at the beginning of a stop-consonant vowel (CV) syllable and (b) the subsequent spectral steady-state period occurring during the vowel segment of the syllable. Functional magnetic resonance imaging (fMRI) was recorded while subjects listened to series of synthesized CV syllables and non-phonemic control sounds. Adaptation to specific sound features was measured by varying either the transition or steady-state periods of the synthesized sounds. Two spatially distinct brain areas in the superior temporal cortex that were sensitive to either the type of adaptation or the type of stimulus were found. In a relatively large section of the bilateral dorsal superior temporal gyrus (STG), activity varied as a function of adaptation type regardless of whether the stimuli were phonemic or non-phonemic. Immediately adjacent to this region, in a more limited area of the ventral STG, increased activity was observed for phonemic trials compared to non-phonemic trials; however, no adaptation effects were found. In addition, a third area in the bilateral medial superior temporal plane showed increased activity to non-phonemic compared to phonemic sounds. The results suggest a multi-stage hierarchical stream for speech sound processing extending ventrolaterally from the superior temporal plane to the superior temporal sulcus. At successive stages in this hierarchy, neurons code for increasingly complex spectrotemporal features. At the same time, these representations become more abstracted from the original acoustic form of the sound. PMID:25565939
Law, Jeremy M.; Vandermosten, Maaike; Ghesquiere, Pol; Wouters, Jan
This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or contribute independently to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normally reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of frequency modulation (FM) and amplitude rise time (RT) tasks; an intensity discrimination (ID) task was included as a non-dynamic control task. Speech perception was assessed by means of sentences-in-noise and words-in-noise tasks. Group analyses revealed significant group differences in auditory tasks (i.e., RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading, but this relation was mediated by phonological processing and not by speech-in-noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet individual dyslexic readers did not display a clear pattern of deficiencies across the processing skills. Although our results support phonological and slow-rate dynamic auditory deficits that relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors is expressed and interacts with environmental and higher-order cognitive influences. PMID:25071512
Välimaa, T T; Sorri, M J; Löppönen, H J
This study was done to investigate the effect of a multichannel cochlear implant on speech perception and the functional benefit of cochlear implantation in Finnish-speaking postlingually deafened adults. Fourteen subjects were enrolled. Sentence and word recognition were studied with open-set tests auditorily only. One year after implantation, the listening performance was assessed by case histories and interviews. Before implantation for subjects with a hearing aid, the mean recognition score was 38% for sentences and 17% for words. One year after switching on the implant, the mean recognition score was 84% for sentences and 70% for words. Before implantation, the majority of the subjects were not aware of environmental sounds and only a few were able to recognize some environmental sounds. One year after switching on the implant, the majority of the subjects were able to use the telephone with a familiar speaker. All the subjects were able to recognize speech auditorily only and had thus gained good functional benefit from the implant.
Xie, Zilong; Yi, Han-Gyol; Chandrasekaran, Bharath
Nonnative speech poses a challenge to speech perception, especially in challenging listening environments. Audiovisual (AV) cues are known to improve native speech perception in noise. The extent to which AV cues benefit nonnative speech perception in noise, however, is much less well understood. Here, we examined native American English-speaking and native Korean-speaking listeners' perception of English sentences produced by a native American English speaker and a native Korean speaker across a range of signal-to-noise ratios (SNRs; −4 to −20 dB) in audio-only and audiovisual conditions. We employed psychometric function analyses to characterize the pattern of AV benefit across SNRs. For native English speech, the largest AV benefit occurred at an intermediate SNR (i.e., −12 dB); for nonnative English speech, however, the largest AV benefit occurred at a higher SNR (−4 dB). The psychometric function analyses demonstrated that the AV benefit patterns differed between native and nonnative English speech. The nativeness of the listener exerted negligible effects on the AV benefit across SNRs. However, the nonnative listeners' ability to gain AV benefit in native English speech was related to their proficiency in English. These findings suggest that the native language background of both the speaker and the listener clearly modulates the optimal use of AV cues in speech recognition. PMID:25474650
Soshi, Takahiro; Hisanaga, Satoko; Kodama, Narihiro; Kanekama, Yori; Samejima, Yasuhiro; Yumoto, Eiji; Sekiyama, Kaoru
Speech perception in noise is still difficult for cochlear implant (CI) users even with many years of CI use. This study aimed to investigate neurophysiological and behavioral foundations for CI-dependent speech perception in noise. Seventeen post-lingual CI users and twelve age-matched normal hearing adults participated in two experiments. In Experiment 1, CI users' auditory-only word perception in noise (white noise, two-talker babble; at 10 dB SNR) degraded by about 15%, compared to that in quiet (48% accuracy). CI users' auditory-visual word perception was generally better than auditory-only perception. Auditory-visual word perception was degraded under information masking by the two-talker noise (69% accuracy), compared to that in quiet (77%). Such degradation was not observed for white noise (77%), suggesting that the overcoming of information masking is an important issue for CI users' speech perception improvement. In Experiment 2, event-related cortical potentials were recorded in an auditory oddball task in quiet and noise (white noise only). Similarly to the normal hearing participants, the CI users showed the mismatch negative response (MNR) to deviant speech in quiet, indicating automatic speech detection. In noise, the MNR disappeared in the CI users, and only the good CI performers (above 66% accuracy) showed P300 (P3) like the normal hearing participants. P3 amplitude in the CI users was positively correlated with speech perception scores. These results suggest that CI users' difficulty in speech perception in noise is associated with the lack of automatic speech detection indicated by the MNR. Successful performance in noise may begin with attended auditory processing indicated by P3.
Pajak, Bozena; Levy, Roger
The end-result of perceptual reorganization in infancy is currently viewed as a reconfigured perceptual space, “warped” around native-language phonetic categories, which then acts as a direct perceptual filter on any non-native sounds: naïve-listener discrimination of non-native-sounds is determined by their mapping onto native-language phonetic categories that are acoustically/articulatorily most similar. We report results that suggest another factor in non-native speech perception: some perceptual sensitivities cannot be attributed to listeners’ warped perceptual space alone, but rather to enhanced general sensitivity along phonetic dimensions that the listeners’ native language employs to distinguish between categories. Specifically, we show that the knowledge of a language with short and long vowel categories leads to enhanced discrimination of non-native consonant length contrasts. We argue that these results support a view of perceptual reorganization as the consequence of learners’ hierarchical inductive inferences about the structure of the language’s sound system: infants not only acquire the specific phonetic category inventory, but also draw higher-order generalizations over the set of those categories, such as the overall informativity of phonetic dimensions for sound categorization. Non-native sound perception is then also determined by sensitivities that emerge from these generalizations, rather than only by mappings of non-native sounds onto native-language phonetic categories. PMID:25197153
Guediche, Sara; Blumstein, Sheila E.; Fiez, Julie A.; Holt, Lori L.
Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech. PMID:24427119
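The review does not commit to one algorithm, but a familiar member of the prediction-error family is the delta rule (as in Rescorla-Wagner learning). The sketch below illustrates that assumed rule, not the authors' model: a listener's expectation for an acoustic cue moves a fraction of each prediction error toward the incoming token. The cue (voice onset time, VOT, in ms), the prior of 60 ms, the talker's 40 ms tokens, and both learning rates are hypothetical values chosen only to show the trade-off between short-term adaptation and long-term stability.

```python
from typing import Iterable, List


def delta_rule_track(prior_mean: float, observations: Iterable[float],
                     learning_rate: float) -> List[float]:
    """Return the expectation after each token, nudged by a fraction of the prediction error.

    Illustrative sketch only: `prior_mean` stands for a hypothetical long-term category
    expectation, and `learning_rate` (0..1) sets how strongly short-term deviations pull on it.
    """
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in [0, 1]")
    estimate = prior_mean
    trajectory = []
    for x in observations:
        prediction_error = x - estimate  # signed mismatch between input and expectation
        estimate += learning_rate * prediction_error
        trajectory.append(estimate)
    return trajectory


# Hypothetical example: a 60 ms /p/ VOT expectation meets a talker producing 40 ms tokens.
fast = delta_rule_track(60.0, [40.0] * 10, learning_rate=0.5)
slow = delta_rule_track(60.0, [40.0] * 10, learning_rate=0.05)
```

With the higher rate the expectation almost reaches the new talker's value within ten tokens; with the lower rate it moves only part of the way, keeping closer to the long-term norm. After n identical tokens x, the estimate equals x + (prior - x)(1 - rate)^n.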
Saltuklaroglu, Tim; Kalinowski, Joseph; Dayalu, Vikram N.; Stuart, Andrew; Rastatter, Michael P.
In accord with a proposed innate link between speech perception and production (e.g., motor theory), this study provides compelling evidence for the inhibition of stuttering events in people who stutter prior to the initiation of the intended speech act, via both the perception and the production of speech gestures. Stuttering frequency during reading was reduced in 10 adults who stutter by approximately 40% in three of four experimental conditions: (1) following passive audiovisual presentation (i.e., viewing and hearing) of another person producing pseudostuttering (stutter-like syllabic repetitions) and following active shadowing of both (2) pseudostuttered and (3) fluent speech. Stuttering was not inhibited during reading following passive audiovisual presentation of fluent speech. Syllabic repetitions can inhibit stuttering both when produced and when perceived, and we suggest that these elementary stuttering forms may serve as compensatory speech gestures for releasing involuntary stuttering blocks by engaging mirror neuronal systems that are predisposed for fluent gestural imitation.
Eskelund, Kasper; MacDonald, Ewen N.; Andersen, Tobias S.
We perceive identity, expression and speech from faces. While perception of identity and expression depends crucially on the configuration of facial features, it is less clear whether this holds for visual speech perception. Facial configuration is poorly perceived for upside-down faces, as demonstrated by the Thatcher illusion, in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance, but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception, but only when the face is upright, indicating that facial configuration can be important for visual speech perception. This effect can propagate to auditory speech perception through audiovisual integration, so that Thatcherization disrupts the McGurk illusion, in which visual speech perception alters perception of an incongruent acoustic phoneme. This is known as the McThatcher effect. Here we show that the McThatcher effect is reflected in the McGurk mismatch negativity (MMN). The MMN is an event-related potential elicited by a change in auditory perception. The McGurk-MMN can be elicited by a change in auditory perception due to the McGurk illusion without any change in the acoustic stimulus. We found that Thatcherization disrupted a strong McGurk illusion and a correspondingly strong McGurk-MMN only for upright faces. This confirms that facial configuration can be important for audiovisual speech perception. For inverted faces we found a weaker McGurk illusion but, surprisingly, no MMN. We also found no correlation between the strength of the McGurk illusion and the amplitude of the McGurk-MMN. We suggest that this may be due to a threshold effect, so that a strong McGurk illusion is required to elicit the McGurk-MMN.
Venail, Frederic; Mathiolon, Caroline; Menjot de Champfleur, Sophie; Piron, Jean Pierre; Sicard, Marielle; Villemus, Françoise; Vessigaud, Marie Aude; Sterkers-Artieres, Françoise; Mondain, Michel; Uziel, Alain
Frequency-place mismatch often occurs after cochlear implantation, yet its effect on speech perception outcome remains unclear. In this article, we propose a method, based on cochlear imaging, to determine the cochlear place-frequency map. We evaluated the effect of frequency-place mismatch on speech perception outcome in subjects implanted with 3 different lengths of electrode arrays. Deeper insertion produced a larger frequency-place mismatch and a reduced, delayed improvement in speech perception compared with shallower insertion, for which a similar but weaker effect was observed. Our results support the notion that selecting an electrode array length adapted to each individual's cochlear anatomy may reduce frequency-place mismatch and thus improve speech perception outcome.
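The study derived each patient's place-frequency map from imaging. As a rough stand-in when imaging is unavailable, the Greenwood function is a widely used analytic map for the human cochlea. The sketch below assumes Greenwood's human constants (A = 165.4, a = 2.1, k = 0.88), a 35 mm cochlear duct, and insertion depth measured along the duct from the base; all three are assumptions rather than values from the article, and the model ignores the offset between a lateral-wall electrode and the neurons it actually stimulates. Mismatch is expressed in octaves between the frequency a processor assigns to an electrode and the characteristic frequency at that electrode's place.

```python
import math

# Assumed Greenwood (1990) human constants; x is the fractional distance from the apex.
A_HZ, A_SLOPE, K = 165.4, 2.1, 0.88


def greenwood_hz(x_from_apex: float) -> float:
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return A_HZ * (10 ** (A_SLOPE * x_from_apex) - K)


def place_frequency(insertion_depth_mm: float, cochlear_length_mm: float = 35.0) -> float:
    """Characteristic frequency at an electrode located insertion_depth_mm from the base.

    cochlear_length_mm is an assumed typical duct length, not a patient-specific measurement.
    """
    if not 0.0 <= insertion_depth_mm <= cochlear_length_mm:
        raise ValueError("insertion depth must lie within the cochlea")
    x = (cochlear_length_mm - insertion_depth_mm) / cochlear_length_mm
    return greenwood_hz(x)


def mismatch_octaves(assigned_hz: float, place_hz: float) -> float:
    """Positive when the processor assigns a higher frequency than the place encodes."""
    return math.log2(assigned_hz / place_hz)
```

For example, `mismatch_octaves(250.0, place_frequency(20.0))` describes a hypothetical electrode 20 mm from the base that the processor assigns a 250 Hz band; under these assumptions the place frequency is roughly 1170 Hz, so the mismatch is about -2.2 octaves. Whether deeper insertion increases or decreases mismatch depends on the processor's frequency allocation and the patient's anatomy, which is why an individualized map matters.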
Moteki, Hideaki; Kitoh, Ryosuke; Tsukada, Keita; Iwasaki, Satoshi; Nishio, Shin-Ya
Conclusion: Bilateral electric acoustic stimulation (EAS) effectively improved speech perception in noise and sound localization in patients with high-frequency hearing loss. Objective: To evaluate the efficacy of bilateral EAS for sound localization and speech perception in noise in two cases of high-frequency hearing loss. Methods: Two female patients, aged 38 and 45 years, received bilateral EAS sequentially. Pure-tone audiometry was performed preoperatively and postoperatively to evaluate hearing preservation in the lower frequencies. Speech perception in quiet and in noise, and sound localization, were assessed with unilateral and bilateral EAS. Results: Residual hearing in the lower frequencies was well preserved after insertion of a FLEX24 electrode (24 mm) using the round window approach. After bilateral EAS, speech perception improved in quiet and even more so in noise. In addition, sound localization improved markedly in both cases with bilateral EAS. PMID:25423260
Robertson, Erin K.; Joanisse, Marc F.; Desroches, Amy S.; Ng, Stella
We examined categorical speech perception in school-age children with developmental dyslexia or Specific Language Impairment (SLI), compared to age-matched and younger controls. Stimuli consisted of synthetic speech tokens in which place of articulation varied from "b" to "d". Children were tested on categorization, categorization in noise, and…
Casserly, Elizabeth D.
Real-time use of spoken language is a fundamentally interactive process involving speech perception, speech production, linguistic competence, motor control, neurocognitive abilities such as working memory, attention, and executive function, environmental noise, conversational context, and--critically--the communicative interaction between…
Mitterer, Holger; Ernestus, Mirjam
This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second,…
Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki
Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs’ response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs’ early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception. PMID:27734953
Pisoni, David B.; And Others
Summarizing research activities from January 1983 to December 1983, this is the ninth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, and progress reports. The report…
Pisoni, David B.; And Others
Summarizing research activities from January 1982 to December 1982, this is the eighth annual report of research on speech perception, analysis and synthesis conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report includes extended manuscripts, short reports, progress reports, and information…
Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.
According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…
Lavie, Limor; Banai, Karen; Karni, Avi; Attias, Joseph
Purpose: We tested whether using hearing aids can improve unaided performance in speech perception tasks in older adults with hearing impairment. Method: Unaided performance was evaluated in dichotic listening and speech-in-noise tests in 47 older adults with hearing impairment; 36 participants in 3 study groups were tested before hearing aid…
Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.
Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…
Paatsch, Louise E.; Blamey, Peter J.; Sarant, Julia Z.; Martin, Lois F.A.; Bow, Catherine P.
Open-set word and sentence speech-perception test scores are commonly used as a measure of hearing abilities in children and adults using cochlear implants and/or hearing aids. These tests are usually presented auditorily with a verbal response. In the case of children, scores are typically lower and more variable than for adults with hearing…
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
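The core of a classification analysis like this is reverse correlation: relate random trial-by-trial masks to the binary response. The sketch below is a generic, temporal-only version with a simulated observer, not the authors' spatiotemporal pipeline; the frame count, 50% visibility per frame, trial count, and the observer's "critical" frames are all hypothetical. For each frame it subtracts the mean visibility on /apa/ trials (auditory percept, no visual influence) from the mean visibility on non-/apa/ trials (McGurk percept), so frames whose visibility drove the illusion receive large positive weights.

```python
import random
from typing import List, Sequence, Tuple


def classification_image(masks: Sequence[Sequence[int]], said_apa: Sequence[bool]) -> List[float]:
    """Per-frame mean visibility on non-/apa/ trials minus that on /apa/ trials."""
    fused = [m for m, r in zip(masks, said_apa) if not r]
    heard = [m for m, r in zip(masks, said_apa) if r]
    if not fused or not heard:
        raise ValueError("need trials with both response types")
    n_frames = len(masks[0])

    def mean_visibility(group: Sequence[Sequence[int]], frame: int) -> float:
        return sum(m[frame] for m in group) / len(group)

    return [mean_visibility(fused, f) - mean_visibility(heard, f) for f in range(n_frames)]


def simulate(n_trials: int = 4000, n_frames: int = 10, critical: Tuple[int, ...] = (3, 4),
             seed: int = 1) -> Tuple[List[List[int]], List[bool]]:
    """Hypothetical observer: seeing the mouth in `critical` frames raises P(McGurk percept)."""
    rng = random.Random(seed)
    masks, said_apa = [], []
    for _ in range(n_trials):
        mask = [rng.randint(0, 1) for _ in range(n_frames)]  # 1 = mouth visible in that frame
        p_fuse = 0.05 + 0.45 * sum(mask[f] for f in critical)
        masks.append(mask)
        said_apa.append(rng.random() >= p_fuse)  # True = reported /apa/
    return masks, said_apa


ci = classification_image(*simulate())
```

For the simulated data, `ci` shows large positive weights at frames 3 and 4 and values near zero elsewhere. With real data, such weights would typically be compared against a null distribution, for example from shuffled responses, before any frame is called perceptually relevant.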
Davis, Matthew H.
Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior
Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.
Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…
Gick, Bryan; Jóhannsdóttir, Kristín M.; Gibraiel, Diana; Mühlbauer, Jeff
A single pool of untrained subjects was tested for interactions across two bimodal perception conditions: audio-tactile, in which subjects heard and felt speech, and visual-tactile, in which subjects saw and felt speech. Identifications of English obstruent consonants were compared in bimodal and no-tactile baseline conditions. Results indicate that tactile information enhances speech perception by about 10 percent, regardless of which other mode (auditory or visual) is active. However, within-subject analysis indicates that individual subjects who benefit more from tactile information in one cross-modal condition tend to benefit less from tactile information in the other. PMID:18396924
Rao, Yu; Hao, Yiya; Panahi, Issa M. S.; Kehtarnavaz, Nasser
This paper presents the development of a speech processing pipeline on smartphones for hearing aid devices (HADs). The pipeline performs noise suppression and speech enhancement (SE) to improve speech quality and intelligibility, and it runs in real time on Android smartphones. Test results indicate that the proposed method suppresses noise and improves the perceptual quality of speech as assessed by three objective measures: perceptual evaluation of speech quality (PESQ), noise attenuation level (NAL), and the coherence speech intelligibility index (CSII).
Srinivasan, Arthi G.; Padilla, Monica; Shannon, Robert V.; Landsberger, David M.
Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons, which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation and, therefore, channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current-focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits-in-noise task and a mean improvement of 3 dB in a sentences-in-noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical strategy, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical strategy. PMID:23467170
Ingvalson, Erin M.; Dhar, Sumitrajit; Wong, Patrick C. M.; Liu, Hanjun
Working memory capacity has been linked to performance on many higher cognitive tasks, including the ability to perceive speech in noise. Current efforts to train working memory have demonstrated that working memory performance can be improved, suggesting that working memory training may lead to improved speech perception in noise. A further advantage of working memory training to improve speech perception in noise is that working memory training materials are often simple, such as letters or digits, making them easily translatable across languages. The current effort tested the hypothesis that working memory training would be associated with improved speech perception in noise and that materials would easily translate across languages. Native Mandarin Chinese and native English speakers completed ten days of reversed digit span training. Reading span and speech perception in noise both significantly improved following training, whereas untrained controls showed no gains. These data suggest that working memory training may be used to improve listeners' speech perception in noise and that the materials may be quickly adapted to a wide variety of listeners. PMID:26093435
Crew, Joseph D.; Galvin III, John J.; Landsberger, David M.; Fu, Qian-Jie
Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349
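The abstract says SRTs were measured adaptively but does not give the rule. A common choice, assumed here rather than taken from the study, is a one-up/one-down staircase: lower the SNR after a correct response, raise it after an error, and average the SNRs at the later reversals. This rule converges on the SNR that yields 50% correct. The starting SNR, step size, reversal counts, and the logistic listener below are hypothetical.

```python
import math
import random
from typing import Callable


def adaptive_srt(respond: Callable[[float], bool], start_snr: float = 10.0,
                 step_db: float = 2.0, n_reversals: int = 8, n_discard: int = 2,
                 max_trials: int = 200) -> float:
    """Estimate an SRT (dB SNR) with a one-up/one-down staircase.

    `respond(snr)` returns True when the sentence is repeated correctly at that SNR.
    The estimate is the mean SNR at the reversals left after discarding the first `n_discard`.
    """
    if n_reversals <= n_discard:
        raise ValueError("need more reversals than are discarded")
    snr, previous_direction, reversals = start_snr, 0, []
    for _ in range(max_trials):
        direction = -1 if respond(snr) else +1  # harder after a correct trial, easier after an error
        if previous_direction and direction != previous_direction:
            reversals.append(snr)
            if len(reversals) == n_reversals:
                break
        previous_direction = direction
        snr += direction * step_db
    else:
        raise RuntimeError("staircase did not reach the requested number of reversals")
    kept = reversals[n_discard:]
    return sum(kept) / len(kept)


def simulated_listener(srt_db: float, slope: float, seed: int = 0) -> Callable[[float], bool]:
    """Hypothetical listener whose probability correct is a logistic function of SNR."""
    rng = random.Random(seed)

    def respond(snr: float) -> bool:
        return rng.random() < 1.0 / (1.0 + math.exp(-slope * (snr - srt_db)))

    return respond


estimate = adaptive_srt(simulated_listener(srt_db=-2.0, slope=1.0))
```

Because a lower SRT means the listener tolerates more noise, an improvement such as the 2.7 dB reported for partial tripolar stimulation appears as a drop in this estimate. Smaller steps and more reversals reduce the variability of a single track at the cost of more trials.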
Anderson, Samira; Chandrasekaran, Bharath; Yi, Han-Gyol; Kraus, Nina
Children are known to be particularly vulnerable to the effects of noise on speech perception, and it is commonly acknowledged that failure of central auditory processes can lead to these difficulties with speech-in-noise (SIN) perception. Still, little is known about the mechanistic relationship between central processes and the perception of speech in noise. Our aims were two-fold: to examine the effects of noise on the central encoding of speech through measurement of cortical event-related potentials (ERPs) and to examine the relationship between cortical processing and behavioral indices of SIN perception. We recorded cortical responses to the speech syllable [da] in quiet and multi-talker babble noise in 32 children with a broad range of SIN perception. Outcomes suggest inordinate effects of noise on auditory function in the bottom SIN perceivers, compared with the top perceivers. The cortical amplitudes in the top SIN group remained stable between conditions, whereas amplitudes increased significantly in the bottom SIN group, suggesting a developmental central processing impairment in the bottom perceivers that may contribute to difficulties encoding and perceiving speech in challenging listening environments. PMID:20950282
Hazan, Valerie; Messaoud-Galusi, Souhila; Rosen, Stuart; Nouwens, Suzan; Shakespeare, Bethanie
Purpose: This study investigated whether adults with dyslexia show evidence of a consistent speech perception deficit by testing phoneme categorization and word perception in noise. Method: Seventeen adults with dyslexia and 20 average readers underwent a test battery including standardized reading, language and phonological awareness tests, and…
Calcus, Axelle; Lorenzi, Christian; Collet, Gregory; Colin, Cécile; Kolinsky, Régine
Purpose: Children with dyslexia have been suggested to experience deficits in both categorical perception (CP) and speech identification in noise (SIN) perception. However, results regarding both abilities are inconsistent, and the relationship between them is still unclear. Therefore, this study aimed to investigate the relationship between CP…
Thibodeau, Linda M.; Sussman, Harvey M.
Assesses the relationship between production deficits and speech perception abilities. A categorical perception paradigm was administered to a group of communication disordered children and to a matched control group. Group results are tentatively interpreted as showing a moderate perceptual deficit in the communication disordered children of this…
Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo
The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is thought to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual speech perception, or during unimodal speech perception accompanied by irrelevant noise in the other modality. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework based on hierarchical clustering with single-linkage distance to analyze the connected components of the functional network during speech perception, measured with functional magnetic resonance imaging. Bimodal (audiovisual) speech cues, or unimodal speech cues paired with irrelevant noise in the other modality (auditory white noise or visual gum-chewing), were delivered to 15 subjects. For positive couplings, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, the left anterior temporal gyrus-anterior insula component and the right premotor-visual components were more tightly coupled than in the auditory-only and visual-only speech cue conditions, respectively. Interestingly, visual speech was perceived under white noise through tight negative coupling involving the left inferior frontal region-right anterior cingulate, the left anterior insula, and bilateral visual regions, including the right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and this can reflect efficient processing during natural audiovisual integration or effortful processing during lip-reading, respectively.
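Tracking connected components "at all possible thresholds" with single linkage amounts to computing the 0-dimensional persistence barcode of the network: every region starts as its own component, and components merge in order of increasing distance, as in Kruskal's minimum-spanning-tree algorithm. The sketch below assumes a distance of 1 - r for a correlation r and uses an invented four-region matrix; the abstract does not give the study's distance definition, and this sketch covers only positive couplings.

```python
from itertools import combinations
from typing import List

def h0_death_thresholds(dist: List[List[float]]) -> List[float]:
    """Single-linkage merge heights (0-dimensional persistence deaths).

    dist is a symmetric n x n distance matrix. All n components are born at
    threshold 0; each returned value is a threshold at which two components
    merge. One component never dies, so n - 1 values are returned.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # this edge joins two separate components
            parent[ri] = rj
            deaths.append(d)
    return deaths

def n_components(deaths: List[float], n: int, threshold: float) -> int:
    """Number of connected components when edges with distance <= threshold are kept."""
    return n - sum(1 for d in deaths if d <= threshold)

# Hypothetical correlations among four regions, converted to distances 1 - r.
corr = [[1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.15],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.15, 0.8, 1.0]]
dist = [[1.0 - r for r in row] for row in corr]
```

Here regions 0 and 1 merge first, then regions 2 and 3, and the two pairs join only at a much larger threshold; "tighter coupling" in a condition shows up as earlier merges.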
Schiavetti, Nicholas; Whitehead, Robert L; Metz, Dale Evan
This article reviews experiments completed over the past decade at the National Technical Institute for the Deaf and the State University of New York at Geneseo concerning speech produced during simultaneous communication (SC) and synthesizes the empirical evidence concerning the acoustical and perceptual characteristics of speech in SC. Comparisons are drawn between SC and other modes of rate-altered speech that have been used successfully to enhance communication effectiveness. Of particular importance are conclusions regarding the appropriateness of speech produced during SC for communication between hearing and hearing-impaired speakers and listeners and the appropriateness of SC use by parents and teachers for speech development of children with hearing impairment. This program of systematic basic research adds value to the discussion about the use of SC by focusing on the specific implications of empirical results regarding speech production and perception.
Jordan, Timothy R.; Sheen, Mercedes; Abedipour, Lily; Paterson, Kevin B.
When observing a talking face, it has often been argued that visual speech to the left and right of fixation may produce differences in performance due to divided projections to the two cerebral hemispheres. However, while it seems likely that such a division in hemispheric projections exists for areas away from fixation, the nature and existence of a functional division in visual speech perception at the foveal midline remains to be determined. We investigated this issue by presenting visual speech in matched hemiface displays to the left and right of a central fixation point, either exactly abutting the foveal midline or else located away from the midline in extrafoveal vision. The location of displays relative to the foveal midline was controlled precisely using an automated, gaze-contingent eye-tracking procedure. Visual speech perception showed a clear right hemifield advantage when presented in extrafoveal locations but no hemifield advantage (left or right) when presented abutting the foveal midline. Thus, while visual speech observed in extrafoveal vision appears to benefit from unilateral projections to left-hemisphere processes, no evidence was obtained to indicate that a functional division exists when visual speech is observed around the point of fixation. Implications of these findings for understanding visual speech perception and the nature of functional divisions in hemispheric projection are discussed. PMID:25032950
Kleinschmidt, Dave F; Jaeger, T Florian
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker's /p/ might be physically indistinguishable from another talker's /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively nonstationary world and propose that the speech perception system overcomes this challenge by (a) recognizing previously encountered situations, (b) generalizing to other situations based on previous similar experience, and (c) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (a) to (c) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on 2 critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these 2 aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension.
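The core of incremental, distributional adaptation can be illustrated with Bayesian belief updating about a category's cue distribution. The sketch below is a deliberately simplified stand-in, not the authors' model: it updates only the belief about a category's mean voice onset time (VOT), treats the within-category variance as known, and uses invented numbers. It then shows how a shifted mean changes the /b/-/p/ classification of an ambiguous token.

```python
import math
from typing import Iterable, Tuple

def update_mean_belief(mu0: float, var0: float, xs: Iterable[float],
                       obs_var: float) -> Tuple[float, float]:
    """Normal-normal conjugate update of a belief about a category mean.

    Prior: mean ~ N(mu0, var0). Each observed cue value x ~ N(mean, obs_var),
    with obs_var assumed known. Returns (posterior mean, posterior variance).
    """
    precision = 1.0 / var0
    weighted_sum = mu0 / var0
    for x in xs:
        precision += 1.0 / obs_var       # each token adds the same precision
        weighted_sum += x / obs_var
    return weighted_sum / precision, 1.0 / precision

def p_category_a(x: float, mean_a: float, mean_b: float, sd: float) -> float:
    """P(category A | cue x) for two equal-variance Gaussian categories and
    equal priors; plugs in point estimates of the means (a simplification)."""
    la = math.exp(-((x - mean_a) ** 2) / (2 * sd ** 2))
    lb = math.exp(-((x - mean_b) ** 2) / (2 * sd ** 2))
    return la / (la + lb)

# Hypothetical prior: this talker's /p/ VOT mean ~ N(60 ms, 100 ms^2).
post_mean, post_var = update_mean_belief(60.0, 100.0, [40.0], obs_var=100.0)
before = p_category_a(30.0, 60.0, 0.0, 10.0)        # /p/ vs /b/ (mean 0 ms)
after = p_category_a(30.0, post_mean, 0.0, 10.0)
```

With these numbers, one short-lag /p/ token pulls the believed /p/ mean from 60 ms to 50 ms, and a 30-ms token that was perfectly ambiguous before the update is then more likely to be heard as /p/.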
Noordenbos, M W; Segers, E; Serniclaes, W; Mitterer, H; Verhoeven, L
There is ample evidence that individuals with dyslexia have a phonological deficit. A growing body of research also suggests that individuals with dyslexia have problems with categorical perception, as evidenced by weaker discrimination of between-category differences and better discrimination of within-category differences compared to average readers. Whether the categorical perception problems of individuals with dyslexia are a result of their reading problems or a cause has yet to be determined. Whether the observed perception deficit relates to a more general auditory deficit or is specific to speech also has yet to be determined. To shed more light on these issues, the categorical perception abilities of children at risk for dyslexia and chronological age controls were investigated before and after the onset of formal reading instruction in a longitudinal study. Both identification and discrimination data were collected using identical paradigms for speech and non-speech stimuli. Results showed the children at risk for dyslexia to shift from an allophonic mode of perception in kindergarten to a phonemic mode of perception in first grade, while the control group showed a phonemic mode already in kindergarten. The children at risk for dyslexia thus showed an allophonic perception deficit in kindergarten, which was later suppressed by phonemic perception as a result of formal reading instruction in first grade; allophonic perception in kindergarten can thus be treated as a clinical marker for the possibility of later reading problems.
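A standard way to quantify weaker between-category and better within-category discrimination is to compare observed discrimination with the discrimination predicted from identification alone. The sketch below uses the classic covert-labeling prediction for an ABX task; the abstract does not say which discrimination paradigm the study used, so ABX is an assumption here. Observed discrimination above this prediction for within-category pairs indicates the within-category sensitivity the abstract describes.

```python
def predicted_abx(p_a: float, p_b: float) -> float:
    """Predicted ABX proportion correct from identification alone.

    p_a, p_b: probability that stimulus A / B is labeled as category 1.
    Assumes listeners covertly label A, B and X, answer by matching X's
    label when A and B received different labels, and guess otherwise.
    """
    same_labels = p_a * p_b + (1 - p_a) * (1 - p_b)
    diff_a1_b0 = p_a * (1 - p_b)     # A -> category 1, B -> category 0
    diff_a0_b1 = (1 - p_a) * p_b     # A -> category 0, B -> category 1

    # X = A: correct if X receives A's label.
    x_is_a = diff_a1_b0 * p_a + diff_a0_b1 * (1 - p_a)
    # X = B: correct if X receives B's label.
    x_is_b = diff_a1_b0 * (1 - p_b) + diff_a0_b1 * p_b

    # X equals A on half the trials and B on the other half.
    return 0.5 * (x_is_a + x_is_b) + 0.5 * same_labels

def predicted_abx_closed_form(p_a: float, p_b: float) -> float:
    """Equivalent closed form of the same prediction."""
    return 0.5 + 0.5 * (p_a - p_b) ** 2
```

Pairs labeled identically (for example 0.5 and 0.5) are predicted at chance, 0.5, while a pair straddling the boundary (0.8 and 0.2) is predicted at 0.68. The two functions agree, which is a useful check on the labeling logic.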
Bidelman, Gavin M; Lee, Chia-Cheng
Categorical perception (CP) represents a fundamental process in converting continuous speech acoustics into invariant percepts. Using scalp-recorded event-related brain potentials (ERPs), we investigated how tone-language experience and stimulus context influence CP for lexical tones, the pitch patterns used by a majority of the world's languages to signal word meaning. Stimuli were vowel pairs overlaid with a high-level tone (T1) followed by a pitch continuum spanning between dipping (T3) and rising (T2) contours of the Mandarin tonal space. To vary context, T1 either preceded or followed the critical T2/T3 continuum. Behaviorally, native Chinese listeners showed stronger CP than native English-speaking controls, as evidenced by their steeper, more dichotomous psychometric functions and faster identification of linguistic pitch patterns. Stimulus context shifted the categorical boundary in both groups, but the shift was more exaggerated in native listeners. Analysis of source activity extracted from primary auditory cortex revealed overall stronger neural encoding of tone in Chinese compared to English listeners, indicating experience-dependent plasticity in cortical pitch processing. More critically, "neurometric" functions derived from multidimensional scaling and clustering of source ERPs established that (i) early auditory cortical activity could accurately predict listeners' psychometric speech identification and contextual shifts in the perceptual boundary; and (ii) neurometric profiles were organized more categorically in native speakers. Our data show that tone-language experience refines early auditory cortical brain representations so as to supply more faithful templates to neural mechanisms subserving lexical pitch categorization. We infer that contextual influence on the CP for tones is determined by language experience and the frequency of pitch patterns as they occur in listeners' native lexicon.
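Steeper, more dichotomous psychometric functions and context-induced boundary shifts are usually quantified by fitting a sigmoid to identification proportions along the continuum. The sketch below assumes a two-parameter logistic (boundary = 50% point, slope = steepness) and fits it by brute-force least squares over a grid; the functional form, grid, and data are illustrative assumptions, not the study's analysis.

```python
import math
from typing import Sequence, Tuple

def logistic(x: float, boundary: float, slope: float) -> float:
    """P(responding with the second category, e.g. 'T2') at continuum step x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))

def fit_logistic(steps: Sequence[float],
                 proportions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of (boundary, slope) by grid search.

    Boundaries are searched in 0.1 increments across the continuum and
    slopes in 0.1 increments from 0.1 to 10.
    """
    best = None
    lo, hi = int(round(min(steps) * 10)), int(round(max(steps) * 10))
    for b10 in range(lo, hi + 1):
        b = b10 / 10
        for s10 in range(1, 101):
            s = s10 / 10
            err = sum((logistic(x, b, s) - p) ** 2
                      for x, p in zip(steps, proportions))
            if best is None or err < best[0]:
                best = (err, b, s)
    return best[1], best[2]
```

A group with more dichotomous identification yields a larger fitted slope, and the contextual effect is the difference in fitted boundary between the T1-first and T1-last conditions. In practice a maximum-likelihood fit to response counts is preferable to least squares on proportions; the grid search just keeps the sketch dependency-free.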
Calandruccio, Lauren; Gomez, Bianca; Buss, Emily; Leibold, Lori J.
Purpose: To develop a task to evaluate children’s English and Spanish speech perception abilities in either noise or competing speech maskers. Methods: Eight bilingual Spanish/English and eight age-matched monolingual English children (ages 4.9–16.4 years) were tested. A forced-choice, picture-pointing paradigm was selected for adaptively estimating masked speech reception thresholds. Speech stimuli were spoken by simultaneous bilingual Spanish/English talkers. The target stimuli were thirty disyllabic English and Spanish words, familiar to five-year-olds, and easily illustrated. Competing stimuli included either two-talker English or two-talker Spanish speech (corresponding to target language) and spectrally matched noise. Results: For both groups of children, regardless of test language, performance was significantly worse for the two-talker than the noise masker. No difference in performance was found between bilingual and monolingual children. Bilingual children performed significantly better in English than in Spanish in competing speech. For all listening conditions, performance improved with increasing age. Conclusions: Results indicate that the stimuli and task are appropriate for speech recognition testing in both languages, providing a more conventional measure of speech-in-noise perception as well as a measure of complex listening. Further research is needed to determine performance for Spanish-dominant listeners and to evaluate the feasibility of implementation into routine clinical use. PMID:24686915
Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK
Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional Magnetic Resonance Imaging (fMRI), whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream, and that individuals who perform better in speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; this activity was found within right-lateralised frontal regions, consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297
Altmann, Christian F; Uesaki, Maiko; Ono, Kentaro; Matsuhashi, Masao; Mima, Tatsuya; Fukuyama, Hidenao
Categorical perception of phonemes describes the phenomenon that, when phonemes are classified, they are often perceived to fall into distinct categories even though physically they follow a continuum along a feature dimension. While consonants such as plosives have been proposed to be perceived categorically, the representation of vowels has been described as more continuous. We aimed to test this difference in representation at a behavioral and neurophysiological level using human magnetoencephalography (MEG). To this end, we designed stimuli based on natural speech by morphing along a phonological continuum entailing changes of the voiced stop-consonant or the steady-state vowel of a consonant-vowel (CV) syllable. Then, while recording MEG, we presented participants with consecutive pairs of either same or different CV syllables. The differences were such that the two CV syllables either came from within the same category or belonged to different categories. During the MEG experiment, the participants actively discriminated the stimulus pairs. Behaviorally, we found that discrimination was easier for the between- compared to the within-category contrast for both consonants and vowels. However, this categorical effect was significantly stronger for the consonants than for the vowels, in line with a more continuous representation of vowels. At the neural level, we observed significant repetition suppression of MEG evoked fields, i.e. lower amplitudes for physically same compared to different stimulus pairs, at around 430 to 500 ms after the onset of the second stimulus. Source reconstruction revealed generating sources of this repetition suppression effect within left superior temporal sulcus and gyrus, posterior to Heschl's gyrus. A region-of-interest analysis within this region showed a clear categorical effect for consonants, but not for vowels, providing further evidence for the important role of left superior temporal areas in categorical representation.
Li, Yongxin; Zhang, Guoping; Galvin, John J.; Fu, Qian-Jie
For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects’ HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups: “better” PTA (<50 dB HL) or “poorer” PTA (>50 dB HL). The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception. PMID:25386962
Kant, Anjali R; Pathak, Sonal
The present study aims to provide a qualitative description and comparison of speech perception performance using model-based tests, the multisyllabic lexical neighborhood test (MLNT) and the lexical neighborhood test (LNT), in early- and late-implanted (prelingual) hearing-impaired children using cochlear implants. The subjects were cochlear implantees in two groups: Group I (early implantees), n = 15, 3-6 years of age, mean age at implantation 3½ years; Group II (late implantees), n = 15, 7-13 years of age, mean age at implantation 5 years. The tests were presented in a sound-treated room at 70 dB SPL. The children were instructed to repeat each word after hearing it. Responses were scored as the percentage of words correctly repeated, and mean scores were computed. The late implantees achieved higher scores for words on the MLNT than on the LNT. This may imply that late implantees use word-length cues to aid speech perception. The major phonological process used by early implantees was deletion, and that used by late implantees was substitution. One needs to wait until the child achieves a score of 20% on the LNT before assessing other aspects of his/her speech perception abilities. There appears to be a need for speech perception tests based on theoretical empirical models, in order to enable a descriptive analysis of post-implant speech perception performance.
Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid
As people grow older, speech perception difficulties become highly prevalent, especially in noisy listening situations. Moreover, it is assumed that speech intelligibility is more affected in the event of background noises that induce a higher cognitive load, i.e., noises that result in informational versus energetic masking. There is ample evidence showing that speech perception problems in aging persons are partly due to hearing impairment and partly due to age-related declines in cognition and suprathreshold auditory processing. In order to develop effective rehabilitation strategies, it is indispensable to know how these different degrading factors act upon speech perception. This implies disentangling effects of hearing impairment versus age and examining the interplay between both factors in different background noises of everyday settings. To that end, we investigated open-set sentence identification in six participant groups: a young (20-30 years), middle-aged (50-60 years), and older cohort (70-80 years), each including persons who had normal audiometric thresholds up to at least 4 kHz, on the one hand, and persons who were diagnosed with elevated audiometric thresholds, on the other hand. All participants were screened for (mild) cognitive impairment. We applied stationary and amplitude modulated speech-weighted noise, which are two types of energetic maskers, and unintelligible speech, which causes informational masking in addition to energetic masking. By means of these different background noises, we could look into speech perception performance in listening situations with a low and high cognitive load, respectively. Our results indicate that, even when audiometric thresholds are within normal limits up to 4 kHz, irrespective of threshold elevations at higher frequencies, and there is no indication of even mild cognitive impairment, masked speech perception declines by middle age and decreases further on to older age. The impact of hearing
Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T
Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Purpose: It has recently been reported (e.g., V. van Wassenhove, K. W. Grant, & D. Poeppel, 2005) that audiovisual (AV) presented speech is associated with an N1/P2 auditory event-related potential (ERP) response that is lower in peak amplitude compared with the responses associated with auditory only (AO) speech. This effect was replicated.…
Pyschny, Verena; Landwehr, Markus; Hahn, Moritz; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
Purpose: The objective of the study was to investigate the influence of bimodal stimulation upon hearing ability for speech recognition in the presence of a single competing talker. Method: Speech recognition was measured in 3 listening conditions: hearing aid (HA) alone, cochlear implant (CI) alone, and both devices together (CI + HA). To examine…
Scott, Sophie K.; Wise, Richard J. S.
In this paper we attempt to relate the prelexical processing of speech, with particular emphasis on functional neuroimaging studies, to the study of auditory perceptual systems by disciplines in the speech and hearing sciences. The elaboration of the sound-to-meaning pathways in the human brain enables their integration into models of the human…
Ghazanfar, Asif A; Pinsk, Mark A
Listening to speech amidst noise is facilitated by a variety of cues, including the predictable use of certain words in certain contexts. A recent fMRI study of the interaction between noise and semantic predictability has identified a cortical network involved in speech comprehension.
Snellings, Patrick; van der Leij, Aryan; Blok, Henk; de Jong, Peter F.
This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children. RD children were slower than chronological age…
Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R
In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.
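The QuickSIN procedure derives SRT50 from a count of key words rather than from an adaptive track. As we understand the standard QuickSIN scoring, it rests on the Spearman-Kärber (Tillman-Olsen) estimate: with sentences presented at fixed SNR steps, the 50% point equals the starting SNR plus half a step, minus one step per block of words correct. A minimal sketch, assuming the standard list layout of six sentences with five key words each, descending from 25 dB to 0 dB SNR in 5-dB steps (the method-of-adjustment SRT100 and SRT0 estimates are not covered):

```python
def spearman_karber_srt50(total_correct: int,
                          start_snr_db: float = 25.0,
                          step_db: float = 5.0,
                          words_per_step: int = 5,
                          n_steps: int = 6) -> float:
    """SNR (dB) at which 50% of key words are repeated correctly.

    Defaults assume the standard QuickSIN list layout (six sentences, five
    key words each, SNR falling from 25 to 0 dB in 5-dB steps), for which
    the estimate reduces to 27.5 - total_correct.
    """
    max_correct = words_per_step * n_steps
    if not 0 <= total_correct <= max_correct:
        raise ValueError(f"total_correct must be between 0 and {max_correct}")
    return start_snr_db + step_db / 2 - step_db * total_correct / words_per_step
```

For example, a listener who repeats 20 of the 30 key words gets an SRT50 of 7.5 dB. Because the estimate depends only on the count, modified materials (spatialized, reverberant, audiovisual) can be scored the same way, which is what keeps the procedure efficient.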
Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Kramer, Sophia E
A recent pupillometry study on adults with normal hearing indicates that the pupil response during speech perception (cognitive processing load) is strongly affected by the type of speech masker. The current study extends these results by recording the pupil response in 32 participants with hearing impairment (mean age 59 yr) while they were listening to sentences masked by fluctuating noise or a single-talker. Efforts were made to improve audibility of all sounds by means of spectral shaping. Additionally, participants performed tests measuring verbal working memory capacity, inhibition of interfering information in working memory, and linguistic closure. The results showed worse speech reception thresholds for speech masked by single-talker speech compared to fluctuating noise. In line with previous results for participants with normal hearing, the pupil response was larger when listening to speech masked by a single-talker compared to fluctuating noise. Regression analysis revealed that larger working memory capacity and better inhibition of interfering information related to better speech reception thresholds, but these variables did not account for inter-individual differences in the pupil response. In conclusion, people with hearing impairment show more cognitive load during speech processing when there is interfering speech compared to fluctuating noise.
Drullman, Rob; Bronkhorst, Adelbert W.
Speech intelligibility was investigated by varying the number of interfering talkers, level and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically, and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with its mean pitch raised by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences depend considerably on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.
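The pitch and level manipulations in the second experiment are easier to picture as ratios: a shift of n semitones multiplies F0 by 2^(n/12), and a level difference of L dB corresponds to an amplitude ratio of 10^(L/20). The sketch below converts the listed conditions; the 120-Hz starting F0 is a hypothetical value, since the abstract does not give the talker's F0.

```python
def semitones_to_ratio(n_semitones: float) -> float:
    """Frequency ratio for a pitch shift of n semitones (12 per octave)."""
    return 2.0 ** (n_semitones / 12.0)

def db_to_amplitude_ratio(level_db: float) -> float:
    """Target-to-interferer RMS amplitude ratio for a level difference in dB."""
    return 10.0 ** (level_db / 20.0)

F0_HZ = 120.0  # hypothetical target-talker F0, not reported in the abstract
for shift in (2, 4, 8, 12):
    print(f"+{shift:2d} semitones: {F0_HZ * semitones_to_ratio(shift):6.1f} Hz")
for level in (-16, 0, 4):
    print(f"{level:+3d} dB: amplitude ratio {db_to_amplitude_ratio(level):.3f}")
```

A 12-semitone shift is exactly one octave (the interfering voice's F0 doubles), and at -16 dB the target's RMS amplitude is roughly one-sixth of the interferer's.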
Tahmina, Qudsia; Runge, Christina; Friedland, David R.
This study assesses the effects of adding low- or high-frequency information to band-limited, telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented, under quiet listening conditions, with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restored), and low-pass filtered speech (f < 3,400 Hz, LP, i.e., distorted frequency components below 300 Hz in telephone speech were restored). Results indicated that in quiet environments, for all four types of stimuli, listening with both hearing aid (HA) and cochlear implant (CI) was significantly better than listening with CI alone. For both bimodal and CI-alone modes, there were no statistically significant differences between the LP and BP scores and between the WB and HP scores. However, the HP scores were significantly better than the BP scores. In quiet conditions, both CI alone and bimodal listening achieved the largest benefits when telephone speech was augmented with high- rather than low-frequency information. These findings provide support for the design of algorithms that would extend higher frequency information, at least in quiet environments. PMID:24265213
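The four stimulus conditions are easy to confuse. HP and LP are named for the filter applied to wideband speech, while the text describes them relative to the telephone band. This minimal sketch encodes each condition as a passband and checks which frequencies each one restores relative to BP. It assumes ideal filters and treats WB as unfiltered; `CONDITIONS`, `passes`, and `restored` are hypothetical names.

```python
import math

# Passbands (Hz) for the four conditions. Ideal filters are assumed, and WB is
# treated as unfiltered because the abstract gives no upper limit for it.
CONDITIONS = {
    "WB": (0.0, math.inf),
    "BP": (300.0, 3400.0),    # telephone band
    "HP": (300.0, math.inf),  # telephone band plus restored content above 3,400 Hz
    "LP": (0.0, 3400.0),      # telephone band plus restored content below 300 Hz
}


def passes(condition, freq_hz):
    """True if a component at freq_hz lies inside the condition's passband."""
    lo, hi = CONDITIONS[condition]
    return lo <= freq_hz <= hi


def restored(condition, freq_hz):
    """True if the condition keeps a component that telephone (BP) filtering removes."""
    return passes(condition, freq_hz) and not passes("BP", freq_hz)


for cond in ("WB", "HP", "LP"):
    print(cond, restored(cond, 150.0), restored(cond, 5000.0))
```

The printout shows that WB restores both regions, HP restores only the region above the telephone band, and LP restores only the region below it.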
Shofner, William P
The behavioral responses of chinchillas to noise-vocoded versions of naturally spoken speech sounds were measured using stimulus generalization and operant conditioning. Behavioral performance for speech generalization by chinchillas is compared to recognition by a group of human listeners for the identical speech sounds. The ability of chinchillas to generalize to the vocoded versions as tokens of the natural speech sounds is far poorer than the ability of human listeners to recognize them. In many cases, responses of chinchillas to noise-vocoded speech sounds were more similar to responses to band-limited noise than to responses to natural speech sounds. Chinchillas were also tested with a middle C musical note as played on a piano. Comparison of the responses of chinchillas in the middle C condition with the responses obtained in the speech conditions suggests that chinchillas may be more influenced by fundamental frequency than by formant structure. The differences between vocoded speech perception in chinchillas and human listeners may reflect differences in their abilities to resolve the formants along the cochlea. It is argued that lengthening of the cochlea during human evolution may have provided one of the auditory mechanisms that influenced the evolution of speech-specific mechanisms.
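Noise vocoding, as used for the stimuli above, keeps each frequency band's slow amplitude envelope and replaces its fine structure with band-limited noise. This removes most F0 and harmonic cues while preserving coarse spectral shape. The abstract gives no processing details, so everything below is an assumption for illustration:

- brick-wall DFT filters,
- envelope extraction by rectification and moving-average smoothing,
- hypothetical band edges and function names.

It is a slow pure-Python sketch for short signals, not the study's processing chain.

```python
import cmath
import math
import random


def _dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def _idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def band_limit(x, fs, lo_hz, hi_hz):
    """Ideal band-pass: zero every DFT bin whose frequency lies outside [lo_hz, hi_hz)."""
    n = len(x)
    spectrum = _dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins k and n-k share a frequency, keeping the output real
        if not (lo_hz <= freq < hi_hz):
            spectrum[k] = 0
    return _idft(spectrum)


def envelope(x, win):
    """Full-wave rectify, then smooth with a centred moving average of about `win` samples."""
    rect = [abs(v) for v in x]
    half = win // 2
    out = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def noise_vocode(x, fs, edges_hz, win=16, seed=0):
    """Replace the fine structure in each analysis band with band-limited noise."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in x]
    out = [0.0] * len(x)
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        env = envelope(band_limit(x, fs, lo, hi), win)  # slow amplitude pattern of this band
        carrier = band_limit(noise, fs, lo, hi)         # noise restricted to the same band
        out = [o + e * c for o, e, c in zip(out, env, carrier)]
    return out


fs = 8000
tone = [math.sin(2 * math.pi * 480 * i / fs) for i in range(200)]
vocoded = noise_vocode(tone, fs, [100, 1000, 2000, 4000])
```

Practical vocoders use proper FIR or IIR filter banks, longer signals, and an explicit envelope cutoff. The naive DFT here only keeps the sketch dependency-free.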
Kishon-Rabin, Liat; Segal, Osnat; Algom, Daniel
Purpose: To clarify the relationship between psychoacoustic capabilities and speech perception in adolescents with severe-to-profound hearing loss (SPHL). Method: Twenty-four adolescents with SPHL and young adults with normal hearing were assessed with psychoacoustic and speech perception tests. The psychoacoustic tests included gap detection…
Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge
This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of grade 1 children aged 6 years 3 months to 6 years 8 months: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. The two normal-reading groups did not differ in speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and made a unique predictive contribution to reading growth in grade 3, even after controlling for reading level, phonological ability, auditory processing and oral language skills in grade 1. These findings indicate that speech perception has a unique, direct impact on reading development, not only an indirect one through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability.
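A claim that a predictor makes a unique contribution even after controlling for other variables is typically tested with hierarchical regression. The control variables are entered first, speech perception is added last, and the increase in explained variance is tested. The abstract does not name the statistic, so the standard R²-change form below is an assumption. Here k is the number of predictors in each model and n the number of children.

```latex
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{reduced}}, \qquad
F_{\text{change}} = \frac{\Delta R^2 / (k_{\text{full}} - k_{\text{reduced}})}
                         {(1 - R^2_{\text{full}}) / (n - k_{\text{full}} - 1)}
```

The F statistic has k_full − k_reduced and n − k_full − 1 degrees of freedom. With a single added predictor, the numerator df is 1.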
Dole, Marjorie; Hoen, Michel; Meunier, Fanny
Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type,…
Rogalsky, Corianne; Love, Tracy; Driscoll, David; Anderson, Steven W.; Hickok, Gregory
The discovery of mirror neurons in macaque has led to a resurrection of motor theories of speech perception. Although the majority of lesion and functional imaging studies have associated perception with the temporal lobes, it has also been proposed that the ‘human mirror system’, which prominently includes Broca’s area, is the neurophysiological substrate of speech perception. Although numerous studies have demonstrated a tight link between sensory and motor speech processes, few have directly assessed the critical prediction of mirror neuron theories of speech perception, namely that damage to the human mirror system should cause severe deficits in speech perception. The present study measured speech perception abilities of patients with lesions involving motor regions in the left posterior frontal lobe and/or inferior parietal lobule (i.e., the proposed human ‘mirror system’). Performance was at or near ceiling in patients with fronto-parietal lesions. It is only when the lesion encroaches on auditory regions in the temporal lobe that perceptual deficits are evident. This suggests that ‘mirror system’ damage does not disrupt speech perception, but rather that auditory systems are the primary substrate for speech perception. PMID:21207313
Boets, Bart; Ghesquiere, Pol; van Wieringen, Astrid; Wouters, Jan
We tested categorical perception and speech-in-noise perception in a group of five-year-old preschool children genetically at risk for dyslexia, compared to a group of well-matched control children and a group of adults. Both groups of children differed significantly from the adults on all speech measures. Comparing both child groups, the risk…
Narne, Vijaya Kumar
Aim: The present study evaluated the relation between speech perception in the presence of background noise and temporal processing ability in listeners with auditory neuropathy (AN). Method: The study included two experiments. In the first experiment, temporal resolution of listeners with normal hearing and those with AN was evaluated using measures of the temporal modulation transfer function and frequency modulation detection at modulation rates of 2 and 10 Hz. In the second experiment, speech perception in quiet and in noise was evaluated at three signal-to-noise ratios (SNRs: 0, 5, and 10 dB). Results: Listeners with AN performed significantly more poorly than listeners with normal hearing in both amplitude modulation and frequency modulation detection, indicating a significant impairment in extracting envelope as well as fine-structure cues from the signal. Furthermore, there was a significant correlation between measures of temporal resolution and speech perception in noise. Conclusion: The results suggest that an impaired ability to efficiently process the envelope and fine-structure cues of the speech signal may cause the extreme difficulty that listeners with AN face when perceiving speech in noise. PMID:23409105
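Both experiments rest on simple signal definitions. A temporal modulation transfer function is usually built from detection thresholds for sinusoidal amplitude modulation, with depth m expressed as 20 log10(m) dB. An SNR condition is created by scaling the noise relative to the speech RMS. The sketch below shows both, assuming a Gaussian noise carrier and RMS-based SNR. The function names are hypothetical, and the abstract does not state which modulation rates were used for the amplitude-modulation measure.

```python
import math
import random


def sam_noise(n_samples, fs, mod_rate_hz, depth, seed=0):
    """Gaussian noise with sinusoidal amplitude modulation:
    [1 + m*sin(2*pi*fm*t)] * noise(t), with 0 <= m <= 1."""
    rng = random.Random(seed)
    return [(1.0 + depth * math.sin(2 * math.pi * mod_rate_hz * i / fs)) * rng.gauss(0.0, 1.0)
            for i in range(n_samples)]


def depth_db(depth):
    """Modulation depth in dB (20*log10(m)), the usual TMTF threshold unit."""
    return 20 * math.log10(depth)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_noise_to_snr(signal, noise, snr_db):
    """Scale noise so that 20*log10(rms(signal) / rms(scaled noise)) equals snr_db."""
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return [gain * v for v in noise]


def mix_at_snr(signal, noise, snr_db):
    """Add the scaled noise to the signal sample by sample."""
    scaled = scale_noise_to_snr(signal, noise, snr_db)
    return [s + n for s, n in zip(signal, scaled)]


fs = 16000
speech_stand_in = [math.sin(2 * math.pi * 440 * i / fs) for i in range(1600)]
noise = sam_noise(1600, fs, mod_rate_hz=10.0, depth=0.5, seed=1)
mixtures = {snr: mix_at_snr(speech_stand_in, noise, snr) for snr in (0, 5, 10)}
```

A modulation depth of m = 0.5 corresponds to about -6 dB. Lower (more negative) thresholds indicate better sensitivity to amplitude modulation.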
Ueda, Kazuo; Nakajima, Yoshitaka; Akahane-Yamada, Reiko
Our auditory system has to organize incoming sound and pick out a target sound made up of many components, sometimes rejecting irrelevant components and sometimes forming multiple streams, including the target stream. This situation is well described by the concept of auditory scene analysis. Research on speech perception in noise is closely related to auditory scene analysis. This paper briefly reviews the concept of auditory scene analysis and previous and ongoing research on speech perception in noise, and discusses future directions for research. Further experimental investigations are needed to better understand our perceptual mechanisms.
Moradi, Shahram; Lidestam, Björn; Hällgren, Mathias; Rönnberg, Jerker
This study compared elderly hearing aid (EHA) users and elderly normal-hearing (ENH) individuals on identification of auditory speech stimuli (consonants, words, and final word in sentences) that differed in their linguistic properties. We measured the accuracy with which the target speech stimuli were identified, as well as the isolation points (IPs: the shortest duration, from onset, required to correctly identify the speech target). The relationships between working memory capacity, the IPs, and speech accuracy were also measured. Twenty-four EHA users (with mild to moderate hearing impairment) and 24 ENH individuals participated in the present study. Despite the use of their regular hearing aids, the EHA users had delayed IPs and were less accurate in identifying consonants and words compared with the ENH individuals. The EHA users also had delayed IPs for final word identification in sentences with lower predictability; however, no significant between-group difference in accuracy was observed. Finally, there were no significant between-group differences in terms of IPs or accuracy for final word identification in highly predictable sentences. Our results also showed that, among EHA users, greater working memory capacity was associated with earlier IPs and improved accuracy in consonant and word identification. Together, our findings demonstrate that the gated speech perception ability of EHA users was not at the level of ENH individuals, in terms of IPs and accuracy. In addition, gated speech perception was more cognitively demanding for EHA users than for ENH individuals in the absence of semantic context.
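The isolation point is defined above as the shortest duration from onset required for correct identification. In gating studies it is often operationalized as the first gate from which the response is correct and does not change at longer gates. That stricter rule is an assumption here, and `isolation_point` is a hypothetical helper.

```python
def isolation_point(gate_durations_ms, responses, target):
    """First gate duration from which every later response matches the target.

    Returns None if the target is never identified stably. Requiring the
    response to stay correct at all longer gates is an assumed rule.
    """
    ip = None
    for duration, response in zip(gate_durations_ms, responses):
        if response != target:
            ip = None          # an error resets the candidate isolation point
        elif ip is None:
            ip = duration      # first correct gate of the current correct run
    return ip


gates = [50, 100, 150, 200, 250]
answers = ["ba", "da", "ga", "da", "da"]
print(isolation_point(gates, answers, "da"))  # 200
```

The early correct guess at 100 ms does not count because the listener changes the response at 150 ms.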
You, R S; Serniclaes, W; Rider, D; Chabane, N
Previous studies have claimed to show deficits in the perception of speech sounds in autism spectrum disorders (ASD). The aim of the current study was to clarify the nature of such deficits. Children with ASD might exhibit only reduced precision in the perception of phoneme categories (CPR deficit). However, these children might further present an allophonic mode of speech perception, similar to the one evidenced in dyslexia, characterised by enhanced discrimination of acoustic differences within phoneme categories. Allophonic perception usually gives rise to a categorical perception (CP) deficit, characterised by weaker coherence between discrimination and identification of speech sounds. The perceptual performance of children with ASD was compared to that of control children of the same chronological age. Identification and discrimination data were collected for continua of natural vowels, synthetic vowels, and synthetic consonants. Results confirmed that children with ASD exhibit a CPR deficit for the three stimulus continua. These children further exhibited a trend toward allophonic perception that was, however, not accompanied by the usual CP deficit. These findings confirm that the commonly found CPR deficit is also present in ASD. Whether children with ASD also present allophonic perception requires further investigation.
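The link between identification and discrimination that defines a CP deficit can be quantified by predicting discrimination from identification alone. One classic formulation, the Haskins model for ABX discrimination between two categories, predicts P(correct) = ½[1 + (p_A − p_B)²]. Here p_A and p_B are the probabilities of labeling each stimulus as one of the categories. The abstract does not say which model was used, so treat this as an illustrative assumption; the function names and identification values are hypothetical. Observed discrimination well above the prediction within a category is the pattern described as allophonic perception.

```python
def predicted_abx(p_a, p_b):
    """Haskins-model ABX accuracy predicted from identification probabilities.

    p_a and p_b are the proportions of trials on which stimuli A and B were
    labeled as the same category (for example, /b/).
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_discrimination(identification, step=1):
    """Predicted ABX accuracy for each pair of continuum steps `step` apart."""
    return [predicted_abx(identification[i], identification[i + step])
            for i in range(len(identification) - step)]


def excess_discrimination(observed, predicted):
    """Mean of observed minus predicted accuracy; positive values mean listeners
    discriminate better than their labels alone would allow."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


ident = [1.0, 0.95, 0.5, 0.05, 0.0]  # hypothetical identification function
print(predicted_discrimination(ident))
```

Predicted accuracy peaks at the category boundary and stays near chance (0.5) within categories, which is the signature of categorical perception.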
Rees, Rachel; Bladel, Judith
Many studies have shown that French Cued Speech (CS) can enhance lipreading and the development of phonological awareness and literacy in deaf children but, as yet, there is little evidence that these findings can be generalized to English CS. This study investigated the possible effects of English CS on the speech perception, phonological…
van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean
Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a documented history of DD have deficits in their ability to gain benefit from lip-read information that disambiguates noise-masked speech. We show with another group of adult individuals with DD that these deficits persist into adulthood. These deficits could not be attributed to impairments in unisensory auditory word recognition. Rather, the results indicate a specific deficit in audio-visual speech processing and suggest that impaired multisensory integration might be an important aspect of DD.
Samson, Y; Belin, P; Thivard, L; Boddaert, N; Crozier, S; Zilbovicius, M
Since the description of cortical deafness, it has been known that the superior temporal cortex is bilaterally involved in the initial stages of auditory language perception, but the precise anatomical limits and the function of this area remain debated. Here we reviewed more than 40 recent positron emission tomography and functional magnetic resonance imaging papers related to auditory language perception, and we performed a meta-analysis of the localization of the activation peaks in Talairach space. We found 8 studies reporting word versus non-word listening contrasts, with 54 activation peaks in the temporal lobes. These peaks clustered in a bilateral and well-limited area of the superior temporal cortex, which is here operationally defined as the speech sensitive auditory cortex. This area is more than 4 cm long and is located in the superior temporal gyrus and the superior temporal sulcus, both anterior and posterior to Heschl's gyrus. It includes neither the primary auditory cortex nor the ascending part of the planum temporale. The speech sensitive auditory cortex is not activated by pure tones, environmental sounds, or attention directed toward elementary components of a sound such as intensity, pitch, or duration, and thus has some specificity for speech signals. The specificity is not perfect, since we found a number of non-speech auditory stimuli activating the speech sensitive auditory cortex. Yet those studies always involved auditory perception mechanisms that are also relevant to speech perception, either at the level of primitive auditory scene analysis processes or at the level of specific schema-based recognition processes. The dorsal part of the speech sensitive auditory cortex may be involved in primitive scene analysis processes, whereas distributed activation of this area may contribute to the emergence of a broad class of "voice" schemas and of more specific "speech schemas/phonetic modules" related to different languages. In addition…
Caldwell, Amanda; Nittrouer, Susan
Purpose: Common wisdom suggests that listening in noise poses disproportionately greater difficulty for listeners with cochlear implants (CIs) than for peers with normal hearing (NH). The purpose of this study was to examine phonological, language, and cognitive skills that might help explain speech-in-noise abilities for children with CIs. Method: Three groups of kindergartners (NH, hearing aid wearers, and CI users) were tested on speech recognition in quiet and noise and on tasks thought to underlie the abilities that fit into the domains of phonological awareness, general language, and cognitive skills. These last measures were used as predictor variables in regression analyses with speech-in-noise scores as dependent variables. Results: Compared to children with NH, children with CIs did not perform as well on speech recognition in noise or on most other measures, including recognition in quiet. Two surprising results were that (a) noise effects were consistent across groups and (b) scores on other measures did not explain any group differences in speech recognition. Conclusions: Limitations of implant processing take their primary toll on recognition in quiet and account for poor speech recognition and language/phonological deficits in children with CIs. Implications are that teachers/clinicians need to teach language/phonology directly and maximize signal-to-noise levels in the classroom. PMID:22744138
van Heukelem, Kristin; Bradlow, Ann R.
Studies on speech perception in multitalker babble have revealed asymmetries in the effects of noise on native versus foreign-accented speech intelligibility for native listeners [Rogers et al., Lang Speech 47(2), 139-154 (2004)] and on sentence-in-noise perception by native versus non-native listeners [Mayo et al., J. Speech Lang. Hear. Res., 40, 686-693 (1997)], suggesting that the linguistic backgrounds of talkers and listeners contribute to the effects of noise on speech perception. However, little attention has been paid to the language of the babble. This study tested whether the language of the noise also has asymmetrical effects on listeners. Replicating previous findings [e.g., Bronkhorst and Plomp, J. Acoust. Soc. Am., 92, 3132-3139 (1992)], the results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble regardless of the language of the babble, demonstrating the effect of increased psychoacoustic/energetic masking. In addition, the results showed that in the two-talker babble condition, native English listeners were more adversely affected by English than by Chinese babble. These findings demonstrate informational/cognitive masking on sentence-in-noise recognition in the form of linguistic competition. Whether this competition is at the lexical or sublexical level and whether it is modulated by the phonetic similarity between the target and noise languages remains to be determined.
Aydelott, Jennifer; Leech, Robert; Crinion, Jennifer
It is widely accepted that hearing loss increases markedly with age, beginning in the fourth decade (ISO 7029, 2000). Age-related hearing loss is typified by high-frequency threshold elevation and associated reductions in speech perception because speech sounds, especially consonants, become inaudible. Nevertheless, older adults often report additional and progressive difficulties in the perception and comprehension of speech, often most evident in adverse listening conditions, that exceed those reported by younger adults with a similar degree of high-frequency hearing loss (Dubno, Dirks, & Morgan), leading to communication difficulties and social isolation (Weinstein & Ventry). Some of the age-related decline in speech perception can be accounted for by peripheral sensory problems, but cognitive aging can also be a contributing factor. In this article, we review findings from the psycholinguistic literature, predominantly from the last four years, and present a pilot study illustrating how normal age-related changes in cognition and the linguistic context can influence speech-processing difficulties in older adults. For significant progress to be made in understanding and improving the auditory performance of aging listeners, we discuss how future research will have to be much more specific not only about which interactions between auditory and cognitive abilities are critical but also about how they are modulated in the brain. PMID:21307006
Mayo, L H; Florentine, M; Buus, S
To determine how age of acquisition influences perception of second-language speech, the Speech Perception in Noise (SPIN) test was administered to native Mexican-Spanish-speaking listeners who learned fluent English before age 6 (early bilinguals) or after age 14 (late bilinguals) and monolingual American-English speakers (monolinguals). Results show that the levels of noise at which the speech was intelligible were significantly higher and the benefit from context was significantly greater for monolinguals and early bilinguals than for late bilinguals. These findings indicate that learning a second language at an early age is important for the acquisition of efficient high-level processing of it, at least in the presence of noise.
Sarant, J Z; Cowan, R S; Blamey, P J; Galvin, K L; Clark, G M
The prognosis for benefit from use of cochlear implants in congenitally deaf adolescents, who have a long duration of profound deafness prior to implantation, has typically been low. Speech perception results for two congenitally deaf patients implanted as adolescents at the University of Melbourne/Royal Victorian Eye and Ear Hospital Clinic show that, after 12 months of experience, both patients had significant open-set speech discrimination scores without lipreading. These results suggest that although benefits may in general be low for congenitally deaf adolescents, individuals may attain significant benefits to speech perception after a short period of experience. Prospective patients from this group should therefore be considered on an individual basis with regard to prognosis for benefit from cochlear implantation.
Astheimer, Lori B.; Berkes, Matthias; Bialystok, Ellen
Attention is required during speech perception to focus processing resources on critical information. Previous research has shown that bilingualism modifies attentional processing in nonverbal domains. The current study used event-related potentials (ERPs) to determine whether bilingualism also modifies auditory attention during speech perception. We measured attention to word onsets in spoken English for monolinguals and Chinese-English bilinguals. Auditory probes were inserted at four times in a continuous narrative: concurrent with word onset, 100 ms before or after onset, and at random control times. Greater attention was indexed by an increase in the amplitude of the early negativity (N1). Among monolinguals, probes presented after word onsets elicited a larger N1 than control probes, replicating previous studies. For bilinguals, there was no N1 difference for probes at different times around word onsets, indicating less specificity in allocation of attention. These results suggest that bilingualism shapes attentional strategies during English speech comprehension. PMID:27110579
McMurray, Bob; Aslin, Richard N.
Previous research on speech perception in both adults and infants has supported the view that consonants are perceived categorically; that is, listeners are relatively insensitive to variation below the level of the phoneme. More recent work, on the other hand, has shown adults to be systematically sensitive to within category variation [McMurray,…
This paper examines the special considerations involved in selecting a speech perception test battery for young deaf children. The auditory-only tests consisted of closed-set word identification tasks and minimal-pairs syllable tasks. Additional tests included identification of words in sentences, open-set word recognition, and evaluation of…
Chung, Kevin K. H.; McBride-Chang, Catherine; Cheung, Him; Wong, Simpson W. L.
This study focused on the associations of general auditory processing, speech perception, phonological awareness and word reading in Cantonese-speaking children from Hong Kong learning to read both Chinese (first language [L1]) and English (second language [L2]). Children in Grades 2--4 ("N" = 133) participated and were administered…
Fowler, Jennifer R.; Eggleston, Jessica L.; Reavis, Kelly M.; McMillan, Garnett P.; Reiss, Lina A. J.
Purpose: The objective was to determine whether speech perception could be improved for bimodal listeners (those using a cochlear implant [CI] in one ear and hearing aid in the contralateral ear) by removing low-frequency information provided by the CI, thereby reducing acoustic-electric overlap. Method: Subjects were adult CI subjects with at…
Lee, Andrew H.; Lyster, Roy
To what extent do second language (L2) learners benefit from instruction that includes corrective feedback (CF) on L2 speech perception? This article addresses this question by reporting the results of a classroom-based experimental study conducted with 32 young adult Korean learners of English. An instruction-only group and an instruction + CF…
Horlyck, Stephanie; Reid, Amanda; Burnham, Denis
Does the intensification of what can be called "language-specific speech perception" around reading onset occur as a function of maturation or experience? Preschool 5-year-olds with no school experience, 5-year-olds with 6 months' schooling, 6-year-olds with 6 months' schooling, and 6-year-olds with 18 months' schooling were tested on…
Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.
Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
Schomers, Malte R.; Pulvermüller, Friedemann
In the neuroscience of language, phonemes are frequently described as multimodal units whose neuronal representations are distributed across perisylvian cortical regions, including auditory and sensorimotor areas. A different position views phonemes primarily as acoustic entities with posterior temporal localization, which are functionally independent from frontoparietal articulatory programs. To address this current controversy, we here discuss experimental results from functional magnetic resonance imaging (fMRI) as well as transcranial magnetic stimulation (TMS) studies. At first glance, a mixed picture emerges, with earlier research documenting neurofunctional distinctions between phonemes in both temporal and frontoparietal sensorimotor systems, but some recent work seemingly failing to replicate the latter. Detailed analysis of methodological differences between studies reveals that the way experiments are set up explains whether sensorimotor cortex maps phonological information during speech perception or not. In particular, acoustic noise during the experiment and ‘motor noise’ caused by button press tasks work against the frontoparietal manifestation of phonemes. We highlight recent studies using sparse imaging and passive speech perception tasks along with multivariate pattern analysis (MVPA) and especially representational similarity analysis (RSA), which succeeded in separating acoustic-phonological from general-acoustic processes and in mapping specific phonological information on temporal and frontoparietal regions. The question of a causal role of sensorimotor cortex in speech perception and understanding is addressed by reviewing recent TMS studies. We conclude that frontoparietal cortices, including ventral motor and somatosensory areas, reflect phonological information during speech perception and exert a causal influence on language understanding. PMID:27708566
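Representational similarity analysis, highlighted above, compares the geometry of response patterns rather than single-voxel activation. Each region's patterns are turned into a representational dissimilarity matrix (RDM), which is then correlated with a model RDM (for example, one coding phonological feature differences). The sketch assumes correlation distance (1 − Pearson r) for the RDM and a Spearman correlation between the off-diagonal entries. These are common choices but not necessarily those of the studies reviewed, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rdm(patterns):
    """Dissimilarity matrix: 1 - Pearson r between every pair of condition patterns."""
    k = len(patterns)
    return [[1.0 - pearson(patterns[i], patterns[j]) for j in range(k)] for i in range(k)]


def upper_triangle(m):
    """Off-diagonal entries above the diagonal (an RDM is symmetric)."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def rsa_score(brain_rdm, model_rdm):
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    return pearson(ranks(upper_triangle(brain_rdm)), ranks(upper_triangle(model_rdm)))


# Toy voxel patterns for three phoneme conditions (hypothetical values).
patterns = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1]]
print(rsa_score(rdm(patterns), rdm(patterns)))
```

In practice the score is compared against a permutation-based null distribution before concluding that a region carries the modeled phonological information.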
Markham, Chris; van Laar, Darren; Gibbard, Deborah; Dean, Taraneh
Background: This study is part of a programme of research aiming to develop a quantitative measure of quality of life for children with communication needs. It builds on the preliminary findings of Markham and Dean (2006), which described some of the perceptions that parents and carers of children with speech, language and communication needs had…
Kumar, Prawin; Yathiraj, Asha
The present study aimed at assessing perception of filtered speech that simulated different configurations of hearing loss. The simulation was done by filtering four equivalent lists of a monosyllabic test developed by Shivaprasad for Indian-English speakers. This was done using the Adobe Audition software. Thirty normal hearing participants in…
Blood, Gordon W.; Boyle, Michael P.; Blood, Ingrid M.; Nalesnik, Gina R.
Bullying in school-age children is a global epidemic. School personnel play a critical role in eliminating this problem. The goals of this study were to examine speech-language pathologists' (SLPs) perceptions of bullying, endorsement of potential strategies for dealing with bullying, and associations among SLPs' responses and specific demographic…
Lo, Chi Yhun; McMahon, Catherine M.; Looi, Valerie; Thompson, William F.
Cochlear implant (CI) recipients generally have good perception of speech in quiet environments but difficulty perceiving speech in noisy conditions, reduced sensitivity to speech prosody, and difficulty appreciating music. Auditory training has been proposed as a method of improving speech perception for CI recipients, and recent efforts have focussed on the potential benefits of music-based training. This study evaluated two melodic contour training programs and their relative efficacy as measured on a number of speech perception tasks. These melodic contours were simple 5-note sequences formed into 9 contour patterns, such as “rising” or “rising-falling.” One training program controlled difficulty by manipulating interval sizes, the other by note durations. Sixteen adult CI recipients (aged 26–86 years) and twelve normal hearing (NH) adult listeners (aged 21–42 years) were tested on a speech perception battery at baseline and then after 6 weeks of melodic contour training. Results indicated that there were some benefits for speech perception tasks for CI recipients after melodic contour training. Specifically, consonant perception in quiet and question/statement prosody was improved. In comparison, NH listeners performed at ceiling for these tasks. There was no significant difference between the posttraining results for either training program, suggesting that both conferred benefits for training CI recipients to better perceive speech. PMID:26494944
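The stimuli described above can be generated from a root note, a contour shape, and two difficulty parameters. These parameters mirror the two training programs, which varied interval size or note duration. The abstract names only "rising" and "rising-falling"; the full set of nine shapes below follows a common melodic-contour layout and is an assumption, as are the function and parameter names.

```python
# Step directions between the five notes (+1 up, -1 down, 0 repeat).
# Only "rising" and "rising-falling" are named in the abstract; the rest of
# this nine-shape set is an assumed, commonly used layout.
CONTOUR_STEPS = {
    "rising": (1, 1, 1, 1),
    "falling": (-1, -1, -1, -1),
    "flat": (0, 0, 0, 0),
    "rising-falling": (1, 1, -1, -1),
    "falling-rising": (-1, -1, 1, 1),
    "flat-rising": (0, 0, 1, 1),
    "flat-falling": (0, 0, -1, -1),
    "rising-flat": (1, 1, 0, 0),
    "falling-flat": (-1, -1, 0, 0),
}


def contour_notes(name, root_hz, interval_semitones, note_ms):
    """Return (frequency_hz, duration_ms) for each of the five notes of a contour.

    interval_semitones and note_ms are the two difficulty knobs: smaller
    intervals or shorter notes make the contour harder to identify.
    """
    offsets = [0]
    for step in CONTOUR_STEPS[name]:
        offsets.append(offsets[-1] + step * interval_semitones)
    return [(root_hz * 2 ** (o / 12), note_ms) for o in offsets]


print(contour_notes("rising-falling", 220.0, 2, 250))
```

An interval-based program would shrink `interval_semitones` as the trainee improves, while a duration-based program would shorten `note_ms` at a fixed interval.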
Scott, Sophie K; Wise, Richard J S
In this paper we attempt to relate the prelexical processing of speech, with particular emphasis on functional neuroimaging studies, to the study of auditory perceptual systems by disciplines in the speech and hearing sciences. The elaboration of the sound-to-meaning pathways in the human brain enables their integration into models of the human language system and the definition of potential auditory processing differences between the two cerebral hemispheres. Further, it facilitates comparison with recent developments in the study of the anatomy of non-human primate auditory cortex, which has very precisely revealed architectonically distinct regions, connectivity, and functional specialization.
The objective of this dissertation was to assess the use of computer modeling techniques to predict quantitative and qualitative measures of speech perception in classrooms under realistic conditions of background noise and reverberation. Secondary objectives included (1) finding relationships between acoustical measurements made in actual classrooms and in computer models of those rooms, as a tool for predicting 15 acoustic parameters at the design stage of projects, and (2) finding relationships between speech perception scores and the 15 acoustic parameters to determine the best predictors of speech perception in actual classroom conditions. Fifteen types of acoustical measurements were made in three actual classrooms with reverberation times of 0.5, 1.3, and 5.1 seconds. Speech perception tests using Modified Rhyme Test (MRT) lists were also given to 22 subjects in each room under five noise conditions, with signal-to-noise ratios of 31, 24, 15, 0, and −10 dB. Computer models of the rooms were constructed using a commercially available computer modeling program. The 15 acoustical measurements were made at 6 or 9 locations in the model rooms. Impulse responses obtained in the computer models were convolved with the anechoically recorded speech tests used in the full-size rooms to produce a compact disc of the MRT lists carrying the acoustical response of the computer-model rooms. Speech perception tests using this as source material were given to the subjects over a loudspeaker in an acoustic test booth. Correlations (R²) between acoustical measures made in the full-size classrooms and in the computer models of the classrooms ranged from 0.92 to 0.99, with standard errors of 0.033 to 7.311. Comparisons between speech perception scores obtained in the rooms and acoustical measurements made in the rooms and in the computer models showed prediction accuracy similar to that reported in other studies in the literature. The…
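The auralization step described above, in which impulse responses from the room models are convolved with anechoic speech, amounts to linear convolution. The Python sketch below is a minimal illustration, assuming both signals are plain lists of samples at the same sampling rate; the function name and the toy "room" are hypothetical, and a practical pipeline would use an FFT-based routine on full-length recordings.

```python
def convolve(signal, impulse_response):
    """Linear convolution: y[n] = sum over k of h[k] * x[n - k]."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h  # each input sample excites a scaled, delayed copy of the IR
    return out


# Hypothetical toy "room": direct sound at gain 1.0 and one reflection
# arriving two samples later at gain 0.5.
dry = [1.0, 0.0, 0.0]      # an impulse standing in for anechoic speech
room_ir = [1.0, 0.0, 0.5]
print(convolve(dry, room_ir))  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

Direct convolution costs O(N·M) operations, which is fine for a toy example but slow for seconds of audio convolved with a long reverberant tail; that is why FFT-based convolution is the usual choice in practice.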
Ghitza, Oded; Giraud, Anne-Lise; Poeppel, David
A recent opinion article (Obleser et al., 2012, “Neural oscillations in speech: do not be enslaved by the envelope”) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). The authors criticize, in particular, what they see as an over-emphasis on the role of temporal speech envelope information, and an over-emphasis on entrainment to the input rhythm at the expense of the role of top-down processes in modulating the entrainment of neuronal oscillations. Here we respond to these arguments, referring to the phenomenological model of Ghitza (2011) as a representative of the criticized approach. PMID:23316150
Kleinschmidt, Dave F.; Jaeger, T. Florian
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker’s /p/ might be physically indistinguishable from another talker’s /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit to known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension. PMID:25844873
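The distributional belief-updating idea can be illustrated with the simplest case: a listener's Gaussian belief about one talker's mean for a single cue (for example, VOT for /p/), updated by conjugate normal-normal inference. This is a minimal sketch under stated assumptions (one cue, one category, known within-talker variance), not the authors' ideal adapter model; the class, function, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CategoryBelief:
    """Gaussian belief about one category's mean cue value (hypothetical, e.g. VOT in ms)."""
    mean: float      # current estimate of the talker's category mean
    variance: float  # uncertainty about that estimate


def update(belief, observations, noise_var):
    """Conjugate normal-normal update, assuming the within-talker cue variance is known."""
    prior_precision = 1.0 / belief.variance
    data_precision = len(observations) / noise_var
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * belief.mean + sum(observations) / noise_var) / post_precision
    return CategoryBelief(mean=post_mean, variance=1.0 / post_precision)


# Prior centred on 60 ms; the new talker produces four tokens at 40 ms.
prior = CategoryBelief(mean=60.0, variance=100.0)
posterior = update(prior, [40.0, 40.0, 40.0, 40.0], noise_var=100.0)
print(round(posterior.mean, 2), round(posterior.variance, 2))  # 44.0 20.0
```

Four tokens move the belief most of the way toward the new talker and shrink its uncertainty, the qualitative signature of incremental adaptation; more tokens, or a less confident prior, would move it further.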
García, Paula B; Rosado Rogers, Lydia; Nishi, Kanae
This study evaluated the English version of Computer-Assisted Speech Perception Assessment (E-CASPA) with Spanish-English bilingual children. E-CASPA has been evaluated with monolingual English speakers ages 5 years and older, but it is unknown whether a separate norm is necessary for bilingual children. Eleven Spanish-English bilingual and 12 English monolingual children (6 to 12 years old) with normal hearing participated. Responses were scored by word, phoneme, consonant, and vowel. Regardless of the scoring unit, performance across three signal-to-noise ratio conditions was similar between groups, suggesting that the same norm can be used for both bilingual and monolingual children.
Caldwell, Amanda; Nittrouer, Susan
Purpose: Common wisdom suggests that listening in noise poses disproportionately greater difficulty for listeners with cochlear implants (CIs) than for peers with normal hearing (NH). The purpose of this study was to examine phonological, language, and cognitive skills that might help explain speech-in-noise abilities for children with CIs.…
Immediate recall of 15 speech concepts presented in mixed order in 3 modalities (auditory, symbolically visual [written], and iconic) is compared. Results of experiments with 121 subjects suggest that the first step of information processing is a differentiation of sensory stimuli. (10 references)
Kraut, Rachel; Wulff, Stefanie
Seventy-eight native English speakers rated the foreign-accented speech (FAS) of 24 international students enrolled in an Intensive English programme at a public university in Texas on degree of accent, comprehensibility and communicative ability. Variables considered to potentially impact listeners' ratings were the sex of the speaker, the first…
Linnville, Steven E.; And Others
In an investigation using auditory evoked responses (AERs) to compare strongly left- and strongly right-handed adults in their hemispheric processing of speech materials, it was anticipated that AERs would reflect a bilateralization in the left-handed group of subjects and marked hemispheric differences in the right-handed group. In addition, the…
Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart
Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…
Werker, J F; Gilbert, J H; Humphrey, K; Tees, R C
Previous research has suggested that infants discriminate many speech sounds according to phonemic category regardless of language exposure, while adults of one language group may have difficulty discriminating nonnative linguistic contrasts. Our study directly addressed questions about infant perceptual ability and the possibility of its decline as a function of development in the absence of specific experience by comparing English-speaking adults, Hindi-speaking adults, and 7-month-old infants on their ability to discriminate 2 pairs of natural Hindi (non-English) speech contrasts. To do this, infants were tested in a "visually reinforced infant speech discrimination" paradigm, while a variant of this paradigm was used to test adults. Support was obtained for the above hypotheses. Infants were shown to be able to discriminate both Hindi sound pairs, and support for the idea of a decrease in speech perceptual abilities with age and experience was clearly evident with the rarer of the 2 non-English contrasts. The results were then discussed with respect to the possible nature and purpose of these abilities.
Robinson, Elizabeth J.; Davidson, Lisa S.; Uchanski, Rosalie M.; Brenner, Christine M.; Geers, Ann E.
Background: For pediatric cochlear implant (CI) users, CI processor technology, map characteristics and fitting strategies are known to have a substantial impact on speech perception scores at young ages. It is unknown whether these benefits continue over time as these children reach adolescence. Purpose: To document changes in CI technology, map characteristics, and speech perception scores in children between elementary grades and high school, and to describe relations between map characteristics and speech perception scores over time. Research Design: A longitudinal design with participants 8–9 years old at session 1 and 15–18 years old at session 2. Study Sample: Participants were 82 adolescents with unilateral CIs, who are a subset of a larger longitudinal study. Mean age at implantation was 3.4 years (range: 1.7–5.4), and mean duration of device use was 5.5 years (range: 3.8–7.5) at session 1 and 13.3 years (range: 10.9–15) at session 2. Data Collection and Analysis: Speech perception tests at sessions 1 and 2 were the Lexical Neighborhood Test (LNT-70) and Bamford-Kowal-Bench sentences in quiet (BKB-Q), presented at 70 dB SPL. At session 2, the LNT was also administered at 50 dB SPL (LNT-50) and BKB sentences were administered in noise at +10 dB SNR (BKB-N). CI processor technology type and CI map characteristics (coding strategy, number of electrodes, map threshold levels [T levels], and map comfort levels [C levels]) were obtained at both sessions. Electrical dynamic range (EDR) was computed as C level − T level, and descriptive statistics, correlations, and repeated-measures ANOVAs were employed. Results: Participants achieved significantly higher LNT and BKB scores at 70 dB SPL at ages 15–18 than at ages 8–9 years. Forty-two participants had 1–3 electrodes either activated or deactivated in their map between test sessions, and 40 had no change in the number of active electrodes (mean change: −0.5; range: −3 to +2). After conversion from…
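The EDR computation described in this abstract is a per-electrode subtraction of T level from C level. A minimal sketch, assuming C and T levels are given in the same clinical current units for the same electrodes (units differ across manufacturers); the function name and map values are hypothetical, and averaging over electrodes is one plausible summary rather than the study's stated procedure.

```python
def electrical_dynamic_range(c_levels, t_levels):
    """Per-electrode EDR = C level - T level, in the device's clinical current units."""
    if len(c_levels) != len(t_levels):
        raise ValueError("C and T levels must cover the same electrodes")
    return [c - t for c, t in zip(c_levels, t_levels)]


# Hypothetical map with four active electrodes.
c = [180, 185, 190, 175]
t = [120, 125, 128, 118]
edr = electrical_dynamic_range(c, t)
mean_edr = sum(edr) / len(edr)
print(edr, mean_edr)  # [60, 60, 62, 57] 59.75
```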
Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan
Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insight into the processing and fusion of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent AV speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception, in a temporal window of 300–600 ms following stimulus onset. During asynchronous AV speech, global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our…
McMurray, Bob; Kovack-Lesh, Kristine A; Goodwin, Dresden; McEchron, William
Infant-directed speech (IDS) is a speech register characterized by simpler sentences, a slower rate, and more variable prosody. Recent work has implicated it in more subtle aspects of language development. Kuhl et al. (1997) demonstrated that segmental cues for vowels are affected by IDS in a way that may enhance development: the average locations of the extreme "point" vowels (/a/, /i/ and /u/) are further apart in acoustic space. If infants learn speech categories, in part, from the statistical distributions of such cues, these changes may specifically enhance speech category learning. We revisited this by asking (1) whether these findings extend to a new cue (voice onset time [VOT], a cue for voicing); (2) whether they extend to the interior vowels, which are much harder to learn and/or discriminate; and (3) whether these changes may be an unintended phonetic consequence of factors like speaking rate or prosodic changes associated with IDS. Eighteen caregivers were recorded reading a picture book including minimal pairs for voicing (e.g., beach/peach) and a variety of vowels to either an adult or their infant. Acoustic measurements suggested that VOT was different in IDS, but not in a way that necessarily supports better development, and that these changes are almost entirely due to the slower rate of speech in IDS. Measurements of the vowels suggested that in addition to changes in the mean, there was also an increase in variance, and statistical modeling suggests that this may counteract the benefit of any expansion of the vowel space. As a whole, this suggests that changes in segmental cues associated with IDS may be an unintended by-product of the slower rate of speech and different prosodic structure, and do not necessarily derive from a motivation to enhance development.
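Why an increase in variance can cancel an expansion of the vowel space can be seen with an equal-variance sensitivity index, d′ = |μ_A − μ_B| / σ. The sketch below is an illustration on one hypothetical acoustic dimension (F1 in Hz), not the statistical model used in the study; the numbers are invented to make the point.

```python
def d_prime(mean_a, mean_b, sd):
    """Equal-variance sensitivity index: distance between category means in SD units."""
    return abs(mean_a - mean_b) / sd


# Hypothetical F1 values (Hz) for two vowel categories.
adult_directed = d_prime(500.0, 700.0, sd=50.0)   # means 200 Hz apart
infant_directed = d_prime(450.0, 750.0, sd=75.0)  # means 300 Hz apart, but wider spread
print(adult_directed, infant_directed)  # 4.0 4.0
```

The means in the second case are 50% further apart, but because the spread also grows by 50%, the categories are no easier to separate.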
Mayer, Jennifer L.; Hannent, Ian; Heaton, Pamela F.
Whilst enhanced perception has been widely reported in individuals with Autism Spectrum Disorders (ASDs), relatively little is known about the developmental trajectory and impact of atypical auditory processing on speech perception in intellectually high-functioning adults with ASD. This paper presents data on perception of complex tones and…
Bidelman, Gavin M
Event-related brain potentials (ERPs) reveal that musical experience refines neural encoding and confers stronger categorical perception (CP) and neural organization for speech sounds. In addition to evoked brain activity, the human EEG can be decomposed into induced (non-phase-locked) responses whose various frequency bands reflect different mechanisms of perceptual-cognitive processing. Here, we aimed to clarify which spectral properties of these neural oscillations are most prone to music-related neuroplasticity and which are linked to behavioral benefits in the categorization of speech. We recorded electrical brain activity while musicians and nonmusicians rapidly identified speech tokens from a sound continuum. Time-frequency analysis parsed evoked and induced EEG into alpha- (∼10 Hz), beta- (∼20 Hz), and gamma- (>30 Hz) frequency bands. We found that musicians' enhanced behavioral CP was accompanied by improved evoked speech responses across the frequency spectrum, complementing previously observed enhancements in evoked potential studies (i.e., ERPs). Brain-behavior correlations implied differences in the underlying neural mechanisms supporting speech CP in each group: modulations in induced gamma power predicted the slope of musicians' speech identification functions, whereas early evoked alpha activity predicted behavior in nonmusicians. Collectively, findings indicate that musical training tunes speech processing via two complementary mechanisms: (i) strengthening the formation of auditory object representations for speech signals (gamma-band) and (ii) improving network control and/or the matching of sounds to internalized memory templates (alpha/beta-band). Both neurobiological enhancements may be deployed behaviorally and account for musicians' benefits in the perceptual categorization of speech.
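The evoked/induced distinction can be made concrete at a single frequency. One common convention treats evoked power as the power of the trial-averaged response and induced power as the mean single-trial power minus that evoked part. The sketch below applies this convention to one DFT bin rather than a full time-frequency decomposition; it is an illustration under that assumption, not the analysis pipeline of the study, and the function names and signals are hypothetical.

```python
import cmath
import math


def dft_bin(samples, freq_hz, fs_hz):
    """Complex amplitude of a single frequency component (one DFT bin, scaled by 1/N)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
               for k, x in enumerate(samples)) / n


def evoked_and_induced_power(trials, freq_hz, fs_hz):
    """Split power at one frequency into phase-locked (evoked) and non-phase-locked (induced) parts."""
    coeffs = [dft_bin(trial, freq_hz, fs_hz) for trial in trials]
    total = sum(abs(c) ** 2 for c in coeffs) / len(coeffs)  # mean single-trial power
    evoked = abs(sum(coeffs) / len(coeffs)) ** 2            # power of the trial average
    return evoked, total - evoked                            # induced = total - evoked


fs = 100  # hypothetical sampling rate (Hz); each trial is 1 s long
locked = [math.cos(2 * math.pi * 10 * k / fs) for k in range(fs)]  # 10 Hz (alpha) burst
flipped = [-x for x in locked]                                       # same burst, opposite phase
print(evoked_and_induced_power([locked, locked], 10, fs))   # essentially all evoked
print(evoked_and_induced_power([locked, flipped], 10, fs))  # essentially all induced
```

Both trial sets carry the same single-trial power; only the phase consistency across trials decides whether it shows up as evoked or induced.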
Biau, Emmanuel; Torralba, Mireia; Fuentemilla, Lluis; de Diego Balaguer, Ruth; Soto-Faraco, Salvador
Speakers often accompany speech with spontaneous beat gestures in natural spoken communication. These gestures are usually aligned with lexical stress and can modulate the saliency of their affiliate words. Here we addressed the consequences of beat gestures on the neural correlates of speech perception. Previous studies have highlighted the role played by theta oscillations in temporal prediction of speech. We hypothesized that the sight of beat gestures may influence ongoing low-frequency neural oscillations around the onset of the corresponding words. Electroencephalographic (EEG) recordings were acquired while participants watched a continuous, naturally recorded discourse. The phase-locking value (PLV) at word onset was calculated from the EEG for pairs of identical words that had been pronounced with and without a concurrent beat gesture in the discourse. We observed an increase in PLV in the 5–6 Hz theta range as well as a desynchronization in the 8–10 Hz alpha band around the onset of words preceded by a beat gesture. These findings suggest that beats help tune low-frequency oscillatory activity at relevant moments during natural speech perception, providing new insight into how speech and paralinguistic information are integrated.
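The phase-locking value is the length of the mean unit phasor across trials, PLV = |(1/N) Σ_n exp(iφ_n)|, where φ_n is the phase on trial n at a given time and frequency. A minimal sketch, assuming phases have already been extracted from band-passed EEG (the abstract does not say how); the function name and phase values are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases):
    """PLV = |mean of unit phasors|: 1 = same phase on every trial, near 0 = no consistency."""
    if not phases:
        raise ValueError("need at least one phase")
    total = sum(cmath.exp(1j * phi) for phi in phases)
    return abs(total) / len(phases)


# Hypothetical theta-band phases (radians) at word onset across trials.
aligned = [0.1, 0.0, -0.1, 0.05]   # tightly clustered phases
opposed = [0.0, math.pi]           # phases that cancel out
print(round(phase_locking_value(aligned), 3))
print(round(phase_locking_value(opposed), 3))  # 0.0
```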
Vance, Maggie; Martindale, Nicola
Deficits in speech perception are reported for some children with language impairments. This deficit is more marked when listening against background noise. This study investigated the speech perception skills of young children with and without language difficulties. A speech discrimination task, using non-word minimal pairs in an XAB paradigm, was presented to 20 children aged 5–7 years with language difficulties and 33 typically developing (TD) children aged 4–7 years. Stimuli were presented in quiet and in background noise (babble), and stimuli varied in phonetic contrasts, differing in either place of articulation or presence/absence of voicing. Children with language difficulties performed less well than TD children in all conditions. There was an interaction between group and noise condition, such that children with language difficulties were more affected by the presence of noise. Both groups of children made more errors with one voicing contrast, /s/–/z/, and there was some indication that children with language difficulties had proportionately greater difficulty with this contrast. Speech discrimination scores were significantly correlated with language scores for children with language difficulties. Issues in developing material for assessment of speech discrimination in children with language impairment (LI) are discussed.
Tao, Lily; Taft, Marcus
Foreign accent in speech often presents listeners with challenging listening conditions. Consequently, listeners may need to draw on additional cognitive resources in order to perceive and comprehend such speech. Previous research has shown that, for older adults, executive functions predicted perception of speech material spoken in a novel, artificially created (and therefore unfamiliar) accent. The present study investigates the influences of executive functions, information processing speed, and working memory on perception of unfamiliar foreign-accented speech in healthy young adults. The results showed that the executive processes of inhibition and switching, as well as information processing speed, predict response times to both accented and standard sentence stimuli, while inhibition and information processing speed predict speed of responding to accented word stimuli. Inhibition and switching further predict accuracy in responding to accented word and standard sentence stimuli that have increased processing demands (i.e., nonwords and sentences with unexpected semantic content). These findings suggest that stronger abilities in aspects of cognitive functioning may be helpful for matching variable pronunciations of speech sounds to stored representations, for example by being able to manage the activation of incorrect competing representations and shifting to other possible matches. PMID:28286491
GARCIA, Tatiana Manfrini; JACOB, Regina Tangerino de Souza; MONDELLI, Maria Fernanda Capoani Garcia
ABSTRACT Objective: To relate the speech perception performance of individuals with high-frequency hearing loss to their quality of life before and after fitting of an open-fit hearing aid (HA). Methods: The WHOQOL-BREF was administered before fitting and 90 days after the start of HA use. The Hearing in Noise Test (HINT) was conducted in two phases: (1) at the time of fitting, without an HA (situation A) and with an HA (situation B); (2) with an HA 90 days after fitting (situation C). Study Sample: Thirty subjects with high-frequency sensorineural hearing loss. Results: An analysis of variance with Tukey's test comparing the three HINT situations in quiet and noisy environments showed an improvement after HA fitting. The WHOQOL-BREF results showed an improvement in quality of life after HA fitting (paired t-test). The relationship between speech perception and quality of life before HA fitting indicated a significant relationship between speech recognition in noisy environments and the social relations domain after HA fitting (Pearson's correlation coefficient). Conclusions: Auditory stimulation improved the speech perception and quality of life of these individuals. PMID:27383708
Young, N M; Grohne, K M; Carrasco, V N; Brown, C
This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device.
Leong, Victoria; Stone, Michael A; Turner, Richard E; Goswami, Usha
Prosodic rhythm in speech [the alternation of "Strong" (S) and "weak" (w) syllables] is cued, among other things, by slow rates of amplitude modulation (AM) within the speech envelope. However, it is unclear exactly which envelope modulation rates and statistics are the most important for the rhythm percept. Here, the hypothesis that the phase relationship between "Stress" rate (∼2 Hz) and "Syllable" rate (∼4 Hz) AMs provides a perceptual cue for speech rhythm is tested. In a rhythm judgment task, adult listeners identified AM tone-vocoded nursery rhyme sentences that carried either trochaic (S-w) or iambic patterning (w-S). Manipulation of listeners' rhythm perception was attempted by parametrically phase-shifting the Stress AM and Syllable AM in the vocoder. It was expected that a 1π radian phase-shift (half a cycle) would reverse the perceived rhythm pattern (i.e., trochaic → iambic) whereas a 2π radian shift (full cycle) would retain the perceived rhythm pattern (i.e., trochaic → trochaic). The results confirmed these predictions. Listeners' judgments of rhythm systematically followed Stress-Syllable AM phase-shifts, but were unaffected by phase-shifts between the Syllable AM and the Sub-beat AM (∼14 Hz) in a control condition. It is concluded that the Stress-Syllable AM phase relationship is an envelope-based modulation statistic that supports speech rhythm perception.
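The predicted effect of the phase shifts can be checked with a toy envelope: a product of a 2 Hz "Stress" modulator and a 4 Hz "Syllable" modulator, read out at the Syllable-AM peaks. This is a simplified model of the hypothesis, assuming only two modulators and no carrier; it is not the authors' vocoder, and the helper names are hypothetical.

```python
import math


def syllable_strengths(stress_phase, n_syllables=4, stress_hz=2.0, syllable_hz=4.0):
    """Toy envelope (1 + cos Stress AM) * (1 + cos Syllable AM), sampled at each Syllable-AM peak."""
    values = []
    for k in range(n_syllables):
        t = k / syllable_hz  # Syllable-AM peaks occur at t = k / syllable_hz
        stress = 1 + math.cos(2 * math.pi * stress_hz * t + stress_phase)
        syllable = 1 + math.cos(2 * math.pi * syllable_hz * t)
        values.append(stress * syllable)
    return values


def rhythm_pattern(values, threshold=2.0):
    """Label each syllable Strong ('S') or weak ('w') by its envelope value."""
    return "".join("S" if v > threshold else "w" for v in values)


for shift in (0.0, math.pi, 2 * math.pi):
    print(round(shift / math.pi), rhythm_pattern(syllable_strengths(shift)))
# 0 SwSw
# 1 wSwS
# 2 SwSw
```

A half-cycle (π) shift of the Stress AM moves its peaks onto the alternate syllables, turning S-w into w-S, whereas a full cycle (2π) leaves the envelope unchanged, matching the pattern the listeners' judgments followed.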
Mitterer, Holger; McQueen, James M.
Two experiments examined how Dutch listeners deal with the effects of connected-speech processes, specifically those arising from word-final /t/ reduction (e.g., whether Dutch [tas] is "tas," bag, or a reduced-/t/ version of "tast," touch). Eye movements of Dutch participants were tracked as they looked at arrays containing 4…
Fuller, Christina; Free, Rolien; Maat, Bert; Başkent, Deniz
In normal-hearing listeners, musical background has been observed to change the sound representation in the auditory system and produce enhanced performance in some speech perception tests. Based on these observations, it has been hypothesized that musical background can influence sound and speech perception, and by extension also the quality of life, of cochlear-implant users. To test this hypothesis, this study explored musical background [using the Dutch Musical Background Questionnaire (DMBQ)], and self-perceived sound and speech perception and quality of life [using the Nijmegen Cochlear Implant Questionnaire (NCIQ) and the Speech Spatial and Qualities of Hearing Scale (SSQ)] in 98 postlingually deafened adult cochlear-implant recipients. In addition to self-perceived measures, speech perception scores (percentage of phonemes recognized in words presented in quiet) were obtained from patient records. The self-perceived hearing performance was associated with the objective speech perception. Forty-one respondents (44% of 94 respondents) indicated some form of formal musical training. Fifteen respondents (18% of 83 respondents) judged themselves as having musical training, experience, and knowledge. No association was observed between musical background (quantified by DMBQ) and self-perceived hearing-related performance or quality of life (quantified by NCIQ and SSQ), or speech perception in quiet.
Mantello, Erika Barioni; Silva, Carla Dias da; Massuda, Eduardo Tanaka; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa dos
Introduction: Hearing difficulties can be minimized by the use of hearing aids. Objective: The objective of this study is to assess the speech perception and satisfaction of hearing aid users before and after aid adaptation and to determine whether these measures are correlated. Methods: The study was conducted on 65 individuals, 54% female and 46% male, aged 63 years on average, after systematic use of hearing aids for at least three months. We characterized subjects' personal identification data, the degree and configuration of hearing loss, and aspects related to adaptation. We then applied a satisfaction questionnaire and a speech perception test (words and sentences), with and without the hearing aids. Results: Mean speech recognition with words and sentences was 69% and 79%, respectively, with hearing aids, whereas without hearing aids the figures were 43% and 53%. Mean questionnaire score was 30.1 points. Regarding hearing loss characteristics, 78.5% of the subjects had a sensorineural loss, 20% a mixed loss, and 1.5% a conductive loss. Hearing loss of moderate degree was present in 60.5% of cases, a descending configuration in 47%, and a flat configuration in 37.5%. There was no correlation between individual satisfaction and the speech perception test percentages. Conclusion: Word and sentence recognition was significantly better with the hearing aids. The users showed a high degree of satisfaction. In the present study, no correlation was observed between levels of speech perception and levels of user satisfaction measured with the questionnaire. PMID:27746833
Deschamps, Isabelle; Hasson, Uri; Tremblay, Pascale
The processing of continuous and complex auditory signals such as speech relies on the ability to use statistical cues (e.g. transitional probabilities). In this study, participants heard short auditory sequences composed either of Italian syllables or bird songs and completed a regularity-rating task. Behaviorally, participants were better at differentiating between levels of regularity in the syllable sequences than in the bird song sequences. Inter-individual differences in sensitivity to regularity for speech stimuli were correlated with variations in surface-based cortical thickness (CT). These correlations were found in several cortical areas including regions previously associated with statistical structure processing (e.g. bilateral superior temporal sulcus, left precentral sulcus and inferior frontal gyrus), as well as other regions (e.g. left insula, bilateral superior frontal gyrus/sulcus and supramarginal gyrus). In all regions, this correlation was positive, suggesting that thicker cortex is related to higher sensitivity to variations in the statistical structure of auditory sequences. Overall, these results suggest that inter-individual differences in CT within a distributed network of cortical regions involved in statistical structure processing, attention and memory are predictive of the ability to detect statistical structure in auditory speech sequences. PMID:26919234
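Transitional probability, the statistical cue named at the start of this abstract, is usually computed as TP(B | A) = count(AB) / count(A). A minimal sketch over a hypothetical syllable stream made of three two-syllable "words"; within-word transitions come out higher than transitions across word boundaries. The syllables and function name are illustrative only, not the study's stimuli.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Forward TP(B | A) = count(A immediately followed by B) / count(A in a non-final position)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical stream built from the "words" tu-pi, ro-la and bi-ku.
stream = ["tu", "pi", "ro", "la", "bi", "ku", "tu", "pi",
          "bi", "ku", "ro", "la", "tu", "pi"]
tps = transitional_probabilities(stream)
print(tps[("tu", "pi")], tps[("pi", "ro")])  # 1.0 0.5 (within-word vs. across a boundary)
```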
Hertrich, Ingo; Mathiak, Klaus; Lutzenberger, Werner; Menning, Hans; Ackermann, Hermann
Using whole-head magnetoencephalography (MEG), audiovisual (AV) interactions during speech perception (/ta/- and /pa/-syllables) were investigated in 20 subjects. Congruent AV events served as the 'standards' of an oddball design. The deviants encompassed incongruent /ta/-/pa/ configurations differing from the standards either in the acoustic or the visual domain. As an auditory non-speech control condition, the same video signals were synchronized with either one of two complex tones. As in natural speech, visual movement onset preceded acoustic signals by about 150 ms. First, the impact of visual information on auditorily evoked fields to non-speech sounds was determined. Larger facial movements (/pa/ versus /ta/) yielded enhanced early responses such as the M100 component, most likely indicating anticipatory pre-activation of auditory cortex by visual motion cues. As a second step of analysis, mismatch fields (MMF) were calculated. Acoustic deviants elicited a typical MMF, peaking ca. 180 ms after stimulus onset, whereas visual deviants gave rise to later responses (220 ms) of a more posterior-medial source location. Finally, a late (275 ms), left-lateralized visually-induced MMF component, resembling the acoustic mismatch response, emerged during the speech condition, presumably reflecting phonetic/linguistic operations. There is mounting functional imaging evidence for an early impact of visual information on auditory cortical regions during speech perception. The present study suggests at least two successive AV interactions in association with syllable recognition tasks: early activation of auditory areas depending upon visual motion cues and a later speech-specific left-lateralized response mediated, conceivably, by backward projections from multisensory areas.
Weil, Shawn A.
Non-normative speech (i.e., synthetic speech, pathological speech, foreign accented speech) is more difficult for native listeners to process than normative speech. Does perceptual dissimilarity affect only intelligibility, or are there other costs to processing? The current series of experiments investigates both the intelligibility and the time course of foreign accented speech (FAS) perception. Native English listeners heard single English words spoken by both native English speakers and non-native speakers (Mandarin or Russian). Words were chosen based on the similarity between the phonetic inventories of the respective languages. Three experimental designs were used: a cross-modal matching task, a word repetition (shadowing) task, and two subjective rating tasks, which measured impressions of accentedness and effortfulness. The results replicate previous investigations that have found that FAS significantly lowers word intelligibility. Furthermore, FAS also increases perceptual effort: in the word repetition task, correct responses are slower to accented words than to nonaccented words. An analysis indicates that both intelligibility and reaction time are, in part, functions of the similarity between the talker's utterance and the listener's representation of the word.
Zhang, Linjun; Shu, Hua; Zhou, Fengying; Wang, Xiaoyi; Li, Ping
The present study examines the neural substrates for the perception of speech rhythm and intonation. Subjects listened passively to synthesized speech stimuli that contained no semantic or phonological information, in three conditions: (1) continuous speech stimuli with fixed syllable duration and fundamental frequency in the standard condition, (2) stimuli with varying vocalic durations of syllables in the speech rhythm condition, and (3) stimuli with varying fundamental frequency in the intonation condition. Compared to the standard condition, speech rhythm activated the right middle superior temporal gyrus (mSTG), whereas intonation activated the bilateral superior temporal gyrus and sulcus (STG/STS) and the right posterior STS. Conjunction analysis further revealed that rhythm and intonation activated a common area in the right mSTG, but that, compared to speech rhythm, intonation elicited additional activation in the right anterior STS. These findings show that the right mSTG plays an important role in prosodic processing. Implications of our findings are discussed with respect to neurocognitive theories of auditory processing.
Miller, James D.; Watson, Charles S.; Dubno, Judy R.; Leek, Marjorie R.
Following an overview of theoretical issues in speech-perception training and of previous efforts to enhance hearing aid use through training, a multisite study, designed to evaluate the efficacy of two types of computerized speech-perception training for adults who use hearing aids, is described. One training method focuses on the identification of 109 syllable constituents (45 onsets, 28 nuclei, and 36 codas) in quiet and in noise, and on the perception of words in sentences presented in various levels of noise. In a second type of training, participants listen to 6- to 7-minute narratives in noise and are asked several questions about each narrative. Two groups of listeners are trained, each using one of these types of training, performed in a laboratory setting. The training for both groups is preceded and followed by a series of speech-perception tests. Subjects listen in a sound field while wearing their hearing aids at their usual settings. The training continues over 15 to 20 visits, with subjects completing at least 30 hours of focused training with one of the two methods. The two types of training are described in detail, together with a summary of other perceptual and cognitive measures obtained from all participants. PMID:27587914
Slevc, L. Robert; Martin, Randi C.; Hamilton, A. Cris; Joanisse, Marc F.
The mechanisms and functional anatomy underlying the early stages of speech perception are still not well understood. One way to investigate the cognitive and neural underpinnings of speech perception is to study patients with speech perception deficits but with preserved ability in other domains of language. One such case is reported here: patient NL shows highly impaired speech perception despite normal hearing ability and preserved semantic knowledge, speaking, and reading ability, and is thus classified as a case of pure word deafness (PWD). NL has a left temporoparietal lesion without right hemisphere damage, and diffusion tensor imaging (DTI) suggests that he has preserved cross-hemispheric connectivity, arguing against an account of PWD as a disconnection of left-lateralized language areas from auditory input. Two experiments investigated whether NL’s speech perception deficit could instead result from an underlying problem with rapid temporal processing. Experiment 1 showed that NL has particular difficulty discriminating sounds that differ in terms of rapid temporal changes, be they speech or non-speech sounds. Experiment 2 employed an intensive training program designed to improve rapid temporal processing in language-impaired children (Fast ForWord; Scientific Learning Corporation, Oakland, CA) and found that NL was able to improve his ability to discriminate rapid temporal differences in non-speech sounds, but not in speech sounds. Overall, these data suggest that patients with unilateral PWD may, in fact, have a deficit in (left-lateralized) temporal processing ability; however, they also show that a rapid temporal processing deficit is, by itself, unable to account for this patient’s speech perception deficit. PMID:21093464
Cabrera, Laurianne; Tsao, Feng-Ming; Liu, Huei-Mei; Li, Lu-Yang; Hu, You-Hsin; Lorenzi, Christian; Bertoncini, Josiane
A number of studies have shown that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues [i.e., frequency-modulation (FM) and amplitude-modulation information] known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed either to keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories in French- and Mandarin-learning 10-month-old infants using click trains containing only the FM cues related to the fundamental frequency (F0). Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast, whereas French-learning 10-month-olds failed. However, only the French-learning 10-month-olds discriminated the degraded lexical tones, in which FM, and thus voice-pitch, cues were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories presented in click trains better than French-learning infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.
Tanta, Ivan; Lesinger, Gordana
This paper focuses on political speech in the Republic of Croatia and its impact on voters, that is, on which keywords in the speeches and public appearances of Croatian politicians their electorate wants to hear. Accordingly, we frame the research topic as a question: is there a discrepancy in the perception of public political speech in Croatia, and which keywords are specific to the two main regions of Croatia and resonate with their inhabitants? Marcus Tullius Cicero, the most important Roman orator, used a specific associative mnemonic technique called the "room technique". He would take the keywords and concepts he needed for a given topic and, in the desired order, associate them in a creative and unique way with the rooms of a house or palace that he knew well. Then, while delivering the speech, he would mentally walk through the rooms of the house or palace, and the keywords and concepts would come to mind, again in the desired order. Because this kind of research on political speech is relatively recent in Croatia, it should be noted that this form of political communication is still not sufficiently explored, particularly with regard to the impact and use of keywords specific to the Republic of Croatia in everyday public and political communication. The paper analyzes the political and campaign speeches and promises of several winning candidates, now Croatian MEPs, for specific keywords related to economics, culture, science, education, and health. The analysis is based on a comparison of survey results on the representation of keywords in politicians' speeches with a qualitative analysis of politicians' keyword use in speeches during the election campaign.
Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.
Speech perception involves the integration of auditory and visual articulatory information and thus requires the perception of temporal synchrony between them. There is evidence that children with Specific Language Impairment (SLI) have difficulty with auditory speech perception, but it is not known whether this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study investigating the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or the visual speech attribute led. Children with SLI detected the 666 ms asynchrony only when the auditory component followed the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the speech-processing difficulties of children with SLI also involve difficulties in integrating the auditory and visual aspects of speech perception. PMID:22874648
Jenson, David; Harkrider, Ashley W; Thornton, David; Bowers, Andrew L; Saltuklaroglu, Tim
Sensorimotor integration (SMI) across the dorsal stream enables online monitoring of speech. Jenson et al. (2014) used independent component analysis (ICA) and event-related spectral perturbation (ERSP) analysis of electroencephalography (EEG) data to describe anterior sensorimotor (e.g., premotor cortex, PMC) activity during speech perception and production. The purpose of the current study was to identify and temporally map neural activity from posterior (i.e., auditory) regions of the dorsal stream in the same tasks. Perception tasks required "active" discrimination of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required overt production of syllable pairs and nouns. ICA performed on concatenated raw 68-channel EEG data from all tasks identified bilateral "auditory" alpha (α) components in 15 of 29 participants, localized to pSTG (left) and pMTG (right). ERSP analyses were performed to reveal fluctuations in the spectral power of the α rhythm clusters across time. Production conditions were characterized by significant α event-related synchronization (ERS; pFDR < 0.05) concurrent with EMG activity from speech production, consistent with speech-induced auditory inhibition. Discrimination conditions were also characterized by α ERS following stimulus offset. Auditory α ERS in all conditions temporally aligned with PMC activity reported in Jenson et al. (2014). These findings are indicative of speech-induced suppression of auditory regions, possibly via efference copy. The presence of the same pattern following stimulus offset in discrimination conditions suggests that sensorimotor contributions following speech perception reflect covert replay, and that covert replay provides one source of the motor activity previously observed in some speech perception tasks. To our knowledge, this is the first time that inhibition of auditory regions by speech has been observed in real time with the ICA/ERSP technique.
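ERSP expresses spectral power at each time point relative to a pre-stimulus baseline, in decibels. Positive values indicate event-related synchronization (ERS) and negative values indicate desynchronization. The sketch below is a deliberately simplified, single-band version applied to the epochs of one component. It band-passes the alpha range with SciPy, takes Hilbert power, averages across trials, and normalizes by the mean baseline power. The function name, the 8-13 Hz band, the filter order, and the synthetic data are assumptions. The study's own pipeline, which used ICA followed by time-frequency ERSP across many frequencies with FDR-corrected statistics, is not reproduced here.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def alpha_ersp_db(trials, fs, baseline_s, band=(8.0, 13.0)):
    """Single-band ERSP approximation in dB.

    trials: array (n_trials, n_samples) of one component's activity per epoch.
    baseline_s: (start, stop) of the baseline window in seconds from epoch start.
    """
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, trials, axis=-1)          # zero-phase alpha band-pass
    power = np.abs(hilbert(filtered, axis=-1)) ** 2       # instantaneous band power
    mean_power = power.mean(axis=0)                       # average over trials
    i0, i1 = int(baseline_s[0] * fs), int(baseline_s[1] * fs)
    baseline = mean_power[i0:i1].mean()
    return 10.0 * np.log10(mean_power / baseline)         # > 0 dB = ERS, < 0 dB = ERD


# Synthetic check: non-phase-locked 10 Hz activity whose amplitude doubles
# 2 s into a 4 s epoch.
rng = np.random.default_rng(0)
fs = 250
t = np.arange(0, 4.0, 1 / fs)
amp = np.where(t < 2.0, 1.0, 2.0)
trials = np.array([
    amp * np.sin(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi))
    + 0.1 * rng.standard_normal(t.size)
    for _ in range(20)
])
ersp = alpha_ersp_db(trials, fs, baseline_s=(0.5, 1.4))
print(ersp[(t >= 2.6) & (t <= 3.4)].mean())
```

Doubling the oscillation amplitude quadruples its power, so the post-change window should sit near +6 dB (an ERS) while the baseline window stays near 0 dB. The analysis windows are kept away from the epoch edges and the amplitude step to limit filter transients.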
Foote, Jennifer A; Trofimovich, Pavel
Second language speech learning is predicated on learners' ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers' pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training.
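The analysis step above starts from a matrix of pairwise dissimilarity ratings and recovers a low-dimensional configuration whose axes can then be interpreted, here as fluency and pronunciation. PROXSCAL itself is an SPSS procedure that fits the configuration iteratively. As a stand-in, the sketch below uses classical (Torgerson) metric MDS in NumPy. This is a different algorithm, but it illustrates the same idea. The function names and the toy matrix are assumptions.

```python
import numpy as np


def classical_mds(dissim: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate dissim."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering  # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))    # drop negative (non-Euclidean) parts
    return eigvecs[:, top] * scale


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


# Toy example: four "recordings" whose dissimilarities come from a known 2-D layout.
layout = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
ratings = pairwise_distances(layout)
coords = classical_mds(ratings, n_dims=2)
print(np.allclose(pairwise_distances(coords), ratings))
```

The recovered coordinates match the layout only up to rotation and reflection, which is why the interpretation of MDS axes is always a post hoc step. With real ratings, one matrix per listener (or an averaged matrix) would be scaled. Low agreement across listeners would show up as dissimilar individual configurations.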
Schwartz, Richard G.; Scheffler, Frances L. V.; Lopez, Karece
Using an identification task, we examined lexical effects on the perception of vowel duration as a cue to final consonant voicing in 12 children with specific language impairment (SLI) and 13 age-matched (6;6-9;6) peers with typical language development (TLD). Naturally recorded CVtsets [word-word (WW), nonword-nonword (NN), word-nonword (WN) and…
Stewart, Mary E.; Ota, Mitsuhiko
It has been claimed that Autism Spectrum Disorder (ASD) is characterized by a limited ability to process perceptual stimuli in reference to the contextual information of the percept. Such a connection between a nonholistic processing style and behavioral traits associated with ASD is thought to exist also within the neurotypical population albeit…
Feijoo, Sergio; Fernandez, Santiago; Alvarez, Jose Manuel
The combined effects of excessive ambient noise and reverberation in classrooms interfere with speech recognition and tend to degrade the learning process of young children. This paper reports a detailed analysis of a speech recognition test carried out with two populations of children aged 8-9 and 10-11. Unlike English, Spanish has few minimal pairs that can be used for closed-set phoneme recognition. The test consisted of a series of two-syllable nonsense words formed by combining all possible syllables in Spanish. It was administered to the children as a dictation task in which they had to write down the words spoken by their female teacher. The test was administered in two blocks on different days and later repeated to analyze its consistency. The rationale for this procedure was that (a) the test should reproduce normal academic situations, (b) all phonological and lexical context effects should be avoided, and (c) errors in both words and phonemes should be scored to reveal any possible acoustic basis for them. Although word recognition scores were similar across age groups and repetitions, phoneme errors showed high variability, calling into question the validity of such a test for classroom assessment.
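Scoring both whole words and individual phonemes, as in point (c), requires aligning each written response with its target. One common way to do this is a Levenshtein (edit-distance) alignment over phoneme symbols. The sketch below assumes that responses have already been converted to phoneme lists and that the nonsense items shown are purely illustrative. The function names are hypothetical, and the paper's exact scoring rules are not given here.

```python
from typing import List, Sequence, Tuple


def phoneme_edit_distance(target: Sequence[str], response: Sequence[str]) -> int:
    """Minimum number of phoneme substitutions, insertions, and deletions (Levenshtein)."""
    prev = list(range(len(response) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, r in enumerate(response, start=1):
            curr.append(min(
                prev[j] + 1,             # target phoneme deleted in the response
                curr[j - 1] + 1,         # extra phoneme inserted in the response
                prev[j - 1] + (t != r),  # substitution (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]


def score_dictation(items: List[Tuple[List[str], List[str]]]) -> Tuple[float, float]:
    """Return (proportion of words correct, phoneme error rate) for (target, response) pairs."""
    words_correct = sum(target == response for target, response in items)
    phoneme_errors = sum(phoneme_edit_distance(t, r) for t, r in items)
    total_phonemes = sum(len(target) for target, _ in items)
    return words_correct / len(items), phoneme_errors / total_phonemes


# Hypothetical two-syllable nonsense items, written as phoneme lists.
items = [
    (["p", "a", "t", "o"], ["p", "a", "t", "o"]),  # correct
    (["b", "u", "f", "e"], ["b", "u", "p", "e"]),  # one substitution
    (["l", "o", "m", "i"], ["l", "o", "i"]),       # one deletion
]
print(score_dictation(items))
```

Here one of three words is correct, but only 2 of 12 target phonemes are in error. This illustrates how word-level and phoneme-level scores can tell different stories about the same responses.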
Padgitt, Noelle R.; Munson, Benjamin; Carney, Edward J.
University instruction in phonetics requires students to associate a set of quasialphabetic symbols and diacritics with speech sounds. In the case of narrow phonetic transcription, students are required to associate symbols with sounds that do not function contrastively in the language. This learning task is challenging, given that students must discriminate among different variants of sounds that are not used to convey differences in lexical meaning. Consequently, many students fail to learn phonetic transcription to the level of proficiency needed for practical application (B. Munson and K. N. Brinkman, Am. J. Speech Lang. Path.). In an effort to improve students' phonetic transcription skills, a computerized training program was developed to train students' discrimination and identification of selected phonetic contrasts. The design of the training tool was based on similar tools that have been used to train phonetic contrasts in second-language learners of English (e.g., A. Bradlow et al., J. Acoust. Soc. Am. 102, 3115). It consists of multiple stages (bombardment, discrimination, identification) containing phonetic contrasts that students have identified as particularly difficult to perceive. This presentation will provide a demonstration of the training tool and will present preliminary data on the efficacy of this tool in improving students' phonetic transcription abilities.
Coffey, Emily B J; Mogilever, Nicolette B; Zatorre, Robert J
The ability to understand speech in the presence of competing sound sources is an important neuroscience question in terms of how the nervous system solves this computational problem. It is also a critical clinical problem that disproportionately affects the elderly, children with language-related learning disorders, and those with hearing loss. Recent evidence that musicians have an advantage on this multifaceted skill has led to the suggestion that musical training might be used to improve or delay the decline of speech-in-noise (SIN) function. However, enhancements have not been universally reported, nor have the relative contributions of different bottom-up versus top-down processes, and their relation to preexisting factors, been disentangled. This information would help establish whether there is a real effect of experience, what exactly its nature is, and how future training-based interventions might target the most relevant components of cognitive processes. These questions are complicated by important differences in study design and uneven coverage of neuroimaging modalities. In this review, we aim to systematize recent results from studies that have specifically looked at musician-related differences in SIN by their study design properties, to summarize the findings, and to identify knowledge gaps for future work.
Wesselmeier, Hendrik; Müller, Horst M
We investigated the preparation of a spoken answer to interrogative sentences by measuring response time (RT) and the response-related readiness potential (RP). By comparing the RT and RP results, we aimed to identify whether the RP-onset is more closely related to the actual speech preparation process or to the pure intention to speak after turn-anticipation. Additionally, we investigated whether the RP-onset can be influenced by syntactic structure (one or two completion points). Therefore, the EEG data were sorted by two variables: the cognitive load required for the response and the syntactic structure of the stimulus questions. The RT and the event-related potential (ERP) associated with preparing the response utterance suggest that the RP-onset is more closely related to the actual speech preparation process than to the pure intention to speak after turn-anticipation. However, the RP-onset can be influenced by the syntactic structure of the question, leading to early response preparation.
Kuchinsky, Stefanie E; Ahlstrom, Jayne B; Cute, Stephanie L; Humes, Larry E; Dubno, Judy R; Eckert, Mark A
The current pupillometry study examined the impact of speech-perception training on word recognition and cognitive effort in older adults with hearing loss. Trainees identified more words at the follow-up than at the baseline session. Training also resulted in an overall larger and faster peaking pupillary response, even when controlling for performance and reaction time. Perceptual and cognitive capacities affected the peak amplitude of the pupil response across participants but did not diminish the impact of training on the other pupil metrics. Thus, we demonstrated that pupillometry can be used to characterize training-related and individual differences in effort during a challenging listening task. Importantly, the results indicate that speech-perception training not only affects overall word recognition, but also a physiological metric of cognitive effort, which has the potential to be a biomarker of hearing loss intervention outcome.
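The two pupil metrics above, peak amplitude ("larger") and peak latency ("faster peaking"), are typically read off a baseline-corrected trace. The sketch assumes one trial sampled at `fs` Hz with a pre-stimulus baseline of `baseline_s` seconds. The function name, the subtractive baseline correction, and the synthetic trace are assumptions rather than the authors' preprocessing. Real pipelines also typically handle blinks and average across trials before extracting metrics.

```python
import numpy as np


def pupil_peak_metrics(trace: np.ndarray, fs: float, baseline_s: float = 1.0):
    """Return (peak dilation, peak latency in s) for one baseline-corrected trial."""
    n_base = int(round(baseline_s * fs))
    corrected = trace - trace[:n_base].mean()  # subtractive baseline correction
    post = corrected[n_base:]                  # samples after stimulus onset
    i_peak = int(np.argmax(post))
    return float(post[i_peak]), i_peak / fs


# Synthetic trial at 60 Hz: 1 s flat baseline at 3.0 mm, then a 2 s dilation
# that peaks 1 s after onset.
fs = 60
t_post = np.arange(2 * fs) / fs
trace = np.concatenate([np.full(fs, 3.0), 3.0 + 0.4 * np.sin(np.pi * t_post / 2)])
amplitude, latency = pupil_peak_metrics(trace, fs)
print(amplitude, latency)
```

A training effect of the kind reported would appear as a larger `amplitude` and a smaller `latency` at follow-up than at baseline, ideally after controlling for accuracy and reaction time as the study did.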
Evans, Samuel; Davis, Matthew H
How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We recorded neural responses with functional magnetic resonance imaging while participants listened to spoken syllables whose surface acoustic form varied considerably, through changes of speaker and through acoustic degradation by noise-vocoding and sine-wave synthesis. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form; at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds.
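At each searchlight location, RSA compares a neural representational dissimilarity matrix (RDM), built from the response patterns to each stimulus, with model RDMs that each encode a hypothesis. Examples are "same syllable = similar" and "same acoustic form = similar". The sketch below shows only that core comparison for a single region, using correlation distance and Spearman's rank correlation from SciPy. The searchlight loop and the group statistics are omitted. The 3 x 3 condition design, the function names, and the synthetic patterns are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def neural_rdm(patterns: np.ndarray) -> np.ndarray:
    """Condensed RDM: 1 - Pearson r between every pair of condition patterns."""
    return pdist(patterns, metric="correlation")


def label_rdm(labels: np.ndarray) -> np.ndarray:
    """Model RDM: 0 if two conditions share a label, 1 otherwise."""
    return pdist(labels[:, None], metric="hamming")


def model_fit(patterns: np.ndarray, model: np.ndarray) -> float:
    """Spearman correlation between the neural RDM and a model RDM."""
    rho, _ = spearmanr(neural_rdm(patterns), model)
    return float(rho)


# Hypothetical design: 3 syllables x 3 acoustic forms = 9 conditions.
syllable = np.repeat(np.arange(3), 3)   # 0 0 0 1 1 1 2 2 2
form = np.tile(np.arange(3), 3)         # 0 1 2 0 1 2 0 1 2

# Synthetic "region" whose 50-voxel patterns depend on syllable identity only.
rng = np.random.default_rng(1)
prototypes = rng.standard_normal((3, 50))
patterns = prototypes[syllable] + 0.3 * rng.standard_normal((9, 50))

print(model_fit(patterns, label_rdm(syllable)), model_fit(patterns, label_rdm(form)))
```

For this synthetic region the syllable-identity model fits well and the acoustic-form model does not. In the terms used above, such a region would resemble the top of the hierarchy. A region encoding both factors would fit both models to some degree.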
García-Pérez, Miguel A; Alcalá-Quintana, Rocío
Research on asynchronous audiovisual speech perception manipulates experimental conditions to observe their effects on synchrony judgments. Probabilistic models establish a link between the sensory and decisional processes underlying such judgments and the observed data, via interpretable parameters that allow testing hypotheses and making inferences about how experimental manipulations affect such processes. Two models of this type have recently been proposed, one based on independent channels and the other using a Bayesian approach. Both models are fitted here to a common data set, with a subsequent analysis of the interpretation they provide about how experimental manipulations affected the processes underlying perceived synchrony. The data consist of synchrony judgments as a function of audiovisual offset in a speech stimulus, under four within-subjects manipulations of the quality of the visual component. The Bayesian model could not accommodate asymmetric data, was rejected by goodness-of-fit statistics for 8/16 observers, and was found to be nonidentifiable, which renders its parameter estimates uninterpretable. The independent-channels model captured asymmetric data, was rejected for only 1/16 observers, and identified how sensory and decisional processes mediating asynchronous audiovisual speech perception are affected by manipulations that only alter the quality of the visual component of the speech signal. PMID:27551361
Parviainen, Tiina; Helenius, Päivi; Poskiparta, Elisa; Niemi, Pekka; Salmelin, Riitta
Speech processing skills go through intensive development during mid-childhood, also providing a basis for literacy acquisition. The sequence of auditory cortical processing of speech has been characterized in adults, but very little is known about the neural representation of speech sound perception in the developing brain. We used whole-head magnetoencephalography (MEG) to record neural responses to speech and nonspeech sounds in first-graders (7-8 years old) and compared the activation sequence to that in adults. In children, the general location of neural activity in the superior temporal cortex was similar to that in adults, but in the time domain the sequence of activation was strikingly different. Cortical differentiation between sound types emerged in a prolonged response pattern at about 250 ms after sound onset, in both hemispheres, clearly later than the corresponding effect at about 100 ms in adults, which was detected specifically in the left hemisphere. Better reading skills were linked with shorter-lasting neural activation, suggesting interdependence of the maturing neural processes of auditory perception and developing linguistic skills. This study uniquely utilized the potential of MEG to compare both spatial and temporal characteristics of neural activation between adults and children. Besides depicting the group-typical features in cortical auditory processing, the results revealed marked interindividual variability in children.
Narne, Vijaya Kumar; Vanaja, C S
Individuals with auditory neuropathy (AN) often suffer from temporal processing deficits causing speech perception difficulties. In the present study, an envelope enhancement scheme that incorporated envelope expansion was used to reduce the effects of temporal deficits. The study involved two experiments. In the first experiment, to simulate the effects of reduced temporal resolution, temporally smeared speech stimuli were presented to listeners with normal hearing. The results revealed that temporal smearing of the speech signal reduced identification scores. When the speech envelope was enhanced before temporal smearing, identification scores improved significantly compared with the smeared-only condition. The second experiment assessed speech perception in twelve individuals with AN, using unprocessed and envelope-enhanced speech signals. The results revealed improvement in speech identification scores for the majority of individuals with AN when the envelope of the speech signal was enhanced. However, envelope enhancement was not able to improve speech identification scores for individuals with AN who had very poor unprocessed speech scores. Overall, the results of the present study suggest that applying envelope enhancement strategies in hearing aids might provide some benefits to many individuals with AN.
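Envelope expansion of the kind described can be sketched by raising the Hilbert envelope of a signal to a power greater than 1 and re-imposing it on the temporal fine structure, which deepens the amplitude modulations. This is a minimal single-band sketch under stated assumptions: the power-law expansion, the `exponent` parameter, and the RMS renormalization are illustrative choices, not the authors' scheme, which the abstract does not specify. The analytic signal is computed with NumPy's FFT using the standard one-sided spectrum construction.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def expand_envelope(x, exponent=2.0, eps=1e-12):
    """Power-law envelope expansion (illustrative assumption, not the study's
    exact processing). Output RMS is matched to the input RMS."""
    x = np.asarray(x, dtype=float)
    z = analytic_signal(x)
    env = np.abs(z)
    fine = np.real(z) / (env + eps)      # temporal fine structure, cos(phase)
    y = fine * env ** exponent           # deeper modulations for exponent > 1
    y *= np.sqrt(np.mean(x ** 2) / (np.mean(y ** 2) + eps))
    return y
```

For example, with `exponent=2.0` a tone whose envelope ranges from 0.5 to 1.5 (modulation depth 0.5) comes out with an envelope ranging from 0.25 to 2.25 before rescaling, that is, a modulation depth of 0.8.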
Rüsseler, Jascha; Ye, Zheng; Gerth, Ivonne; Szycik, Gregor R; Münte, Thomas F
Developmental dyslexia is a specific deficit in reading and spelling that often persists into adulthood. In the present study, we used slow event-related fMRI and independent component analysis to identify brain networks involved in perception of audio-visual speech in a group of adult readers with dyslexia (RD) and a group of fluent readers (FR). Participants saw a video of a female speaker saying a disyllabic word. In the congruent condition, the audio and video inputs matched, whereas in the incongruent condition, the two inputs differed. Participants had to respond to occasionally occurring animal names. The independent component analysis (ICA) identified several components that were differently modulated in FR and RD. Two of these components, which included the fusiform gyrus and occipital gyrus, showed less activation in RD than in FR, possibly indicating a deficit in extracting the face information needed to integrate auditory and visual information in natural speech perception. A further component centered on the superior temporal sulcus (STS) also exhibited less activation in RD than in FR. This finding is corroborated by the univariate analysis, which shows less activation in the STS for RD than for FR. These findings suggest a general impairment in recruitment of audiovisual processing areas in dyslexia during the perception of natural speech.
Sinke, Christopher; Neufeld, Janina; Zedler, Markus; Emrich, Hinderk M; Bleich, Stefan; Münte, Thomas F; Szycik, Gregor R
Recent research suggests that synesthesia results from a hypersensitive multimodal binding mechanism. To address whether multimodal integration is altered in synesthetes in general, grapheme-colour and auditory-visual synesthetes were investigated using speech-related stimulation in two behavioural experiments. First, we used the McGurk illusion to test the strength and number of illusory perceptions in synesthesia. In a second step, we analysed the gain in speech perception provided by seen articulatory movements under acoustically noisy conditions. We used disyllabic nouns as stimuli and varied the signal-to-noise ratio of the auditory stream presented concurrently with a matching video of the speaker. We hypothesized that if synesthesia is due to a general hyperbinding mechanism, this group of subjects should be more susceptible to McGurk illusions and profit more from the visual information during audiovisual speech perception. The results indicate that there are differences between synesthetes and controls concerning multisensory integration, but in the opposite direction to that hypothesized. Synesthetes showed a reduced number of illusions and had a reduced gain in comprehension from viewing matching articulatory movements in comparison to control subjects. Our results indicate that rather than having a hypersensitive binding mechanism, synesthetes show weaker integration of vision and audition.
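The "gain in comprehension" from seeing the talker is often summarized per SNR as the audiovisual score minus the auditory-only score, sometimes normalized by the room left for improvement. The sketch below computes both; it is an illustrative measure, not necessarily the one used in this study, and the scores in the usage loop are made up.

```python
import math


def visual_enhancement(p_audio, p_av):
    """Return (raw, normalized) visual gain for proportion-correct scores.

    raw        = AV - A
    normalized = (AV - A) / (1 - A), i.e. gain relative to available headroom
    """
    raw = p_av - p_audio
    if p_audio >= 1.0:
        return raw, math.nan  # no headroom left to improve
    return raw, raw / (1.0 - p_audio)


# Hypothetical proportion-correct scores at two SNRs (not data from the study).
for snr_db, (a, av) in {-12: (0.2, 0.6), -6: (0.5, 0.8)}.items():
    print(snr_db, visual_enhancement(a, av))
```

The normalized form keeps conditions with high auditory-only scores from looking as if vision helped less simply because little room for improvement remained.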
Eggleston, Jessica L.; Reavis, Kelly M.; McMillan, Garnett P.; Reiss, Lina A. J.
Purpose The objective was to determine whether speech perception could be improved for bimodal listeners (those using a cochlear implant [CI] in one ear and hearing aid in the contralateral ear) by removing low-frequency information provided by the CI, thereby reducing acoustic–electric overlap. Method Subjects were adult CI users with at least 1 year of CI experience. Nine subjects were evaluated in the CI-only condition (control condition), and 26 subjects were evaluated in the bimodal condition. CIs were programmed with 4 experimental programs in which the low cutoff frequency (LCF) was progressively raised. Speech perception was evaluated using Consonant-Nucleus-Consonant words in quiet, AzBio sentences in background babble, and spondee words in background babble. Results The CI-only group showed decreased speech perception in both quiet and noise as the LCF was raised. Bimodal subjects with better hearing in the hearing aid ear (< 60 dB HL at 250 and 500 Hz) performed best for words in quiet as the LCF was raised. In contrast, bimodal subjects with worse hearing (> 60 dB HL at 250 and 500 Hz) performed similarly to the CI-only group. Conclusions These findings suggest that reducing low-frequency overlap of the CI and contralateral hearing aid may improve performance in quiet for some bimodal listeners with better hearing. PMID:26535803
Fischer-Shofty, Meytal; Levkovitz, Yechiel; Shamay-Tsoory, Simone G
Despite the dominant role of the hormone oxytocin (OT) in social behavior, little is known about the role of OT in the perception of social relationships. Furthermore, it is unclear whether there are sex differences in the way that OT affects social perception. Here, we employed a double-blind, placebo-controlled crossover design to investigate the effect of OT on accurate social perception. Following treatment, 62 participants completed the Interpersonal Perception Task, a method of assessing the accuracy of social judgments that requires identification of the relationship between people interacting in real life video clips divided into three categories: kinship, intimacy and competition. The findings suggest that OT had a general effect on improving accurate perception of social interactions. Furthermore, we show that OT also involves sex-specific characteristics. An interaction between treatment, task category and sex indicated that OT had a selective effect on improving kinship recognition in women, but not in men, whereas men's performance was improved following OT only for competition recognition. It is concluded that the gender-specific findings reported here may point to some biosocial differences in the effect of OT which may be expressed in women's tendency for communal and familial social behavior as opposed to men's tendency for competitive social behavior. PMID:22446301
Skipper, Jeremy I.
What do we hear when someone speaks and what does auditory cortex (AC) do with that sound? Given how meaningful speech is, it might be hypothesized that AC is most active when other people talk so that their productions get decoded. Here, neuroimaging meta-analyses show the opposite: AC is least active and sometimes deactivated when participants listened to meaningful speech compared to less meaningful sounds. Results are explained by an active hypothesis-and-test mechanism where speech production (SP) regions are neurally re-used to predict auditory objects associated with available context. By this model, more AC activity for less meaningful sounds occurs because predictions are less successful from context, requiring that further hypotheses be tested. This also explains the large overlap of AC co-activity for less meaningful sounds with meta-analyses of SP. An experiment showed a similar pattern of results for non-verbal context. Specifically, words produced less activity in AC and SP regions when preceded by co-speech gestures that visually described those words compared to those words without gestures. Results collectively suggest that what we ‘hear’ during real-world speech perception may come more from the brain than from our ears and that the function of AC is to confirm or deny internal predictions about the identity of sounds. PMID:25092665
Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians’ higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues, at the first formant onset and at the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception. PMID:26399909
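Classification images build on reverse correlation: random noise is added on every trial, and the noise is related to the listener's responses to find the time-frequency regions that drive the categorization. The sketch below uses the simplest classic estimator (mean noise on one response minus mean noise on the other) rather than the regularized, model-based estimation used in the Auditory Classification Image approach, and it runs on a simulated observer whose cue locations (`template`) are hypothetical.

```python
import numpy as np


def classification_image(noise_fields, responses):
    """Reverse-correlation estimate: mean noise on 'category 1' trials
    minus mean noise on 'category 0' trials.

    noise_fields: array (n_trials, n_time, n_freq) of the added noise
    responses:    array (n_trials,) of 0/1 categorization responses
    """
    noise_fields = np.asarray(noise_fields, dtype=float)
    responses = np.asarray(responses).astype(bool)
    return noise_fields[responses].mean(axis=0) - noise_fields[~responses].mean(axis=0)


# Simulated observer relying on two hypothetical time-frequency cues.
rng = np.random.default_rng(0)
n_trials, n_time, n_freq = 4000, 8, 6
template = np.zeros((n_time, n_freq))
template[2, 3] = 1.0   # stand-in for a primary cue
template[5, 1] = 0.5   # stand-in for a weaker secondary cue
noise = rng.standard_normal((n_trials, n_time, n_freq))
decision = np.einsum("ntf,tf->n", noise, template) + rng.standard_normal(n_trials)
responses = decision > 0
ci = classification_image(noise, responses)
```

With enough trials the estimate peaks where the simulated observer's weights are largest, so an observer who relies more heavily on a cue shows a larger value at that location.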
Gordon-Salant, Sandra; Fitzgibbons, Peter J.
The present experiments examine the effects of listener age and hearing sensitivity on the ability to understand temporally altered speech in quiet when the proportion of a sentence processed by time compression is varied. Additional conditions in noise investigate whether or not listeners are affected by alterations in the presentation rate of background speech babble, relative to the presentation rate of the target speech signal. Younger and older adults with normal hearing and with mild-to-moderate sensorineural hearing losses served as listeners. Speech stimuli included sentences, syntactic sets, and random-order words. Presentation rate was altered via time compression applied to the entire stimulus or to selected phrases within the stimulus. Older listeners performed more poorly than younger listeners in most conditions involving time compression, and their performance decreased progressively with the proportion of the stimulus that was processed with time compression. Older listeners also performed more poorly than younger listeners in all noise conditions, but both age groups demonstrated better performance in conditions incorporating a mismatch in the presentation rate between target signal and background babble compared to conditions with matched rates. The age effects in quiet are consistent with the generalized slowing hypothesis of aging. Performance patterns in noise tentatively support the notion that altered rates of speech signal and background babble may provide a cue to enhance auditory figure-ground perception by both younger and older listeners.
Bidelman, Gavin M; Howell, Megan
Previous studies suggest that at poorer signal-to-noise ratios (SNRs), auditory cortical event-related potentials are weakened, prolonged, and show a shift in the functional lateralization of cerebral processing from left to right hemisphere. Increased right hemisphere involvement during speech-in-noise (SIN) processing may reflect the recruitment of additional brain resources to aid speech recognition or alternatively, the progressive loss of involvement from left linguistic brain areas as speech becomes more impoverished (i.e., nonspeech-like). To better elucidate the brain basis of SIN perception, we recorded neuroelectric activity in normal hearing listeners to speech sounds presented at various SNRs. Behaviorally, listeners obtained superior SIN performance for speech presented to the right compared to the left ear (i.e., right ear advantage). Source analysis of neural data assessed the relative contribution of region-specific neural generators (linguistic and auditory brain areas) to SIN processing. We found that left inferior frontal brain areas (e.g., Broca's area) partially disengage at poorer SNRs but responses do not right lateralize with increasing noise. In contrast, auditory sources showed more resilience to noise in left compared to right primary auditory cortex but also a progressive shift in dominance from left to right hemisphere at lower SNRs. Region- and ear-specific correlations revealed that listeners' right ear SIN advantage was predicted by source activity emitted from inferior frontal gyrus (but not primary auditory cortex). Our findings demonstrate changes in the functional asymmetry of cortical speech processing during adverse acoustic conditions and suggest that "cocktail party" listening skills depend on the quality of speech representations in the left cerebral hemisphere rather than compensatory recruitment of right hemisphere mechanisms.
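Presenting speech at various SNRs comes down to scaling the masker relative to the speech power. A minimal sketch, assuming equal-length speech and noise arrays and an SNR defined on mean power over the whole token; studies differ, for example, in whether they scale the speech or the noise and over which interval power is measured.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals snr_db (dB),
    then add it to `speech`. Returns (mixture, scaled_noise)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return speech + scaled, scaled


# Usage with stand-in signals (a tone for speech, Gaussian noise for the masker).
rng = np.random.default_rng(0)
fs = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
for snr in (10, 0, -5):
    mixture, masker = mix_at_snr(tone, rng.standard_normal(fs), snr)
```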
An accurate mapping from speech acoustics to speech articulator movements has many practical applications, as well as theoretical implications for the science of speech planning and perception. This work can be divided into two parts. In the first part, we show that a simple codebook can be used to map acoustics to speech articulator movements in natural, conversational speech. In the second part, we incorporate into the codebook approach cost optimization principles that have been shown to be relevant in motor control tasks. These cost optimizations are defined as minimizing the integral of the magnitude of the velocity, acceleration, and jerk of the speech articulators, and are implemented using a dynamic programming technique. Results show that incorporating cost minimization of speech articulator movements can significantly improve the mapping from acoustics to speech articulator movements. This suggests that the speech articulators follow underlying physiological or neural planning principles during speech production.
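The two-part approach can be sketched as follows: a nearest-neighbour codebook lookup proposes `k` candidate articulator configurations per acoustic frame, and dynamic programming (a Viterbi-style search) picks the sequence that trades acoustic mismatch against articulator movement. This is a simplified, hypothetical sketch: it penalizes only the squared frame-to-frame change (a velocity proxy) with weight `lam`, whereas the abstract also lists acceleration and jerk, which would require DP states over pairs or triples of frames. Function names and array shapes are assumptions.

```python
import numpy as np


def codebook_candidates(acoustic_frames, cb_acoustic, cb_artic, k):
    """Return the k nearest codebook entries per frame.

    acoustic_frames: (T, A); cb_acoustic: (N, A); cb_artic: (N, D)
    Returns candidates (T, k, D) and their acoustic distances (T, k).
    """
    d = np.linalg.norm(acoustic_frames[:, None, :] - cb_acoustic[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return cb_artic[idx], np.take_along_axis(d, idx, axis=1)


def smooth_path(cand, cost, lam):
    """Viterbi search minimizing sum(acoustic cost) + lam * sum(squared steps)."""
    T, k, _ = cand.shape
    total = cost[0].copy()
    back = np.zeros((T, k), dtype=int)
    for t in range(1, T):
        # trans[p, c]: squared distance from previous candidate p to current c
        trans = np.sum((cand[t][None, :, :] - cand[t - 1][:, None, :]) ** 2, axis=2)
        scores = total[:, None] + lam * trans
        back[t] = np.argmin(scores, axis=0)
        total = scores[back[t], np.arange(k)] + cost[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmin(total))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return cand[np.arange(T), path]
```

With `lam=0` the search reduces to picking the acoustically best entry per frame; increasing `lam` favours smoother articulator trajectories at some cost in acoustic fit.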
Iverson, Paul; Wagner, Anita; Pinet, Melanie; Rosen, Stuart
This study examined the perceptual specialization for native-language speech sounds, by comparing native Hindi and English speakers in their perception of a graded set of English /w/-/v/ stimuli that varied in similarity to natural speech. The results demonstrated that language experience does not affect general auditory processes for these types of sounds; there were strong cross-language differences for speech stimuli, and none for stimuli that were nonspeech. However, the cross-language differences extended into a gray area of speech-like stimuli that were difficult to classify, suggesting that the specialization occurred in phonetic processing prior to categorization.
Zevin, Jason D
It is clear that the ability to learn new speech contrasts changes over development, such that learning to categorize speech sounds as native speakers of a language do is more difficult in adulthood than it is earlier in development. There is also a wealth of data concerning changes in the perception of speech sounds during infancy, such that infants quite rapidly progress from language-general to more language-specific perceptual biases. It is often suggested that the perceptual narrowing observed during infancy plays a causal role in the loss of plasticity observed in adulthood, but the relationship between these two phenomena is complicated. Here I consider the relationship between changes in sensitivity to speech sound categorization over the first 2 years of life, when they appear to reorganize quite rapidly, and the "long tail" of development throughout childhood, in the context of understanding the sensitive period for speech perception.
Callan, Daniel E; Tsytsarev, Vassiliy; Hanakawa, Takashi; Callan, Akiko M; Katsuhara, Maya; Fukuyama, Hidenao; Turner, Robert
This 3-T fMRI study investigates brain regions similarly and differentially involved with listening and covert production of singing relative to speech. Given the greater use of auditory-motor self-monitoring and imagery with respect to consonance in singing, brain regions involved with these processes are predicted to be differentially active for singing more than for speech. The stimuli consisted of six Japanese songs. A block design was employed in which the tasks for the subject were to listen passively to singing of the song lyrics, listen passively to speaking of the song lyrics, covertly sing the visually presented song lyrics, covertly speak the visually presented song lyrics, and rest. The conjunction of passive listening and covert production tasks used in this study allows general neural processes underlying both perception and production to be discerned that are not exclusively a result of stimulus-induced auditory processing or of low-level articulatory motor control. Brain regions involved with both perception and production for singing as well as speech were found to include the left planum temporale/superior temporal parietal region, as well as left and right premotor cortex, lateral aspect of the VI lobule of posterior cerebellum, anterior superior temporal gyrus, and planum polare. Greater activity for the singing over the speech condition for both the listening and covert production tasks was found in the right planum temporale. Greater activity in brain regions involved with consonance (orbitofrontal cortex in the listening task and subcallosal cingulate in the covert production task) was also present for singing over speech. The results are consistent with the planum temporale mediating representational transformation across auditory and motor domains in response to consonance for singing over that of speech. Hemispheric laterality was assessed by paired t tests between active voxels in the contrast of interest relative to the left-right flipped contrast of
McQueen, James M.; Jesse, Alexandra; Norris, Dennis
The strongest support for feedback in speech perception comes from evidence of apparent lexical influence on prelexical fricative-stop compensation for coarticulation. Lexical knowledge (e.g., that the ambiguous final fricative of "Christma?" should be [s]) apparently influences perception of following stops. We argue that all such previous…
Schatzer, Reinhold; Koroleva, Inna; Griessner, Andreas; Levin, Sergey; Kusovkov, Vladislav; Yanov, Yuri; Zierhofer, Clemens
Early multi-channel designs in the history of cochlear implant development were based on vocoder-type processing of frequency channels and presented bands of compressed analog stimulus waveforms simultaneously on multiple tonotopically arranged electrodes. The realization that the direct summation of electrical fields resulting from simultaneous electrode stimulation exacerbates interactions among the stimulation channels and limits cochlear implant outcome led to a breakthrough in cochlear implant development: the continuous interleaved sampling (CIS) coding strategy. By interleaving stimulation pulses across electrodes, CIS activates only a single electrode at each point in time, preventing a direct summation of electrical fields and hence the primary component of channel interactions. In this paper we show that a previously presented approach of simultaneous stimulation with channel interaction compensation (CIC) may also ameliorate the deleterious effects of simultaneous channel interaction on speech perception. In an acute study conducted in eleven experienced MED-EL implant users, configurations involving simultaneous stimulation with CIC and doubled pulse phase durations were investigated. Because pairs of electrodes were activated simultaneously and pulse phase durations were doubled, carrier rates remained the same. Comparison conditions involved both CIS and fine structure (FS) strategies, with either strictly sequential or paired-simultaneous stimulation. Results showed no statistical difference in the perception of sentences in noise and monosyllables between sequential and paired-simultaneous stimulation with doubled phase durations. This suggests that CIC can largely compensate for the effects of simultaneous channel interaction, for both CIS and FS coding strategies. A simultaneous stimulation paradigm has a number of potential advantages over a traditional sequential interleaved design. The flexibility gained when dropping the requirement of
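Why carrier rates stay the same when pairs of electrodes fire together with doubled phase durations is simple arithmetic. The sketch assumes symmetric biphasic pulses with no inter-phase gap and fully packed frames; the 12-electrode and 30 µs figures are illustrative, not parameters from this study.

```python
import math


def per_channel_rate(n_electrodes, phase_us, electrodes_per_slot=1):
    """Stimulation rate per channel (pulses/s) for an interleaved scheme in
    which `electrodes_per_slot` electrodes fire together in each time slot.
    Assumes symmetric biphasic pulses with no inter-phase gap."""
    pulse_us = 2 * phase_us                      # two phases per biphasic pulse
    slots = math.ceil(n_electrodes / electrodes_per_slot)
    frame_us = slots * pulse_us                  # one pulse per electrode per frame
    return 1e6 / frame_us


sequential = per_channel_rate(12, phase_us=30)                     # one electrode at a time
paired = per_channel_rate(12, phase_us=60, electrodes_per_slot=2)  # pairs, doubled phases
print(round(sequential, 1), round(paired, 1))  # prints "1388.9 1388.9"
```

Pairing halves the number of time slots per frame while doubling each pulse's duration, so the frame length (720 µs here), and hence the per-channel rate, is unchanged.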
Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H
A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding, in which sensory activity is compared with top-down predictions and only unexplained activity is propagated through the cortical hierarchy.
Holt, Lori L.; Lotto, Andrew J.; Diehl, Randy L.
Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.
Miglietta, Sandra; Grimaldi, Mirko; Calabrese, Andrea
A Mismatch Negativity (MMN) study was performed to investigate whether pre-attentive vowel perception is influenced by phonological status. We compared the MMN response to the acoustic distinction between the allophonic variation [ε-e] and the phonemic contrast [e-i] present in a Southern-Italian variety (Tricase dialect). Clear MMNs were elicited for both the phonemic and allophonic conditions. Interestingly, a shorter latency was observed for the phonemic pair, but no significant amplitude difference was observed between the two conditions. Together, these results suggest that for isolated vowels, the phonological status of a vowel category is reflected in the latency of the MMN peak. The earlier latency of the phonemic condition argues for easier parsing and encoding of phonemic contrasts in memory representations. Thus, neural computations mapping auditory inputs into higher perceptual representations seem 'sensitive' to the contrastive/non-contrastive status of the sounds as determined by the listeners' knowledge of their own phonological system.
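The MMN is conventionally read from the deviant-minus-standard difference wave, and the latency comparison here concerns the timing of its negative peak. A minimal sketch, assuming averaged, baseline-corrected ERPs sampled on a common time axis in ms; the 100-250 ms search window is a typical choice and an assumption, not the authors' reported window.

```python
import numpy as np


def mmn_peak(times_ms, standard_erp, deviant_erp, window=(100.0, 250.0)):
    """Return (latency_ms, amplitude) of the most negative point of the
    deviant-minus-standard difference wave inside `window`."""
    times_ms = np.asarray(times_ms, dtype=float)
    diff = np.asarray(deviant_erp, dtype=float) - np.asarray(standard_erp, dtype=float)
    in_win = (times_ms >= window[0]) & (times_ms <= window[1])
    idx = np.argmin(diff[in_win])
    return float(times_ms[in_win][idx]), float(diff[in_win][idx])


# Synthetic example: identical standards, deviants with an extra negativity
# peaking at 150 ms (e.g. one contrast) or 180 ms (e.g. another contrast).
t = np.arange(-100.0, 400.0, 1.0)
standard = np.sin(t / 50.0)
deviant_early = standard - 2.0 * np.exp(-((t - 150.0) ** 2) / (2 * 20.0 ** 2))
deviant_late = standard - 2.0 * np.exp(-((t - 180.0) ** 2) / (2 * 20.0 ** 2))
print(mmn_peak(t, standard, deviant_early), mmn_peak(t, standard, deviant_late))
```

Comparing latencies across conditions in this way says nothing about amplitude, which is why the two measures can dissociate as they did in this study.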
Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H.
Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were also presented as speech in noise, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and “top-down” language effects were also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709
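Sine-wave speech replaces the formants with a few time-varying sinusoids that follow the formant center frequencies, discarding other acoustic detail. A minimal synthesis sketch, assuming frame-wise formant frequency and amplitude tracks are already available (for example, from a formant tracker); it is not the stimulus-generation procedure of this study. Phase is accumulated sample by sample so that frequency changes do not produce discontinuities.

```python
import numpy as np


def sine_wave_speech(formant_freqs, formant_amps, frame_rate, fs):
    """Sum of sinusoids following formant tracks.

    formant_freqs, formant_amps: arrays (n_frames, n_formants), Hz and linear
    frame_rate: track frames per second; fs: output sampling rate (Hz)
    """
    n_frames, n_formants = formant_freqs.shape
    n_samples = int(round(n_frames * fs / frame_rate))
    t_frames = np.arange(n_frames) / frame_rate
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k in range(n_formants):
        f = np.interp(t, t_frames, formant_freqs[:, k])  # per-sample frequency
        a = np.interp(t, t_frames, formant_amps[:, k])   # per-sample amplitude
        phase = 2 * np.pi * np.cumsum(f) / fs            # integrate frequency
        out += a * np.sin(phase)
    return out


# Usage with two steady, hypothetical "formants" at 500 and 1500 Hz for 1 s.
freqs = np.tile([500.0, 1500.0], (100, 1))
amps = np.tile([1.0, 0.5], (100, 1))
signal = sine_wave_speech(freqs, amps, frame_rate=100, fs=16000)
```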
LaCroix, Arianna N.; Diaz, Alvaro F.; Rogalsky, Corianne
The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis used activation likelihood estimation (ALE) analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music vs. speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques, particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music. PMID:26321976
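Activation likelihood estimation treats each reported peak as a spatial probability distribution. A simplified sketch: each focus is modeled as an isotropic Gaussian normalized over the grid, a study's modeled activation (MA) map takes the voxel-wise maximum over its foci, and the ALE value is the probability that at least one study activates a voxel. Assumptions: a fixed `sigma_mm` for all studies (standard ALE ties the width to sample size), and no permutation-based thresholding, which a real analysis requires.

```python
import numpy as np


def modeled_activation(grid_mm, foci_mm, sigma_mm):
    """Per-study MA map: each focus becomes a Gaussian normalized to sum to 1
    over the grid; overlapping foci are combined with a voxel-wise maximum."""
    d2 = np.sum((grid_mm[:, None, :] - foci_mm[None, :, :]) ** 2, axis=2)  # (voxels, foci)
    gauss = np.exp(-d2 / (2.0 * sigma_mm ** 2))
    gauss /= gauss.sum(axis=0, keepdims=True)  # probability per voxel for each focus
    return gauss.max(axis=1)


def ale_map(grid_mm, studies, sigma_mm):
    """ALE = 1 - prod_i (1 - MA_i) over studies i."""
    complement = np.ones(len(grid_mm))
    for foci in studies:
        complement *= 1.0 - modeled_activation(grid_mm, np.asarray(foci, dtype=float), sigma_mm)
    return 1.0 - complement


# Toy 1-D line of voxels (x in mm, y = z = 0) and two hypothetical studies.
x = np.arange(-20.0, 22.0, 2.0)
grid = np.stack([x, np.zeros_like(x), np.zeros_like(x)], axis=1)
studies = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [16.0, 0.0, 0.0]]]
ale = ale_map(grid, studies, sigma_mm=4.0)
```

Voxels where foci from several studies converge receive the highest ALE values, which is what lets the method identify regions consistently activated across a literature.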
Trude, Alison M; Duff, Melissa C; Brown-Schmidt, Sarah
A hallmark of human speech perception is the ability to comprehend speech quickly and effortlessly despite enormous variability across talkers. However, current theories of speech perception do not make specific claims about the memory mechanisms involved in this process. To examine whether declarative memory is necessary for talker-specific learning, we tested the ability of amnesic patients with severe declarative memory deficits to learn and distinguish the accents of two unfamiliar talkers by monitoring their eye gaze as they followed spoken instructions. Analyses of the time course of eye fixations showed that amnesic patients rapidly learned to distinguish these accents and tailored perceptual processes to the voice of each talker. These results demonstrate that declarative memory is not necessary for this ability and point to the involvement of non-declarative memory mechanisms. These results are consistent with findings that other social and accommodative behaviors are preserved in amnesia, and they contribute to our understanding of the interactions of multiple memory systems in the use and understanding of spoken language.
O'Brien, Rachael; Byrne, Nicole; Mitchell, Rebecca; Ferguson, Alison
Workforce shortages are forecast for speech-language pathology in Australia and will have a more significant impact on rural and remote areas than on metropolitan areas. Allied health (AH) disciplines such as physiotherapy and occupational therapy address the problem of workforce shortages and growing clinical demand by employing allied health assistants (AHAs) to provide clinical and administrative support to AH professionals. Currently, speech-language pathologists (SLPs) do not work with discipline-specific allied health assistants in every Australian state (e.g., New South Wales). This paper aims to provide insight into the perceptions of SLPs in one Australian state (NSW) regarding working with AHAs. Semi-structured interviews were conducted with eight rural SLPs. Qualitative analysis indicated that participants perceived they had deficits in the skills and knowledge required to work with AHAs, and they identified further training needs. Participants perceived the SLP role to be misunderstood and were concerned about poor consultation regarding the introduction of AHAs into the profession. Ambivalence was evident in participants' overall perceptions of working with AHAs and of the tasks AHAs would perform. While previous research identified benefits of working with AHAs, results from this study suggest that significant professional, economic, and organizational issues need addressing before such a change is implemented in speech-language pathology.
Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee
Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency ear or in both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.
Scheperle, Rachel A.; Abbas, Paul J.
Objectives: The ability to perceive speech is related to the listener’s ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Design: Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex (ACC) with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel discrimination and the Bamford
Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.
Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery) and of speech presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, and -6 dB for speech-in-speech; +6, +3, 0, and -3 dB for speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri that was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.
In this dissertation I present a model that captures categorical effects in both first language (L1) and second language (L2) speech perception. In L1 perception, categorical effects range from extremely strong for consonants to nearly continuous perception of vowels. I treat the problem of speech perception as a statistical inference problem, and by quantifying categoricity I obtain a unified model of both strong and weak categorical effects. In this optimal inference mechanism, the listener uses their knowledge of categories and the acoustics of the signal to infer the intended productions of the speaker. The model splits speech variability into meaningful category variance and perceptual noise variance. The ratio of these two variances, which I call Tau, directly correlates with the degree of categorical effects for a given phoneme or continuum. By fitting the model to behavioral data from different phonemes, I show how a single parametric quantitative variation can lead to the different degrees of categorical effects seen in perception experiments with different phonemes. In L2 perception, L1 categories have been shown to affect how L2 sounds are identified and how well the listener is able to discriminate them. Various models have been developed to relate the state of L1 categories to both the initial and eventual ability to process the L2. These models have largely lacked a formalized metric to measure perceptual distance, a means of making a priori predictions of behavior for a new contrast, and a way of describing non-discrete gradient effects. In the second part of my dissertation, I apply the same computational model that I used to unify L1 categorical effects to L2 perception. I show that the model can make the same types of predictions as other second language acquisition (SLA) models, while also providing a quantitative framework that formalizes all measures of similarity and bias. Further, I show how using this model to consider L2 learners at
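The variance-ratio idea in the preceding abstract can be illustrated with a minimal sketch. It is not the dissertation's own model or code. It assumes a standard Gaussian optimal-inference setup: a category's intended productions T are normally distributed around a mean mu_c with variance sigma_c^2, and the perceived signal S is T plus Gaussian noise with variance sigma_s^2. It also assumes that Tau is category variance divided by noise variance (the direction of the ratio is an assumption). Under these assumptions, the optimal estimate of T is a Tau-weighted average of S and mu_c. The function names are hypothetical.

```python
"""Minimal sketch of Gaussian optimal-inference perception (illustrative only).

Assumptions (not taken from the dissertation itself):
  intended production  T ~ Normal(mu_c, sigma_c^2)   (category variance)
  perceived signal     S | T ~ Normal(T, sigma_s^2)  (perceptual noise)
  tau = sigma_c^2 / sigma_s^2
"""


def posterior_mean(s: float, mu_c: float, tau: float) -> float:
    """Optimal estimate E[T | S = s]: a tau-weighted average of s and mu_c."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    # (sigma_c^2 * s + sigma_s^2 * mu_c) / (sigma_c^2 + sigma_s^2),
    # with numerator and denominator divided by sigma_s^2.
    return (tau * s + mu_c) / (tau + 1.0)


def perceived_distance(s1: float, s2: float, mu_c: float, tau: float) -> float:
    """Distance between two inferred productions; equals |s1 - s2| * tau / (tau + 1)."""
    return abs(posterior_mean(s1, mu_c, tau) - posterior_mean(s2, mu_c, tau))


if __name__ == "__main__":
    # Two stimuli one unit apart, with the category centered at 0.
    for tau in (0.1, 1.0, 10.0):
        d = perceived_distance(0.0, 1.0, mu_c=0.0, tau=tau)
        print(f"tau={tau:5.1f}  perceived distance={d:.3f}")
```

With a small Tau, noise dominates and percepts are pulled strongly toward the category mean (strong categorical effects). With a large Tau, percepts stay close to the signal (nearly continuous perception). The printed distances grow from about 0.09 to about 0.91 as Tau increases. This single-category version shows only the within-category shrinkage that the ratio controls.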
Neal, Jennifer Watling; Neal, Zachary P.; Cappella, Elise
This study examines predictors of observer accuracy (i.e., seeing) and target accuracy (i.e., being seen) in perceptions of classmates’ relationships in a predominantly African American sample of 420 second through fourth graders (ages 7–11). Girls, children in higher grades, and children in smaller classrooms were more accurate observers. Targets (i.e., pairs of children) were more accurately observed when they occurred in smaller classrooms of higher grades and involved same-sex, high-popularity, and similar-popularity children. Moreover, relationships between pairs of girls were more accurately observed than relationships between pairs of boys. As a set, these findings suggest the importance of both observer and target characteristics for children’s accurate perceptions of classroom relationships. In addition, the substantial variation in observer accuracy and target accuracy has methodological implications for both peer-reported assessments of classroom relationships and the use of stochastic actor-based models to understand peer selection and socialization processes. PMID:26347582
Gauvin, Hanna S.; Hartsuiker, Robert J.; Huettig, Falk
The Perceptual Loop Theory of speech monitoring assumes that speakers routinely inspect their inner speech. In contrast, Huettig and Hartsuiker (2010) observed that listening to one's own speech during language production drives eye movements to phonologically related printed words with a similar time course as listening to someone else's speech does in speech perception experiments. This suggests that speakers use their speech perception system to listen to their own overt speech, but not to their inner speech. However, a direct comparison between production and perception with the same stimuli and participants has so far been lacking. The current printed-word eye-tracking experiment therefore used a within-subjects design combining production and perception. Displays showed four words, one of which, the target, either had to be named or was presented auditorily. The accompanying words were phonologically related, semantically related, or unrelated to the target. There were small increases in looks to phonological competitors with a similar time course in both production and perception. Phonological effects in perception, however, lasted longer and had a much larger magnitude. We conjecture that this difference is related to a difference in the predictability of one's own and someone else's speech, which in turn has consequences for lexical competition in other-perception and possibly suppression of activation in self-perception. PMID:24339809
Ohms, Verena R.; Gill, Arike; Van Heijningen, Caroline A. A.; Beckers, Gabriel J. L.; ten Cate, Carel
Humans readily distinguish spoken words that closely resemble each other in acoustic structure, irrespective of audible differences between individual voices or sex of the speakers. There is an ongoing debate about whether the ability to form phonetic categories that underlie such distinctions indicates the presence of uniquely evolved, speech-linked perceptual abilities, or is based on more general ones shared with other species. We demonstrate that zebra finches (Taeniopygia guttata) can discriminate and categorize monosyllabic words that differ in their vowel and transfer this categorization to the same words spoken by novel speakers independent of the sex of the voices. Our analysis indicates that the birds, like humans, use intrinsic and extrinsic speaker normalization to make the categorization. This finding shows that there is no need to invoke special mechanisms, evolved together with language, to explain this feature of speech perception. PMID:19955157
Carlin, Charles H.; Milam, Jennifer L.; Carlin, Emily L.; Owen, Ashley
E-supervision has a potential role in addressing speech-language personnel shortages in rural and difficult-to-staff school districts. The purposes of this article are twofold: to determine how e-supervision might support graduate speech-language pathologist (SLP) interns placed in rural, remote, and difficult-to-staff public school districts, and to investigate interns’ perceptions of in-person supervision compared to e-supervision. The study used a mixed-methods approach and collected data from surveys, supervision documents and records, and interviews. The results showed that the use of e-supervision allowed graduate SLP interns to be adequately supervised across a variety of clients and professional activities in a manner similar to in-person supervision. Further, e-supervision was perceived as a more convenient and less stressful supervision format than in-person supervision. Other findings are discussed, and implications and limitations are provided. PMID:25945201
Le Cocq, Cecile
Workers in noisy industrial environments are often confronted with communication problems. Many workers complain about not being able to communicate easily with their coworkers when they wear hearing protectors. Consequently, they tend to remove their protectors, which exposes them to the risk of hearing loss. In fact, this communication problem is twofold: first, hearing protectors modify the perception of one's own voice; second, they interfere with understanding speech from others. This twofold problem is examined in this thesis. When hearing protectors are worn, the modification of one's own voice perception is partly due to the occlusion effect, which is produced when an earplug is inserted in the ear canal. The occlusion effect has two main consequences: first, low-frequency physiological noises are perceived more strongly; second, the perception of one's own voice is modified. To better understand this phenomenon, the results reported in the literature are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, the buccal cavity is excited with an acoustic wave. The experiment has been designed so that the acoustic wave that excites the buccal cavity does not directly excite the external ear or the rest of the body. Hearing thresholds measured with the ear open and occluded have been used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, as well as those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the inner ear. The intelligibility of speech from others is altered by both the high sound levels of noisy industrial environments and the speech signal attenuation due to hearing
Wilson, Amanda H.; Paré, Martin; Munhall, Kevin G.
Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel–consonant–vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task, and in Experiment 3 (N = 20) they performed an audiovisual task while their gaze behavior was monitored using eye-tracking equipment. Results: In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high-spatial-frequency information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with the accuracy of silent speechreading or the magnitude of the McGurk effect. Conclusions: The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the 2 tasks and by differences in the pattern of gaze. PMID:27537379
Massaro, Dominic W
I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally.
Doesburg, Sam M; Emberson, Lauren L; Rahi, Alan; Cameron, David; Ward, Lawrence M
Real-world speech perception relies on both auditory and visual information that fall within the tolerated range of temporal coherence. Subjects were presented with audiovisual recordings of speech that were offset by either 30 or 300 ms, leading to perceptually coherent or incoherent audiovisual speech, respectively. We provide electroencephalographic evidence of a phase-synchronous gamma-oscillatory network that is transiently activated by the perception of audiovisual speech asynchrony, showing both topological and time-course correspondence to networks reported in previous neuroimaging research. This finding addresses a major theoretical hurdle regarding the mechanism by which distributed networks serving a common function achieve transient functional integration. Moreover, this evidence illustrates an important dissociation between phase-synchronization and stimulus coherence, highlighting the functional nature of network-based synchronization.
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing whether infants' emerging ability to produce native sounds in babbling affects their audiovisual speech perception. We tested 44 6-month-olds on their ability to detect mismatches between concurrently presented auditory and visual vowels and related their performance to their productive abilities and later vocabulary size. Results show that infants' ability to detect mismatches between auditory and visually presented vowels differs depending on the vowels involved. Furthermore, infants' sensitivity to mismatches is modulated by their current articulatory knowledge and correlates with their vocabulary size at 12 months of age. This suggests that, aside from infants' ability to match nonnative audiovisual cues (Pons et al., 2009), their ability to match native auditory and visual cues continues to develop during the first year of life. Our findings point to a potential role of salient vowel cues and productive abilities in the development of audiovisual speech perception, and further indicate a relation between infants' early sensitivity to audiovisual speech cues and their later language development.
Stevenson, Ryan A.; Nelms, Caitlin; Baum, Sarah H.; Zurkovsky, Lilia; Barense, Morgan D.; Newhouse, Paul A.; Wallace, Mark T.
Over the next two decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio (SNR). For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level, both younger and older adults showed approximately equivalent increases in multisensory gain as SNR decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments, and they help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. PMID:25282337
Hayes-Harb, Rachel; Smith, Bruce L.; Bent, Tessa; Bradlow, Ann R.
This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast (as in minimal pairs such as 'cub' and 'cup') was examined in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, the interlanguage speech intelligibility benefit for listeners was found to hold only for low-phonological-proficiency listeners and low-phonological-proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in an effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit. PMID:19606271
Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa
To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing.
Heffner, Christopher C.; Newman, Rochelle S.; Dilley, Laura C.; Idsardi, William J.
Purpose: A new literature has suggested that speech rate can influence the parsing of words quite strongly in speech. The purpose of this study was to investigate differences between younger adults and older adults in the use of context speech rate in word segmentation, given that older adults perceive timing information differently from younger…
Huettig, Falk; Hartsuiker, Robert J.
Theories of verbal self-monitoring generally assume an internal (pre-articulatory) monitoring channel, but there is debate about whether this channel relies on speech perception or on production-internal mechanisms. Perception-based theories predict that listening to one's own inner speech has similar behavioural consequences as listening to…
This paper investigates similarities between lexical consonant clusters and CVC sequences differing in the presence or absence of a lexical vowel in speech perception and production in two Portuguese varieties. The frequent high vowel deletion in the European variety (EP) and the realization of intervening vocalic elements between lexical clusters in Brazilian Portuguese (BP) may minimize the contrast between lexical clusters and CVC sequences in the two Portuguese varieties. In order to test this hypothesis we present a perception experiment with 72 participants and a physiological analysis of 3-dimensional movement data from 5 EP and 4 BP speakers. The perceptual results confirmed a gradual confusion of lexical clusters and CVC sequences in EP, which corresponded roughly to the gradient consonantal overlap found in production.
McCarthy, Kathleen M; Mahon, Merle; Rosen, Stuart; Evans, Bronwen G
The majority of bilingual speech research has focused on simultaneous bilinguals. Yet, in immigrant communities, children are often initially exposed to their family language (L1), before becoming gradually immersed in the host country's language (L2). This is typically referred to as sequential bilingualism. Using a longitudinal design, this study explored the perception and production of the English voicing contrast in 55 children (40 Sylheti-English sequential bilinguals and 15 English monolinguals). Children were tested twice: when they were in nursery (52-month-olds) and 1 year later. Sequential bilinguals' perception and production of English plosives were initially driven by their experience with their L1, but after starting school, changed to match that of their monolingual peers. PMID:25123987
Alm, Magnus; Behne, Dawn
Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than for middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, recurrent confirmation, between young and middle adulthood, of the contribution of visual cues induced by speech-reading proficiency may gradually shift females' AV perceptual strategy toward more visually dominated responses.
Stekelenburg, Jeroen J.; Vroomen, Jean
We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…
O'Halloran, Robyn; Lee, Yan Shan; Rose, Miranda; Liamputtong, Pranee
There is a growing body of research indicating that a person with a communication disability communicates and participates more effectively in a communicatively accessible environment. If this research is to be translated into practice, one needs to determine who will take on the role of creating communicatively accessible environments. This research adopted a qualitative methodology to explore the perceptions of speech-language pathologists about working to create communicatively accessible healthcare settings. Fifteen speech-language pathologists in three focus groups participated in this research. The focus group discussions were transcribed and analysed thematically. Thematic analysis indicated that speech-language pathologists believe there are four main benefits in creating communicatively accessible healthcare environments: benefits for all people ("access for all"), benefits for healthcare administrators, benefits for those wanting to improve communication with patients, and benefits to the capacity to provide communicatively accessible environments. However, they believe these benefits can only be achieved if communication resources are available; skilled, knowledgeable and supportive healthcare providers are available; and systems are in place to support a whole-of-hospital approach. This research supports the development of a new role to improve the communicative accessibility of healthcare settings.
Richoz, Anne-Raphaëlle; Quinn, Paul C.; Hillairet de Boisferon, Anne; Berger, Carole; Loevenbruck, Hélène; Lewkowicz, David J.; Lee, Kang; Dole, Marjorie; Caldara, Roberto; Pascalis, Olivier
Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender. PMID:28060872
Alfelasi, Mohammad; Piron, Jean Pierre; Mathiolon, Caroline; Lenel, Nadjmah; Mondain, Michel; Uziel, Alain; Venail, Frederic
The transtympanic promontory stimulation test (TPST) has been suggested as a useful tool for predicting postoperative outcomes in patients at risk of poor auditory neuron functioning, especially after long auditory deprivation. However, only sparse data are available on this topic. This study aimed to identify correlations between the auditory nerve dynamic range evaluated by TPST, the electrical dynamic range of the cochlear implant, and speech perception outcome. We evaluated 65 patients with postlingual hearing loss and no residual hearing who had been implanted with a Nucleus CI24 cochlear implant device for at least 2 years and had a minimum of 17 active electrodes. Using the TPST, we measured the threshold for auditory perception (T-level) and the maximum acceptable level of stimulation (M-level) at stimulation frequencies of 50, 100 and 200 Hz. General linear regression was performed to correlate (1) speech perception, evaluated using the PBK test 1 year after surgery, and (2) cochlear implant electrical dynamic range with age at implantation, duration of auditory deprivation, etiology of the deafness, duration of cochlear implant use, and auditory nerve dynamic range. Postoperative speech perception outcome correlated with etiology, duration of auditory deprivation and implant use, and TPST at 100 and 200 Hz. The dynamic range of the cochlear implant map correlated with duration of auditory deprivation, speech perception outcome at 6 months, and TPST at 100 and 200 Hz. The TPST can be used to predict functional outcome after cochlear implant surgery in difficult cases.
Wang, Hsiao-Lan S; Chen, I-Chen; Chiang, Chun-Han; Lai, Ying-Hui; Tsao, Yu
The current study examined the associations between basic auditory perception, speech prosodic processing, and vocabulary development in Chinese kindergartners, specifically, whether early basic auditory perception may be related to linguistic prosodic processing in Chinese Mandarin vocabulary acquisition. A series of language, auditory, and linguistic prosodic tests was given to 100 preschool children who had not yet learned how to read Chinese characters. The results suggested that lexical tone sensitivity and intonation production were significantly correlated with children's general vocabulary abilities. In particular, tone awareness was associated with comprehensive language development, whereas intonation production was associated with both comprehensive and expressive language development. Regression analyses revealed that tone sensitivity accounted for 36% of the unique variance in vocabulary development, whereas intonation production accounted for 6% of the variance in vocabulary development. Moreover, auditory frequency discrimination was significantly correlated with lexical tone sensitivity, syllable duration discrimination, and intonation production in Mandarin Chinese, and it contributed significantly to tone sensitivity and intonation production. Auditory frequency discrimination may indirectly affect early vocabulary development through Chinese speech prosody.
Yu, Luodi; Rao, Aparna; Zhang, Yang; Burton, Philip C.; Rishiq, Dania; Abrams, Harvey
Although audiovisual (AV) training has been shown to improve overall speech perception in hearing-impaired listeners, there has been a lack of direct brain imaging data to help elucidate the neural networks and neural plasticity associated with hearing aid (HA) use and auditory training targeting speechreading. For this purpose, the current clinical case study reports functional magnetic resonance imaging (fMRI) data from two hearing-impaired patients who were first-time HA users. During the study period, both patients used HAs for 8 weeks; only one received a training program named ReadMyQuips™ (RMQ), targeting speechreading, during the second half of the study period for 4 weeks. Identical fMRI tests were administered at pre-fitting and at the end of the 8 weeks. Regions of interest (ROIs), including auditory cortex and visual cortex for uni-sensory processing and superior temporal sulcus (STS) for AV integration, were identified for each person through an independent functional localizer task. The results showed experience-dependent changes involving the auditory cortex and STS ROIs, and in functional connectivity between uni-sensory ROIs and STS, from pretest to posttest in both cases. These data provide initial evidence for malleable, experience-driven cortical functionality for AV speech perception in elderly hearing-impaired people and call for further studies with a much larger subject sample and systematic control to fill the knowledge gap in understanding brain plasticity associated with auditory rehabilitation in the aging population. PMID:28270763
Studdert-Kennedy, Michael; Shankweiler, Donald; Pisoni, David
The distinction between auditory and phonetic processes in speech perception was used in the design and analysis of an experiment. Earlier studies had shown that dichotically presented stop consonants are more often identified correctly when they share place of production (e.g., /ba–pa/) or voicing (e.g., /ba–da/) than when neither feature is shared (e.g., /ba–ta/). The present experiment was intended to determine whether the effect has an auditory or a phonetic basis. Increments in performance due to feature-sharing were compared for synthetic stop-vowel syllables in which formant transitions were the sole cues to place of production under two experimental conditions: (1) when the vowel was the same for both syllables in a dichotic pair, as in our earlier studies, and (2) when the vowels differed. Since the increment in performance due to sharing place was not diminished when vowels differed (i.e., when formant transitions did not coincide), it was concluded that the effect has a phonetic rather than an auditory basis. Right ear advantages were also measured and were found to interact with both place of production and vowel conditions. Taken together, the two sets of results suggest that inhibition of the ipsilateral signal in the perception of dichotically presented speech occurs during phonetic analysis. PMID:23255833
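The feature-sharing comparison in this abstract can be made concrete with a small lookup table. The sketch below is an illustration only, not the study's stimulus set: it assumes the standard place and voicing classification of the six English stops, and the table `STOP_FEATURES` and helper `shared_features` are hypothetical names introduced here.

```python
# Hypothetical lookup table: (place of production, voicing) for the English stops.
STOP_FEATURES = {
    "b": ("bilabial", "voiced"),
    "p": ("bilabial", "voiceless"),
    "d": ("alveolar", "voiced"),
    "t": ("alveolar", "voiceless"),
    "g": ("velar", "voiced"),
    "k": ("velar", "voiceless"),
}


def shared_features(c1: str, c2: str) -> list[str]:
    """Return the features ("place", "voicing") shared by two stop consonants."""
    place1, voicing1 = STOP_FEATURES[c1]
    place2, voicing2 = STOP_FEATURES[c2]
    shared = []
    if place1 == place2:
        shared.append("place")
    if voicing1 == voicing2:
        shared.append("voicing")
    return shared


# Classify the three example dichotic pairs from the abstract.
for pair in [("b", "p"), ("b", "d"), ("b", "t")]:
    print(pair, shared_features(*pair))
```

Applied to the abstract's examples, /ba–pa/ shares place, /ba–da/ shares voicing, and /ba–ta/ shares neither.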
Speech perception among cochlear implant (CI) listeners is highly variable. High degrees of channel interaction are associated with poorer speech understanding. Two methods for reducing channel interaction, focusing electrical fields and deactivating subsets of channels, were assessed by the change in vowel and consonant identification scores with different program settings. The main hypotheses were that (a) focused stimulation will improve phoneme recognition and (b) speech perception will improve when channels with high thresholds are deactivated. To select high-threshold channels for deactivation, subjects' threshold profiles were processed to enhance the peaks and troughs, and then an exclusion or inclusion criterion based on the mean and standard deviation was used. Low-threshold channels were selected manually and matched in number and apex-to-base distribution. Nine ears in eight adult CI listeners with Advanced Bionics HiRes90k devices were tested with six experimental programs. Two all-channel programs, (a) 14-channel partial tripolar (pTP) and (b) 14-channel monopolar (MP), and four variable-channel programs derived from these two base programs, (c) pTP with high- and (d) low-threshold channels deactivated, and (e) MP with high- and (f) low-threshold channels deactivated, were created. Across subjects, performance was similar with pTP and MP programs. However, poorer performing subjects (scoring < 62% correct on vowel identification) tended to perform better with the all-channel pTP than with the all-channel MP program (a > b). These same subjects showed slightly more benefit with the reduced-channel MP programs (e and f). Subjective ratings were consistent with performance. These findings suggest that reducing channel interaction may benefit poorer performing CI listeners. PMID:27317668
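The abstract does not spell out the peak-enhancement step or the exact exclusion criterion. Purely as an illustration, the sketch below assumes a hypothetical enhancement (subtracting the mean threshold of each channel's immediate neighbours) and flags channels whose enhanced value exceeds the mean plus `k` standard deviations. Both the enhancement and the default `k = 1.0` are placeholders, not the study's parameters, and the threshold profile is invented for demonstration.

```python
from statistics import mean, pstdev


def enhance_contrast(thresholds: list[float]) -> list[float]:
    """Hypothetical peak/trough enhancement: subtract the mean of each channel's neighbours."""
    n = len(thresholds)
    enhanced = []
    for i, t in enumerate(thresholds):
        neighbours = [thresholds[j] for j in (i - 1, i + 1) if 0 <= j < n]
        enhanced.append(t - mean(neighbours))
    return enhanced


def high_threshold_channels(thresholds: list[float], k: float = 1.0) -> list[int]:
    """Return 0-based indices whose enhanced threshold exceeds mean + k * SD."""
    enhanced = enhance_contrast(thresholds)
    cutoff = mean(enhanced) + k * pstdev(enhanced)
    return [i for i, e in enumerate(enhanced) if e > cutoff]


# Hypothetical 14-channel threshold profile with two local peaks.
profile = [40, 42, 41, 55, 43, 42, 40, 41, 58, 42, 41, 40, 42, 41]
print(high_threshold_channels(profile))
```

Subtracting neighbours turns a threshold profile into a measure of local prominence, so a channel is flagged for being high relative to its neighbours rather than high in absolute terms; whether the study's enhancement behaves this way is not stated in the abstract.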
Mitterer, Holger; Müsseler, Jochen
We investigated the relation between action and perception in speech processing, using the shadowing task, in which participants repeat words they hear. In support of a tight perception-action link, previous work has shown that phonetic details in the stimulus influence the shadowing response. On the other hand, latencies do not seem to suffer if stimulus and response differ in their articulatory properties. The present investigation tested how perception influences production when participants are confronted with regional variation. Results showed that participants often imitate a regional variation if it occurs in the stimulus set but tend to stick to their variant if the stimuli are consistent. Participants were forced or induced to correct by the experimental instructions. Articulatory stimulus-response differences do not lead to latency costs. These data indicate that speech perception does not necessarily recruit the production system.
Hoffman, Ralph E
Auditory/verbal hallucinations (AVHs) consist of spoken conversational speech that seems to arise from specific, nonself speakers. One-hertz repetitive transcranial magnetic stimulation (rTMS) reduces excitability in the stimulated brain region. Studies delivering 1-Hz rTMS to the left temporoparietal cortex, a brain area critical to speech perception, have demonstrated statistically significant improvements in AVHs relative to sham stimulation. A novel mechanism of AVHs is proposed whereby dramatic pre-psychotic social withdrawal prompts neuroplastic reorganization by the "social brain" to produce spurious social meaning via hallucinations of conversational speech. Preliminary evidence supporting this hypothesis includes a very high rate of social withdrawal emerging prior to the onset of frank psychosis in patients who develop schizophrenia and AVHs. Moreover, reduced AVHs elicited by temporoparietal 1-Hz rTMS are likely to reflect enhanced long-term depression. Some evidence suggests a loss of long-term depression following experimentally induced deafferentation. Finally, abnormal cortico-cortical coupling is associated with AVHs and is also a common outcome of deafferentation. Auditory/verbal hallucinations (AVHs) of spoken speech or "voices" are reported by 60-80% of persons with schizophrenia at various times during the course of illness. AVHs are associated with high levels of distress and functional disability and can lead to violent acts. Among patients with AVHs, these symptoms remain poorly or incompletely responsive to currently available treatments in approximately 25% of cases. For patients with AVHs who do respond to antipsychotic drugs, there is a very high likelihood that these experiences will recur in subsequent episodes. A more precise characterization of the underlying pathophysiology may lead to more efficacious treatments.
Callan, Daniel; Callan, Akiko; Jones, Jeffery A
Brain imaging studies indicate that speech motor areas are recruited for auditory speech perception, especially when intelligibility is low due to environmental noise or when speech is accented. The purpose of the present study was to determine the relative contribution of brain regions to the processing of speech containing phonetic categories from one's own language, speech with accented samples of one's native phonetic categories, and speech with unfamiliar phonetic categories. To that end, native English and Japanese speakers identified the speech sounds /r/ and /l/ that were produced by native English speakers (unaccented) and Japanese speakers (foreign-accented) while functional magnetic resonance imaging measured their brain activity. For native English speakers, the Japanese accented speech was more difficult to categorize than the unaccented English speech. In contrast, Japanese speakers have difficulty distinguishing between /r/ and /l/, so both the Japanese accented and English unaccented speech were difficult to categorize. Brain regions involved with listening to foreign-accented productions of a first language included primarily the right cerebellum, left ventral inferior premotor cortex (PMvi), and Broca's area. Brain regions most involved with listening to a second-language phonetic contrast (foreign-accented and unaccented productions) also included the left PMvi and the right cerebellum. Additionally, increased activity was observed in the right PMvi, the left and right ventral superior premotor cortex (PMvs), and the left cerebellum. These results support a role for speech motor regions during the perception of foreign-accented native speech and for perception of difficult second-language phonetic contrasts.
Rallapalli, Varsha H.; Alexander, Joshua M.
The Neural-Scaled Entropy (NSE) model quantifies information in the speech signal that has been altered beyond simple gain adjustments by sensorineural hearing loss (SNHL) and various signal processing. An extension of Cochlear-Scaled Entropy (CSE) [Stilp, Kiefte, Alexander, and Kluender (2010). J. Acoust. Soc. Am. 128(4), 2112–2126], NSE quantifies information as the change in 1-ms neural firing patterns across frequency. To evaluate the model, data from a study that examined nonlinear frequency compression (NFC) in listeners with SNHL were used because NFC can recode the same input information in multiple ways in the output, resulting in different outcomes for different speech classes. Overall, predictions were more accurate for NSE than CSE. The NSE model accurately described the observed degradation in recognition, and lack thereof, for consonants in a vowel-consonant-vowel context that had been processed in different ways by NFC. While NSE accurately predicted recognition of vowel stimuli processed with NFC, it underestimated them relative to a low-pass control condition without NFC. In addition, without modifications, it could not predict the observed improvement in recognition for word final /s/ and /z/. Findings suggest that model modifications that include information from slower modulations might improve predictions across a wider variety of conditions. PMID:26627780
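Both CSE and NSE treat information as change over time across frequency channels. As a rough sketch, assuming each time slice has already been converted into a vector of channel values (cochlea-scaled spectral magnitudes for CSE, or 1-ms firing rates per characteristic frequency for NSE), spectral change can be taken as the Euclidean distance between successive slices and optionally summed over a short sliding window. The filterbank or auditory-nerve model that would produce those vectors is omitted entirely, and the distance metric, the window `width`, and the toy frames are assumptions for illustration, not the published model parameters.

```python
import math


def spectral_change(frames: list[list[float]]) -> list[float]:
    """Euclidean distance between each pair of successive frames (frame x channel)."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def boxcar_sum(changes: list[float], width: int) -> list[float]:
    """Sum spectral change over a sliding window of `width` successive distances."""
    return [sum(changes[i:i + width]) for i in range(len(changes) - width + 1)]


# Hypothetical 3-channel frames (excitation levels or firing rates).
frames = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]]
changes = spectral_change(frames)
print(changes)                 # [5.0, 0.0, 12.0]
print(boxcar_sum(changes, 2))  # [5.0, 12.0]
```

A steady-state stretch (the repeated middle frame) contributes zero change, which is the intuition behind treating unchanging portions of the signal as carrying little new information.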
Smiljanić, Rajka; Bradlow, Ann R.
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions. PMID:22225056
Natale, Ruby; Uhlhorn, Susan B.; Lopez-Mitnik, Gabriela; Camejo, Stephanie; Englebert, Nicole; Delamater, Alan M.; Messiah, Sarah E.
Background: One in four preschool-age children in the United States are currently overweight or obese. Previous studies have shown that caregivers of this age group often have difficulty accurately recognizing their child's weight status. The purpose of this study was to examine factors associated with accurate/inaccurate perception of child body…
This study of 41 children (ages 7 and 8) examined the effects of low socioeconomic status (SES) and chronic otitis media (OM) on speech perception and phonemic awareness. Findings indicated that the children with low SES did poorly on both kinds of tasks whether or not they had chronic OM.
Gou, J.; Smith, J.; Valero, J.; Rubio, I.
This paper reports on a clinical trial evaluating outcomes of a frequency-lowering technique for adolescents and young adults with severe to profound hearing impairment. Outcomes were defined by changes in aided thresholds, speech perception, and acceptance. The participants comprised seven young people aged between 13 and 25 years. They were…
McMurray, Bob; Munson, Cheyenne; Tomblin, J. Bruce
Purpose: The authors examined speech perception deficits associated with individual differences in language ability, contrasting auditory, phonological, or lexical accounts by asking whether lexical competition is differentially sensitive to fine-grained acoustic variation. Method: Adolescents with a range of language abilities (N = 74, including…
Zhang, Juan; McBride-Chang, Catherine
A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with…
Chiappe, Penny; Glaeser, Barbara; Ferko, Doreen
This study examined the roles of speech perception and phonological processing in reading and spelling acquisition for native and nonnative speakers of English in the 1st grade. The performance of 50 children (23 native English speakers and 27 native Korean speakers) was examined on tasks assessing reading and spelling, phonological processing,…
Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…
Kummerer, Sharon E.; Lopez-Reyna, Norma A.; Hughes, Marie Tejero
Purpose: This qualitative study explored mothers' perceptions of their children's communication disabilities, emergent literacy development, and speech-language therapy programs. Method: Participants were 14 Mexican immigrant mothers and their children (age 17-47 months) who were receiving center-based services from an early childhood intervention…
Shi, Lu-Feng; Doherty, Karen A.
Purpose: The purpose of the current study was to assess the effect of fast and slow attack/release times (ATs/RTs) on aided perception of reverberant speech in quiet. Method: Thirty listeners with mild-to-moderate sensorineural hearing loss were tested monaurally with a commercial hearing aid programmed in 3 AT/RT settings: linear, fast (AT = 9…
Most, Tova; Rothem, Hilla; Luntz, Michal
The researchers evaluated the contribution of cochlear implants (CIs) to speech perception by a sample of prelingually deaf individuals implanted after age 8 years. This group was compared with a group with profound hearing impairment (HA-P), and with a group with severe hearing impairment (HA-S), both of which used hearing aids. Words and…
Weir, Kristy A.
Speech pathology students readily identify the importance of a sound understanding of anatomical structures central to their intended profession. In contrast, they often do not recognize the relevance of a broader understanding of structure and function. This study aimed to explore students' perceptions of the relevance of anatomy to speech…
Scott, Sophie K.
Our understanding of the neurobiological basis for human speech production and perception has benefited from insights from psychology, neuropsychology and neurology. In this overview, I outline some of the ways that functional imaging has added to this knowledge and argue that, as a neuroanatomical tool, functional imaging has led to some…
Conboy, Barbara T.; Kuhl, Patricia K.
Language experience "narrows" speech perception by the end of infants' first year, reducing discrimination of non-native phoneme contrasts while improving native-contrast discrimination. Previous research showed that declines in non-native discrimination were reversed by second-language experience provided at 9-10 months, but it is not known…
Smalle, Eleonore H. M.; Rogers, Jack; Möttönen, Riikka
Recent studies using repetitive transcranial magnetic stimulation (TMS) have demonstrated that disruptions of the articulatory motor cortex impair performance in demanding speech perception tasks. These findings have been interpreted as support for the idea that the motor cortex is critically involved in speech perception. However, the validity of this interpretation has been called into question, because it is unknown whether the TMS-induced disruptions in the motor cortex affect speech perception or rather response bias. In the present TMS study, we addressed this question by using signal detection theory to calculate sensitivity (i.e., d′) and response bias (i.e., criterion c). We used repetitive TMS to temporarily disrupt the lip or hand representation in the left motor cortex. Participants discriminated pairs of sounds from a “ba”–“da” continuum before TMS, immediately after TMS (i.e., during the period of motor disruption), and after a 30-min break. We found that the sensitivity for between-category pairs was reduced during the disruption of the lip representation. In contrast, disruption of the hand representation temporarily reduced response bias. This double dissociation indicates that the hand motor cortex contributes to response bias during demanding discrimination tasks, whereas the articulatory motor cortex contributes to perception of speech sounds. PMID:25274987
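Sensitivity and response bias can be computed from response counts with the standard yes/no signal detection formulas, d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where z is the inverse of the standard normal CDF. The sketch below assumes that "hits" are "different" responses to different (between-category) pairs and "false alarms" are "different" responses to same pairs, and it adds a log-linear correction (+0.5 per cell) so rates never reach 0 or 1. That mapping, the correction, and the example counts are assumptions; same–different designs are sometimes analysed with a differencing model instead, and the abstract does not say which the study used.

```python
from statistics import NormalDist


def sdt_measures(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d_prime, criterion_c) using the yes/no signal detection formulas.

    A log-linear correction (+0.5 per cell) keeps rates away from 0 and 1,
    where the inverse normal CDF is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c


# Hypothetical counts: an unbiased listener, then one who says "different" more often.
print(sdt_measures(40, 10, 10, 40))
print(sdt_measures(45, 5, 20, 30))
```

Negative c indicates a liberal bias toward responding "different", which is why a change in c without a change in d′ points to response bias rather than altered perception.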
Language development in infants born very preterm is often compromised. Poor language skills have been described in preschoolers, and differences between preterm and full-term children in early vocabulary size and morphosyntactic complexity have also been identified. However, very few data are available concerning early speech perception abilities and their predictive value for later language outcomes. An overview is presented of the results of a prospective study exploring the link between early speech perception abilities and lexical development in the second year of life in a population of very preterm infants (≤32 gestation weeks). Specifically, behavioral measures of (a) native-language recognition and discrimination from a rhythmically distant and a rhythmically close nonfamiliar language, and (b) monosyllabic word-form segmentation were obtained and compared to data from full-term infants. Expressive vocabulary at two test ages (12 and 18 months, corrected for gestational age) was measured using the MacArthur Communicative Development Inventory. Behavioral results indicated that differences between preterm and control groups were present but evident only when task demands were high in terms of language processing, selective attention to relevant information, and memory load. When responses could be based on knowledge acquired from accumulated linguistic experience, between-group differences were no longer observed. Critically, while preterm infants responded satisfactorily to the native-language recognition and discrimination tasks, they clearly differed from full-term infants in the more challenging activity of extracting and retaining word-form units from fluent speech, a fundamental ability for starting to build a lexicon. Correlations between results from the language discrimination tasks and expressive vocabulary measures could not be systematically established. However, attention time to novel words in the word segmentation…
Ordin, Mikhail; Polyanskaya, Leona
We investigated the perception of developmental changes in timing patterns that occur in the course of second language (L2) acquisition when the learner's native and target languages are rhythmically similar (German and English). We found that speech rhythm in L2 English produced by German learners becomes increasingly stress-timed as acquisition progresses. This development is captured by tempo-normalized rhythm measures of durational variability. Advanced learners also deliver speech at a faster rate. However, when native speakers have to classify the timing patterns characteristic of L2 English produced by German learners at different proficiency levels, they attend to speech rate cues and ignore the differences in speech rhythm. PMID:25859228
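The abstract does not name the measures, but commonly used tempo-normalized indices of durational variability include the normalized Pairwise Variability Index (nPVI) and the rate-normalized coefficient of variation (Varco). A minimal sketch, assuming interval durations (for example vocalic intervals in milliseconds) have already been segmented; this version uses the population standard deviation for Varco, and the durations are invented for illustration.

```python
from statistics import mean, pstdev


def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index over successive interval durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


def varco(durations: list[float]) -> float:
    """Standard deviation of durations as a percentage of their mean."""
    return 100 * pstdev(durations) / mean(durations)


slow = [120.0, 60.0, 150.0, 70.0]  # hypothetical vocalic intervals (ms)
fast = [d / 2 for d in slow]       # same pattern produced twice as fast
print(round(npvi(slow), 2), round(npvi(fast), 2))
print(round(varco(slow), 2), round(varco(fast), 2))
```

Both indices divide by a local or global mean duration, so scaling every interval by the same factor leaves them unchanged. This is what makes them "tempo-normalized", and it is consistent with the abstract's point that speech rate and rhythm can dissociate.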
Zheng, Zane Z.; Munhall, Kevin G.; Johnsrude, Ingrid S.
The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or…
Shafiro, Valeriy; Sheft, Stanley; Risley, Robert
Perception of interrupted speech and the influence of speech materials and memory load were investigated using one or two concurrent square-wave gating functions. Sentences (Experiment 1) and random one-, three-, and five-word sequences (Experiment 2) were interrupted using either a primary gating rate alone (0.5–24 Hz) or a combined primary and faster secondary rate. The secondary rate interrupted only speech left intact after primary gating, reducing the original speech to 25%. In both experiments, intelligibility increased with primary rate, but varied with memory load and speech material (highest for sentences, lowest for five-word sequences). With dual-rate gating of sentences, intelligibility with fast secondary rates was superior to that with single rates and a 25% duty cycle, approaching that of single rates with a 50% duty cycle for some low and high rates. For dual-rate gating of words, the positive effect of fast secondary gating was smaller than for sentences, and the advantage of sentences over word-sequences was not obtained in many dual-rate conditions. These findings suggest that integration of interrupted speech fragments after gating depends on the duration of the gated speech interval and that sufficiently robust acoustic-phonetic word cues are needed to access higher-level contextual sentence information. PMID:21973362
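The dual-rate logic can be illustrated with a minimal sketch. It assumes 50% duty cycles for both gating functions (consistent with the figures above: one rate at a 50% duty cycle keeps half the speech, and gating that remainder again keeps half of it, i.e., 25%), in-phase gates that are "on" at t = 0, and a hypothetical 8-kHz sampling rate. The function names are illustrative, not taken from the study.

```python
from fractions import Fraction


def square_gate(n_samples, fs, rate_hz, duty=0.5):
    """0/1 square-wave gating function sampled at fs Hz.

    The gate is 'on' at t = 0 and stays on for `duty` of every period
    (1 / rate_hz seconds). Exact rational arithmetic avoids
    floating-point jitter at the on/off boundaries.
    """
    rate = Fraction(rate_hz)
    on_part = Fraction(duty)
    gate = []
    for n in range(n_samples):
        phase = (n * rate / fs) % 1  # position within the current period, in [0, 1)
        gate.append(1 if phase < on_part else 0)
    return gate


def dual_rate_gate(n_samples, fs, primary_hz, secondary_hz, duty=0.5):
    """Secondary gating acts only on speech left intact by primary gating,
    which amounts to the sample-wise product of the two gates."""
    primary = square_gate(n_samples, fs, primary_hz, duty)
    secondary = square_gate(n_samples, fs, secondary_hz, duty)
    return [p * s for p, s in zip(primary, secondary)]


def apply_gate(signal, gate):
    """Silence the signal wherever the gate is off."""
    return [x * g for x, g in zip(signal, gate)]


def proportion_kept(gate):
    """Fraction of samples that survive gating."""
    return sum(gate) / len(gate)


if __name__ == "__main__":
    fs = 8000   # hypothetical sampling rate (Hz)
    n = 2 * fs  # 2 s of signal
    print(proportion_kept(square_gate(n, fs, 2)))         # single rate, 50% duty: 0.5
    print(proportion_kept(square_gate(n, fs, 2, 0.25)))   # single rate, 25% duty: 0.25
    print(proportion_kept(dual_rate_gate(n, fs, 2, 16)))  # 2 Hz + 16 Hz dual rate: 0.25
```

With the rates chosen here, the secondary period divides the primary on-interval evenly, so exactly 25% of the signal survives; other rate combinations or phase offsets keep approximately, not exactly, 25% over a finite signal.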
Talamas, Sean N.; Mavor, Kenneth I.; Perrett, David I.
Despite the old adage not to ‘judge a book by its cover’, facial cues often guide first impressions and these first impressions guide our decisions. Literature suggests there are valid facial cues that assist us in assessing someone’s health or intelligence, but such cues are overshadowed by an ‘attractiveness halo’ whereby desirable attributions are preferentially ascribed to attractive people. The impact of the attractiveness halo effect on perceptions of academic performance in the classroom is concerning, as this has been shown to influence students’ future performance. We investigated the limiting effects of the attractiveness halo on perceptions of actual academic performance in faces of 100 university students. Given the ambiguity and various perspectives on the definition of intelligence and the growing consensus on the importance of conscientiousness over intelligence in predicting actual academic performance, we also investigated whether perceived conscientiousness was a more accurate predictor of academic performance than perceived intelligence. Perceived conscientiousness was found to be a better predictor of actual academic performance than perceived intelligence and perceived academic performance, and accuracy was improved when controlling for the influence of attractiveness on judgments. These findings emphasize the misleading effect of attractiveness on the accuracy of first impressions of competence, which can have serious consequences in areas such as education and hiring. The findings also have implications for future research investigating impression accuracy based on facial stimuli. PMID:26885976
Simpson, C. A.; Hart, S. G.
The study evaluates the attention required for synthesized speech perception at three levels of linguistic redundancy. Twelve commercial airline pilots were individually tested on 16 cockpit warning messages, eight of which consisted of two monosyllabic key words and eight of two polysyllabic key words. Three levels of linguistic redundancy were identified: monosyllabic words, polysyllabic words, and sentences. The experiment contained a message familiarization phase and a message recognition phase. It was found that (1) when the messages are part of a previously learned and recently heard set, and the subject is familiar with the phrasing, the attention needed to recognize a message is not a function of its level of linguistic redundancy, and (2) there is a quantitative and qualitative difference between recognition and comprehension processes; only during active comprehension does additional redundancy reduce attention requirements.
Abdala, Carolina; Dhar, Sumitrajit; Ahmadi, Mahnaz; Luo, Ping
The medial olivocochlear reflex (MOCR) modulates cochlear amplifier gain and is thought to facilitate the detection of signals in noise. High-resolution distortion product otoacoustic emissions (DPOAEs) were recorded in teens, young, middle-aged, and elderly adults at moderate levels using primary tones swept from 0.5 to 4 kHz with and without a contralateral acoustic stimulus (CAS) to elicit medial efferent activation. Aging effects on magnitude and phase of the 2f1-f2 DPOAE and on its components were examined, as was the link between speech-in-noise performance and MOCR strength. Results revealed a mild aging effect on the MOCR through middle age for frequencies below 1.5 kHz. Additionally, positive correlations were observed between strength of the MOCR and performance on select measures of speech perception parsed into features. The elderly group showed unexpected results including relatively large effects of CAS on DPOAE, and CAS-induced increases in DPOAE fine structure as well as increases in the amplitude and phase accumulation of DPOAE reflection components. Contamination of MOCR estimates by middle ear muscle contractions cannot be ruled out in the oldest subjects. The findings reiterate that DPOAE components should be unmixed when measuring medial efferent effects to better consider and understand these potential confounds. PMID:25234884
Välimaa, T T; Sorri, M J
This study surveyed the effect of cochlear implantation on hearing level, speech perception, and listening performance in 67 Finnish-speaking adults. Pure-tone thresholds (0.125–8 kHz), word recognition, and listening performance were studied before and after implantation. After switch-on of the implant, the median sound-field PTA(0.5–4 kHz) values were fairly stable across the evaluation period. Three months after switch-on, the mean word recognition score was 54%. Mean word recognition scores improved clearly over a longer period, reaching 71% 24 months after switch-on. Six months after switch-on, the majority of subjects (40/48) were able to recognize some speech without speechreading, and 26 of these 48 subjects were able to use the telephone with a known speaker, gaining good functional benefit from the implantation.
Weisenberger, J M; Percy, M E
Current research on the effectiveness of tactile aids for speech perception by hearing-impaired persons suggests that substantial training, lasting over months or years, is necessary for users to achieve maximal benefits from a tactile device. A number of studies have demonstrated the usefulness of training programs that include an analytic component, such as phoneme training, together with more synthetic tasks such as sentence identification and speech tracking. However, particularly in programs for children, it is desirable to structure training experiences so that easy distinctions are trained first, and more difficult distinctions are approached only later in training. In the present study, a systematic evaluation of phoneme-level information provided by the Tactaid VII, a multichannel tactile aid, was performed. Adult subjects were tested in minimal pairs and closed set phoneme discrimination and identification tasks under tactile aid alone, speechreading alone, and speechreading plus tactile aid conditions, to provide an inventory of stimulus identifiability and permit ranking of discriminations as easy or more difficult. Because these rankings might differ as a function of coarticulation effects, three different vowel contexts were tested for consonant stimuli. Results indicated that there were indeed considerable differences across vowel contexts, and that the /ae/ vowel context yielded the most identifiable stimuli. These data could be used by teachers and therapists to construct viable stimulus sets for training programs for tactile aid users.
Jenson, David; Bowers, Andrew L.; Harkrider, Ashley W.; Thornton, David; Cuellar, Megan; Saltuklaroglu, Tim
Activity in anterior sensorimotor regions is found in speech production and some perception tasks. Yet how sensorimotor integration supports these functions is unclear owing to a lack of data examining the timing of activity in these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG μ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of μ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Of the 20 participants, 17 and 15 produced left and right μ-components, respectively, localized to the precentral gyri. Discrimination conditions were characterized by significant (pFDR < 0.05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that μ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. μ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while μ-alpha ERD may index sensory feedback during speech rehearsal and production. PMID:25071633
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
During speech communication, visual information may interact with the auditory system at various processing stages. Most notably, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables--disambiguated to /pa/ or /ta/ by the visual channel (speaking face)--served as test materials, together with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point to a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal "what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.
Oxenham, Andrew J.; Kreft, Heather A.
Under normal conditions, human speech is remarkably robust to degradation by noise and other distortions. However, people with hearing loss, including those with cochlear implants, often experience great difficulty in understanding speech in noisy environments. Recent work with normal-hearing listeners has shown that the amplitude fluctuations inherent in noise contribute strongly to the masking of speech. In contrast, this study shows that speech perception via a cochlear implant is unaffected by the inherent temporal fluctuations of noise. This qualitative difference between acoustic and electric auditory perception does not seem to be due to differences in underlying temporal acuity but can instead be explained by the poorer spectral resolution of cochlear implants, relative to the normally functioning ear, which leads to an effective smoothing of the inherent temporal-envelope fluctuations of noise. The outcome suggests an unexpected trade-off between the detrimental effects of poorer spectral resolution and the beneficial effects of a smoother noise temporal envelope. This trade-off provides an explanation for the long-standing puzzle of why strong correlations between speech understanding and spectral resolution have remained elusive. The results also provide a potential explanation for why cochlear-implant users and hearing-impaired listeners exhibit reduced or absent masking release when large and relatively slow temporal fluctuations are introduced in noise maskers. The multitone maskers used here may provide an effective new diagnostic tool for assessing functional hearing loss and reduced spectral resolution. PMID:25315376
Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa
To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038
D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette
Typically developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on chronological and mental age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development.
Megnin-Viggars, Odette; Goswami, Usha
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and temporal modulations in the 2-7 Hz range are of particular importance. Dyslexic individuals have specific problems in perceiving speech envelope cues. In the current study, we used an audiovisual noise-vocoded speech task to investigate the contribution of low-frequency visual information to intelligibility of 4-channel and 16-channel noise vocoded speech in participants with and without dyslexia. For the 4-channel speech, noise vocoding preserves amplitude information that is entirely congruent with dynamic visual information. All participants were significantly more accurate with 4-channel speech when visual information was present, even when this information was purely spatio-temporal (pixelated stimuli changing in luminance). Possible underlying mechanisms are discussed.
Pottackal Mathai, Jijo; Mohammed, Hasheem
This study investigated the effect of compression time settings and presentation levels on speech perception in noise for elderly individuals with hearing loss, and compared their aided speech perception performance with that of age-matched normal-hearing subjects. Twenty normal-hearing participants aged 60–68 years and 20 participants with mild-to-moderate sensorineural hearing loss (SNHL) aged 60–70 years were randomly recruited for the study. In the normal-hearing group, SNR-50 was determined using phonetically balanced sentences mixed with speech-shaped noise presented at the most comfortable level. In the SNHL group, aided SNR-50 was determined at three presentation levels (40, 60, and 80 dB HL) after fitting binaural hearing aids with different compression time settings (fast and slow). In the SNHL group, slow compression time settings yielded significantly better SNR-50 than fast release times. In addition, with slow release times, the mean SNR-50 of the SNHL group was comparable to that of the normal-hearing participants. A hearing aid with slow compression time settings led to significantly better speech perception in noise than one with fast compression time settings.
Weir, Kristy A
Speech pathology students readily identify the importance of a sound understanding of anatomical structures central to their intended profession. In contrast, they often do not recognize the relevance of a broader understanding of structure and function. This study aimed to explore students' perceptions of the relevance of anatomy to speech pathology. The effect of two learning activities on students' perceptions was also evaluated. First, a written assignment required students to illustrate the relevance of anatomy to speech pathology by using an example selected from one of the four alternative structures. The second approach was the introduction of brief "scenarios" with directed questions into the practical class. The effects of these activities were assessed via two surveys designed to evaluate students' perceptions of the relevance of anatomy before and during the course experience. A focus group was conducted to clarify and extend discussion of issues arising from the survey data. The results showed that the students perceived some course material as irrelevant to speech pathology. The importance of relevance to the students' "state" motivation was well supported by the data. Although the students believed that the learning activities helped their understanding of the relevance of anatomy, some structures were considered less relevant at the end of the course. It is likely that the perceived amount of content and surface approach to learning may have prevented students from "thinking outside the box" regarding which anatomical structures are relevant to the profession.
van Besouw, Rachel M; Forrester, Lisa; Crowe, Nicholas D; Rowan, Daniel
A bilateral advantage for diotically presented stimuli has been observed for cochlear implant (CI) users and is suggested to be dependent on symmetrical implant performance. Studies using CI simulations have not shown a true "bilateral" advantage, but a "better ear" effect and have demonstrated that performance decreases with increasing basalward shift in insertion depth. This study aimed to determine whether there is a bilateral advantage for CI simulations with interaurally matched insertions and the extent to which performance is affected by interaural insertion depth mismatch. Speech perception in noise and self-reported ease of listening were measured using matched bilateral, mismatched bilateral and unilateral CI simulations over four insertion depths for seventeen normal-hearing listeners. Speech scores and ease of listening decreased with increasing basalward shift in (interaurally matched) insertion depth. A bilateral advantage for speech perception was only observed when the insertion depths were interaurally matched and deep. No advantage was observed for small to moderate interaural insertion-depth mismatches, consistent with a better ear effect. Finally, both measures were poorer than expected for a better ear effect for large mismatches, suggesting that misalignment of the electrode arrays may prevent a bilateral advantage and detrimentally affect perception of diotically presented speech.
Gjaja, Marin N.
Neural networks for supervised and unsupervised learning are developed and applied to problems in remote sensing, continuous map learning, and speech perception. Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART networks synthesize fuzzy logic and neural networks, and supervised ARTMAP networks incorporate ART modules for prediction and classification. New ART and ARTMAP methods resulting from analyses of data structure, parameter specification, and category selection are developed. Architectural modifications providing flexibility for a variety of applications are also introduced and explored. A new methodology for automatic mapping from Landsat Thematic Mapper (TM) and terrain data, based on fuzzy ARTMAP, is developed. System capabilities are tested on a challenging remote sensing problem, prediction of vegetation classes in the Cleveland National Forest from spectral and terrain features. After training at the pixel level, performance is tested at the stand level, using sites not seen during training. Results are compared to those of maximum likelihood classifiers, back propagation neural networks, and K-nearest neighbor algorithms. Best performance is obtained using a hybrid system based on a convex combination of fuzzy ARTMAP and maximum likelihood predictions. This work forms the foundation for additional studies exploring fuzzy ARTMAP's capability to estimate class mixture composition for non-homogeneous sites. Exploratory simulations apply ARTMAP to the problem of learning continuous multidimensional mappings. A novel system architecture retains basic ARTMAP properties of incremental and fast learning in an on-line setting while adding components to solve this class of problems. The perceptual magnet effect is a language-specific phenomenon arising early in infant speech development that is characterized by a warping of speech sound perception. An
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
Cerebral mechanisms of musical abilities were explored in musically gifted children. For this purpose, psychophysiological characteristics of the perception of emotional speech information were studied experimentally in samples of gifted and ordinary children. Forty-six schoolchildren and forty-eight musicians in three age groups (7–10, 11–13, and 14–17 years old) participated in the study. In the experimental session, a test sentence was presented to each subject through headphones with two emotional intonations (joy and anger) and without emotional expression. The subject had to recognize the type of emotion, and his/her answers were recorded. Analysis of variance revealed age- and gender-related features of emotion recognition: boy musicians were 4–6 years ahead of ordinary schoolchildren of the same age in the development of emotion-recognition mechanisms, whereas girl musicians were 1–3 years ahead. In girls, musical education induced a shift of the predominant activity for emotional perception toward the left hemisphere; in boys, by contrast, the initially distinct dominance of the left hemisphere was not retained in the course of further education.
Hertrich, Ingo; Kirsten, Mareike; Tiemann, Sonja; Beck, Sigrid; Wühle, Anja; Ackermann, Hermann; Rolke, Bettina
Discourse structure enables us to generate expectations based upon linguistic material that has already been introduced. The present magnetoencephalography (MEG) study addresses auditory perception of test sentences in which discourse coherence was manipulated by using presuppositions (PSP) that either correspond or fail to correspond to items in preceding context sentences with respect to uniqueness and existence. Context violations yielded delayed auditory M50 and enhanced auditory M200 cross-correlation responses to syllable onsets within an analysis window of 1.5 s following the PSP trigger words. Furthermore, discourse incoherence yielded suppression of spectral power within an expanded alpha band ranging from 6 to 16 Hz. This effect showed a bimodal temporal distribution, being significant in an early time window of 0.0–0.5 s following the PSP trigger and a late interval of 2.0–2.5 s. These findings indicate anticipatory top-down mechanisms interacting with various aspects of bottom-up processing during speech perception.
Jane, Griselda; Tunjungsari, Harini
Parental involvement in speech therapy has not been prioritized in most therapy centers in Indonesia. One therapy center that has recognized the importance of parental involvement is Kailila Speech Therapy Center. At Kailila Speech Therapy Center, parental involvement in children's speech therapy is an obligation that has been…
Weikum, Whitney M.; Oberlander, Tim F.; Hensch, Takao K.; Werker, Janet F.
Language acquisition reflects a complex interplay between biology and early experience. Psychotropic medication exposure has been shown to alter neural plasticity and shift sensitive periods in perceptual development. Notably, serotonin reuptake inhibitors (SRIs) are antidepressant agents increasingly prescribed to manage antenatal mood disorders, and depressed maternal mood per se during pregnancy impacts infant behavior, also raising concerns about long-term consequences following such developmental exposure. We studied whether infants’ language development is altered by prenatal exposure to SRIs and whether such effects differ from exposure to maternal mood disturbances. Infants from non–SRI-treated mothers with little or no depression (control), depressed but non–SRI-treated (depressed-only), and depressed and treated with an SRI (SRI-exposed) were studied at 36 wk gestation (while still in utero) on a consonant and vowel discrimination task and at 6 and 10 mo of age on a nonnative speech and visual language discrimination task. Whereas the control infants responded as expected (success at 6 mo and failure at 10 mo) the SRI-exposed infants failed to discriminate the language differences at either age and the depressed-only infants succeeded at 10 mo instead of 6 mo. Fetuses at 36 wk gestation in the control condition performed as expected, with a response on vowel but not consonant discrimination, whereas the SRI-exposed fetuses showed accelerated perceptual development by discriminating both vowels and consonants. Thus, prenatal depressed maternal mood and SRI exposure were found to shift developmental milestones bidirectionally on infant speech perception tasks. PMID:23045665
Schmidt, Juliane; Janse, Esther; Scharenborg, Odette
This study investigated whether age and/or differences in hearing sensitivity influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech. To that end, this study specifically focused on the relationship between participants’ ratings of short affective utterances and the utterances’ acoustic parameters (pitch, intensity, and articulation rate) known to be associated with the emotion dimensions arousal and valence. Stimuli consisted of short utterances taken from a corpus of conversational speech. In two rating tasks, younger and older adults rated either arousal or valence using a 5-point scale. Mean intensity was found to be the main cue participants used in the arousal task (i.e., higher mean intensity cueing higher levels of arousal) while mean F0 was the main cue in the valence task (i.e., higher mean F0 being interpreted as more negative). Even though there were no overall age group differences in arousal or valence ratings, compared to younger adults, older adults responded less strongly to mean intensity differences cueing arousal and responded more strongly to differences in mean F0 cueing valence. Individual hearing sensitivity among the older adults did not modify the use of mean intensity as an arousal cue. However, individual hearing sensitivity generally affected valence ratings and modified the use of mean F0. We conclude that age differences in the interpretation of mean F0 as a cue for valence are likely due to age-related hearing loss, whereas age differences in rating arousal do not seem to be driven by hearing sensitivity differences between age groups (as measured by pure-tone audiometry). PMID:27303340
Sheft, Stanley; Shafiro, Valeriy; Lorenzi, Christian; McMullen, Rachel; Farrell, Caitlin
Objective: The frequency modulation (FM) of speech can convey linguistic information and also enhance speech-stream coherence and segmentation. Using a clinically oriented approach, the purpose of the present study was to examine the effects of age and hearing loss on the ability to discriminate between stochastic patterns of low-rate FM and to determine whether difficulties in speech perception experienced by older listeners relate to a deficit in this ability. Design: Data were collected from 18 normal-hearing young adults and 18 participants who were at least 60 years old, nine normal-hearing and nine with a mild-to-moderate sensorineural hearing loss. Using stochastic frequency modulators derived from 5-Hz lowpass noise applied to a 1-kHz carrier, discrimination thresholds were measured in terms of frequency excursion (ΔF), both in quiet and with a speech-babble masker present; stimulus duration; and signal-to-noise ratio (SNRFM) in the presence of a speech-babble masker. Speech perception ability was evaluated using Quick Speech-in-Noise (QuickSIN) sentences in four-talker babble. Results: The results showed a significant effect of age, but not of hearing loss among the older listeners, for FM discrimination conditions with masking present (ΔF and SNRFM). The effect of age was not significant for the FM measures based on stimulus duration. ΔF and SNRFM were also the two conditions for which performance was significantly correlated with listener age when controlling for the effect of hearing loss as measured by pure-tone average. With respect to speech-in-noise ability, results from the SNRFM condition were significantly correlated with QuickSIN performance. Conclusions: Results indicate that aging is associated with reduced ability to discriminate moderate-duration patterns of low-rate stochastic FM. Furthermore, the relationship between QuickSIN performance and the SNRFM thresholds suggests that the difficulty experienced by older listeners with speech
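The stimulus construction described in this abstract (a low-rate stochastic modulator imposed on a 1-kHz carrier, with ΔF as the peak frequency excursion) can be sketched as follows. This is a minimal illustration, not the authors' synthesis code: approximating 5-Hz lowpass noise as a normalized sum of random-phase sinusoids below the cutoff is an assumption, as are the sampling rate, duration, component count, and ΔF values used here.

```python
import math
import random


def lowpass_noise_modulator(duration_s, fs, cutoff_hz=5.0, n_components=20, seed=0):
    """Approximate lowpass noise as a sum of random-frequency, random-phase,
    random-amplitude sinusoids below cutoff_hz, scaled to a peak |value| of 1.
    (Assumption: one simple way to build a band-limited modulator.)"""
    rng = random.Random(seed)
    comps = [(rng.uniform(0.0, cutoff_hz),        # frequency (Hz)
              rng.uniform(0.0, 2 * math.pi),      # phase (rad)
              rng.gauss(0.0, 1.0))                # amplitude
             for _ in range(n_components)]
    n = int(duration_s * fs)
    m = [sum(a * math.sin(2 * math.pi * f * i / fs + ph) for f, ph, a in comps)
         for i in range(n)]
    peak = max(abs(v) for v in m) or 1.0
    return [v / peak for v in m]


def fm_tone(modulator, fs, carrier_hz=1000.0, delta_f_hz=100.0):
    """Frequency-modulate a sinusoidal carrier. The instantaneous frequency is
    carrier_hz + delta_f_hz * m(t), so delta_f_hz is the peak excursion (ΔF)."""
    phase = 0.0
    out = []
    for m in modulator:
        phase += 2 * math.pi * (carrier_hz + delta_f_hz * m) / fs  # integrate frequency
        out.append(math.sin(phase))
    return out


fs = 16000                                   # assumed sampling rate (Hz)
mod = lowpass_noise_modulator(0.5, fs)       # assumed 500-ms stimulus
tone = fm_tone(mod, fs, delta_f_hz=50.0)     # hypothetical ΔF of 50 Hz
```

In a threshold procedure, ΔF would be the quantity varied adaptively; here it is simply a parameter of `fm_tone`.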
Ramirez, Joshua; Mann, Virginia
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Weitzman, Raymond S.
A major focus of research on language acquisition in infancy involves experimental studies of the infant's ability to discriminate various kinds of speech or speech-like stimuli. This research has demonstrated that infants are sensitive to many fine-grained differences in the acoustic properties of speech utterances. Furthermore, these empirical…
Venezia, Jonathan H.; Fillmore, Paul; Matchin, William; Isenberg, A. Lisette; Hickok, Gregory; Fridriksson, Julius
Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal, suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. PMID:26608242
Gow, David W.; Segawa, Jennifer A.; Ahlfors, Seppo P.; Lin, Fa-Hsuan
Behavioural and functional imaging studies have demonstrated that lexical knowledge influences the categorization of perceptually ambiguous speech sounds. However, methodological and inferential constraints have so far been unable to resolve the question of whether this interaction takes the form of direct top-down influences on perceptual processing, or feedforward convergence during a decision process. We examined top-down lexical influences on the categorization of segments in an /s/–/ʃ/ continuum presented in different lexical contexts to produce a robust Ganong effect. Using integrated MEG/EEG and MRI data we found that, within a network identified by 40-Hz gamma phase locking, activation in the supramarginal gyrus associated with wordform representation influences phonetic processing in the posterior superior temporal gyrus during a period of time associated with lexical processing. This result provides direct evidence that lexical processes influence lower level phonetic perception, and demonstrates the potential value of combining Granger causality analyses and high spatiotemporal resolution multimodal imaging data to explore the functional architecture of cognition. PMID:18703146
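In behavioral terms, a Ganong effect is usually expressed as a shift of the category boundary (the 50% crossover of the identification function) along the continuum, depending on which endpoint forms a word. The sketch below assumes a logistic identification function, a 7-step continuum, and hypothetical boundary values. It illustrates only this behavioral measure, not the MEG/EEG Granger-causality analysis the study reports.

```python
import math


def p_s_response(step, boundary, slope=1.0):
    """Logistic identification function: probability of an /s/ response at a
    continuum step, assuming low steps are clear /s/ and high steps clear /ʃ/."""
    return 1.0 / (1.0 + math.exp((step - boundary) / slope))


def boundary_from_proportions(steps, proportions):
    """Estimate the category boundary (50% crossover) by linear interpolation
    between the first pair of adjacent steps whose proportions straddle 0.5."""
    pairs = list(zip(steps, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


steps = list(range(1, 8))
# Hypothetical boundaries: in a context where /s/ completes a word, ambiguous
# steps are labeled /s/ more often, so the boundary moves toward the /ʃ/ end.
neutral = [p_s_response(s, boundary=4.0) for s in steps]
s_biased = [p_s_response(s, boundary=4.6) for s in steps]
shift = boundary_from_proportions(steps, s_biased) - boundary_from_proportions(steps, neutral)
print(f"Estimated boundary shift: {shift:.2f} steps")
```

With real data, the proportions would come from listeners' labeling responses per step and context, and a fitted psychometric function would usually replace the simple interpolation used here.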
Tajima, Keiichi; Akahane-Yamada, Reiko
Past studies on second-language (L2) speech perception have suggested that L2 learners have difficulty exploiting contextual information when perceiving L2 utterances, and that they exhibit greater difficulty than native listeners when faced with variability in temporal context. The present study investigated the extent to which native Japanese listeners, who are known to have difficulties perceiving English syllables, are influenced by changes in speaking rate when asked to count syllables in spoken English words. The stimuli consisted of a set of English words and nonwords varying in syllable structure spoken at three rates by a native English speaker. The stimuli produced at the three rates were presented to native Japanese listeners in a random order. Results indicated that listeners' identification accuracy did not vary as a function of speaking rate, although it decreased significantly as the syllable structure of the stimuli became more complex. Moreover, even though speaking rate varied from trial to trial, Japanese listeners' performance did not decline compared to a condition in which the speaking rate was fixed. Theoretical and practical implications of these findings will be discussed. [Work supported by JSPS and NICT.]
Mackersie, C L; Boothroyd, A; Minniear, D
Interlist equivalency and short-term practice effects were evaluated for the recorded stimuli of the Computer-Assisted Speech Perception Assessment (CASPA) Test. Twenty lists, each consisting of 10 consonant-vowel-consonant words, were administered to 20 adults with normal hearing. The lists were presented at 50 dB SPL (Leq) in the presence of spectrally matched steady-state noise (55 dB SPL Leq). Phoneme recognition scores for the first list presented were significantly lower than for the second through the twentieth list presented, indicating a small practice effect. Phoneme scores for 4 of the lists (3, 6, 7, and 16) were significantly higher than scores for the remaining 16 lists by approximately 10 percentage points. Eliminating the effects of interlist differences reduced the 95 percent confidence interval of a test score based on a single list from 18.4 to 16.1 percentage points. Although interlist differences have only a small effect on confidence limits, some clinicians may wish to eliminate them by excluding lists 3, 6, 7, and 16 from the test. The practice effect observed here can be eliminated by administering one 10-word practice list before beginning the test.
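A list score on a test like this is the percentage of phonemes repeated correctly, and the confidence-interval figures reflect how few scoring units a single 10-word list contains. The sketch below assumes each CVC response is transcribed as a hypothetical (consonant, vowel, consonant) triple scored position by position, and adds an idealized binomial half-width for comparison. Both are assumptions: the binomial model treats phonemes as independent, which is unlikely to hold exactly for phonemes within the same word, and the study's own intervals were derived from its data, not from this formula.

```python
import math


def phoneme_score(targets, responses):
    """Percent phonemes correct; each word is a (C, V, C) tuple of phoneme
    symbols (hypothetical transcription format), scored position by position."""
    correct = total = 0
    for target, response in zip(targets, responses):
        for t, r in zip(target, response):
            total += 1
            correct += int(t == r)
    return 100.0 * correct / total


def binomial_halfwidth(score_percent, n_units, z=1.96):
    """Idealized normal-approximation half-width, in percentage points, for a
    score based on n_units independent scoring units (an assumption)."""
    p = score_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_units)


# One 10-word CVC list yields 10 * 3 = 30 phoneme scoring units.
print(round(binomial_halfwidth(50.0, 30), 1))
```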
Murakami, Takenobu; Restle, Julia; Ziemann, Ulf
A left-hemispheric cortico-cortical network involving areas of the temporoparietal junction (Tpj) and the posterior inferior frontal gyrus (pIFG) is thought to support sensorimotor integration of speech perception into articulatory motor activation, but how this network links with the lip area of the primary motor cortex (M1) during speech perception is unclear. Using paired-coil focal transcranial magnetic stimulation (TMS) in healthy subjects, we demonstrate that Tpj→M1 and pIFG→M1 effective connectivity increased when listening to speech compared to white noise. A virtual lesion induced by continuous theta-burst TMS (cTBS) of the pIFG abolished the task-dependent increase in pIFG→M1 but not Tpj→M1 effective connectivity during speech perception, whereas cTBS of Tpj abolished the task-dependent increase of both effective connectivities. We conclude that speech perception enhances effective connectivity between areas of the auditory dorsal stream and M1. Tpj is situated at a hierarchically high level, integrating speech perception into motor activation through the pIFG.
Purdy, Suzanne C; Kelly, Andrea S
Speech perception varies widely across cochlear implant (CI) users and typically improves over time after implantation. There is also some evidence for improved auditory evoked potentials (shorter latencies, larger amplitudes) after implantation, but few longitudinal studies have examined the relationship between behavioral and evoked potential measures after implantation in postlingually deaf adults. The relationship between speech perception and auditory evoked potentials was investigated in newly implanted cochlear implant users from the day of implant activation to 9 months postimplantation, on five occasions, in 10 adults aged 27 to 57 years who had been bilaterally profoundly deaf for 1 to 30 years prior to receiving a unilateral CI24 cochlear implant. Changes over time in middle latency response (MLR), mismatch negativity, and obligatory cortical auditory evoked potentials and word and sentence speech perception scores were examined. Speech perception improved significantly over the 9-month period. MLRs varied and showed no consistent change over time. Three participants in their 50s had absent MLRs. The pattern of change in N1 amplitudes over the five visits varied across participants. P2 area increased significantly for 1,000- and 4,000-Hz tones but not for 250 Hz. The greatest change in P2 area occurred after 6 months of implant experience. Although there was a trend for mismatch negativity peak latency to reduce and width to increase after 3 months of implant experience, there was considerable variability and these changes were not significant. Only 60% of participants had a detectable mismatch initially; this increased to 100% at 9 months. The continued change in P2 area over the period evaluated, with a trend for greater change for right hemisphere recordings, is consistent with the pattern of incremental change in speech perception scores over time. MLR, N1, and mismatch negativity changes were inconsistent and hence P2 may be a more robust measure
Mayer, Jennifer L; Hannent, Ian; Heaton, Pamela F
Whilst enhanced perception has been widely reported in individuals with Autism Spectrum Disorders (ASDs), relatively little is known about the developmental trajectory and impact of atypical auditory processing on speech perception in intellectually high-functioning adults with ASD. This paper presents data on perception of complex tones and speech pitch in adult participants with high-functioning ASD and typical development, and compares these with pre-existing data using the same paradigm with groups of children and adolescents with and without ASD. As perceptual processing abnormalities are likely to influence behavioural performance, regression analyses were carried out on the adult data set. The findings revealed markedly different pitch discrimination trajectories and language correlates across diagnostic groups. While pitch discrimination increased with age and correlated with receptive vocabulary in groups without ASD, it was enhanced in childhood and stable across development in ASD. Pitch discrimination scores did not correlate with receptive vocabulary scores in the ASD group, and for adults with ASD superior pitch perception was associated with sensory atypicalities and diagnostic measures of symptom severity. We conclude that the development of pitch discrimination and its associated mechanisms markedly distinguish those with and without ASD.
Wang, Juan; Gao, Danqi; Li, Duan; Desroches, Amy S; Liu, Li; Li, Xiaoli
This study investigates how the interaction of different brain oscillations (particularly theta-gamma coupling) modulates bottom-up and top-down processes during speech perception. We employed a speech perception paradigm that manipulated the congruency between a visually presented picture and an auditory stimulus and asked participants to judge whether they matched or mismatched. A group of children (mean age 10 years, 5 months) participated in this study, and their electroencephalographic (EEG) data were recorded while they performed the experimental task. Compared with the mismatch condition, the match condition facilitated speech perception by eliciting greater theta-gamma coupling in the frontal area and smaller theta-gamma coupling in the left temporal area. These findings suggest that a top-down facilitation effect from congruent visual pictures engaged different mechanisms in low-level sensory (temporal) regions and high-level linguistic and decision (frontal) regions. Interestingly, there was a hemispheric asymmetry, with higher theta-gamma coupling in the match condition in the right hemisphere and higher theta-gamma coupling in the mismatch condition in the left hemisphere. This indicates that a fast global processing strategy and a slow detailed processing strategy were differentially adopted in the match and mismatch conditions. This study provides new insight into the mechanisms of speech perception from the interaction of different oscillatory activities and provides neural evidence for theories of speech perception that allow for top-down feedback connections. Furthermore, it sheds light on children's speech perception development by showing a pattern of integration of bottom-up and top-down information during speech perception similar to that previously reported in adults.
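The abstract does not say which coupling index was used. One widely used measure of theta-gamma phase-amplitude coupling is the mean vector length, which asks whether gamma amplitude is systematically larger at particular theta phases. The sketch below assumes the theta phase and gamma amplitude envelope have already been extracted (for example by band-pass filtering and a Hilbert transform, not shown) and uses synthetic signals with assumed rates rather than EEG data.

```python
import cmath
import math


def mean_vector_length(theta_phase, gamma_amplitude):
    """Mean-vector-length phase-amplitude coupling:
    |mean(A_gamma(t) * exp(i * phi_theta(t)))|.
    Inputs: theta phase in radians and a non-negative gamma amplitude envelope."""
    if not theta_phase or len(theta_phase) != len(gamma_amplitude):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(a * cmath.exp(1j * ph) for ph, a in zip(theta_phase, gamma_amplitude))
    return abs(total) / len(theta_phase)


# Synthetic example: 1 s at 1 kHz with a 6-Hz theta rhythm (assumed values).
fs, theta_hz = 1000, 6.0
phase = [2 * math.pi * theta_hz * n / fs for n in range(fs)]
coupled = [1.0 + math.cos(ph) for ph in phase]   # gamma amplitude peaks at theta phase 0
uncoupled = [1.0 for _ in phase]                 # amplitude independent of theta phase
print(mean_vector_length(phase, coupled), mean_vector_length(phase, uncoupled))
```

The raw mean vector length scales with amplitude, so it is often normalized against surrogate data before comparing conditions; that step is omitted here.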
Francis, Alexander L.; MacPherson, Megan K.; Chandrasekaran, Bharath; Alvar, Ann M.
Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in terms of the source of the difficulty (distortion, energetic masking, and informational masking) and which were therefore expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct), and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Both masked conditions were presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured in terms of the proportion of keywords identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude, and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions even when behavioral measures of listening performance and listeners’ subjective perception of task demand were comparable across the
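Presenting a masker at a fixed SNR such as -8 dB amounts to scaling the masker relative to the speech before mixing. The sketch below assumes SNR is defined from root-mean-square (RMS) levels over the whole stimulus; the abstract does not say how levels were computed or calibrated, and the signals here are toy placeholders rather than sentences or babble.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, masker, snr_db):
    """Scale the masker so that 20*log10(rms(speech) / rms(scaled masker)) == snr_db,
    then add it to the speech sample by sample."""
    if len(speech) != len(masker):
        raise ValueError("speech and masker must have the same length")
    gain = rms(speech) / (rms(masker) * 10 ** (snr_db / 20.0))
    scaled = [gain * m for m in masker]
    return [s + m for s, m in zip(speech, scaled)], scaled


# Toy signals standing in for a sentence and a noise or babble masker (placeholders).
speech = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
masker = [((n * 7919) % 101) / 50.0 - 1.0 for n in range(1000)]
mixture, scaled = mix_at_snr(speech, masker, snr_db=-8.0)
print(round(20 * math.log10(rms(speech) / rms(scaled)), 6))
```

At -8 dB, the masker's RMS level is about 2.5 times that of the speech (10^(8/20) ≈ 2.51).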
Getzmann, Stephan; Wascher, Edmund
Speech understanding in the presence of concurrent sound is a major challenge, especially for older persons. In particular, conversational turn-taking usually results in switch costs, as indicated by decreased speech perception after changes in the relevant target talker. Here, we investigated whether visual cues indicating the future position of a target talker may reduce the costs of switching in younger and older adults. We employed a speech perception task in which sequences of short words were simultaneously presented by three talkers, and analysed behavioural measures and event-related potentials (ERPs). Informative cues resulted in increased performance after a spatial change in target talker compared to uninformative cues, which did not indicate the future target position. The older participants especially benefited from knowing the future target position in advance, as indicated by reduced response times after informative cues. The ERP analysis revealed an overall reduced N2, and a reduced P3b to changes in the target talker location in older participants, suggesting reduced inhibitory control and context updating. On the other hand, a pronounced frontal late positive complex (f-LPC) to the informative cues indicated increased allocation of attentional resources to changes in target talker in the older group, in line with the decline-compensation hypothesis. Thus, knowing where to listen has the potential to compensate for age-related decline in attentional switching in a highly variable cocktail-party environment.
Mitterer, Holger; Tuinman, Annelie
Casual speech processes, such as /t/-reduction, make word recognition harder. Word recognition is also harder in a second language (L2). Combining these challenges, we investigated whether L2 learners have recourse to knowledge from their native language (L1) when dealing with casual speech processes in their L2. In three experiments, production and perception of /t/-reduction were investigated. An initial production experiment showed that /t/-reduction occurred in both languages and patterned similarly in proper nouns but differed when /t/ was a verbal inflection. Two perception experiments compared the performance of German learners of Dutch with that of native speakers for nouns and verbs. Mirroring the production patterns, German learners’ performance strongly resembled that of native Dutch listeners when the reduced /t/ was part of a word stem, but deviated where /t/ was a verbal inflection. These results suggest that a casual speech process in a second language is problematic for learners when the process is not known from the learner’s native language, similar to what has been observed for phoneme contrasts. PMID:22811675
Zhang, Juan; McBride-Chang, Catherine
A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with concurrently collected data on 153 2nd- and 3rd-grade Hong Kong Chinese children. Nested model comparisons were conducted to test this model separately against alternatives in relation to both Chinese and English word reading using structural equation modeling. For Chinese word reading, the proposed 4-stage model was demonstrated to be the best model. Auditory sensitivity was associated with speech perception, which was related to Chinese word reading mainly through its relations to morphological awareness and rapid automatized naming. In contrast, for English word reading, the best model required an additional direct path from suprasegmental sensitivity (in Chinese) to English word reading. That is, in addition to phonological awareness, Chinese speech prosody was also directly associated with English word recognition.
Pasanisi, Enrico; Bacciu, Andrea; Vincenti, Vincenzo; Guida, Maurizio; Berghenti, Maria Teresa; Barbot, Anna; Panu, Francesco; Bacciu, Salvatore
Nine congenitally deaf children who received a Nucleus CI24M cochlear implant and who were fitted with the SPrint speech processor participated in this study. All subjects were initially programmed with the SPEAK coding strategy and then converted to the ACE strategy. Speech perception was evaluated before and after conversion to the new coding strategy using word and Common Phrase speech recognition tests in both the presence and absence of noise. In quiet conditions, the mean percent correct scores for words were 68.8% with SPEAK and 91% with ACE; for phrases the percentage was 66.6% with SPEAK and 85.5% with ACE. In the presence of noise (at +10 dB signal-to-noise ratio), the mean percent correct scores for words were 43.3% with SPEAK compared to 84.4% with ACE; for phrases the percentage was 41.1% with SPEAK and 82.2% with ACE. Statistical analysis revealed significant improvement in open-set speech recognition with ACE compared to SPEAK. Preliminary data suggest that converting children from SPEAK to the ACE strategy improves their performance. Subjects showed significant improvements for open-set word and sentence recognition in quiet as well as in noise when ACE was used in comparison with SPEAK. The greatest improvements were obtained when tests were presented in the presence of noise.
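The statement that the largest gains appeared in noise can be checked directly from the mean scores reported in this abstract. The snippet below only subtracts the reported group means; it is not the statistical analysis the authors performed.

```python
# Mean percent-correct scores from the abstract: (SPEAK, ACE).
scores = {
    ("words", "quiet"): (68.8, 91.0),
    ("phrases", "quiet"): (66.6, 85.5),
    ("words", "noise, +10 dB SNR"): (43.3, 84.4),
    ("phrases", "noise, +10 dB SNR"): (41.1, 82.2),
}

# ACE minus SPEAK, in percentage points, rounded to one decimal place.
gains = {key: round(ace - speak, 1) for key, (speak, ace) in scores.items()}

for (material, condition), gain in gains.items():
    print(f"{material} in {condition}: +{gain:.1f} points")
```

This gives gains of 22.2 (words) and 18.9 (phrases) points in quiet, versus 41.1 points for both materials in noise.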
Rüsseler, J; Gerth, I; Heldmann, M; Münte, T F
The present study used event-related brain potentials (ERPs) to investigate audiovisual integration processes in the perception of natural speech in a group of German adult developmental dyslexic readers. Twelve dyslexic and twelve non-dyslexic adults viewed short videos of a male German speaker. Disyllabic German nouns served as stimulus material. The auditory and the visual stimulus streams were segregated to create four conditions: in the congruent condition, the visual and the auditory word were identical. In the incongruent condition, the auditory and the visual word (i.e., the lip movements of the utterance) were different. Furthermore, on half of the trials, white noise (45 dB SPL) was superimposed on the auditory trace. Subjects had to say aloud the word they understood after they viewed the video. Behavioral data: Dyslexic readers committed more errors than normal readers in the noise conditions, and this effect was particularly present for congruent trials. ERPs showed a distinct N170 component at temporo-parietal electrodes that was smaller in amplitude for dyslexic readers. Both normal and dyslexic readers showed a clear effect of noise at centro-parietal electrodes between 300 and 600 ms. An analysis of error trials reflecting audiovisual integration (verbal responses in the incongruent noise condition that are a mix of the visual and the auditory word) revealed more positive ERPs for dyslexic readers at temporo-parietal electrodes 200-500 ms poststimulus. For normal readers, no such effect was present. These findings are discussed as reflecting increased effort in dyslexics under circumstances of distorted acoustic input. The superimposition of noise leads dyslexics to rely more on the integration of auditory and visual input (lip reading). Furthermore, the smaller N170 amplitudes indicate deficits in the processing of moving faces in dyslexic adults.
Santarelli, Rosamaria; del Castillo, Ignacio; Cama, Elona; Scimemi, Pietro; Starr, Arnold
Mutations in the OTOF gene encoding otoferlin result in a disrupted function of the ribbon synapses with impairment of the multivesicular glutamate release. Most affected subjects present with congenital hearing loss and abnormal auditory brainstem potentials associated with preserved cochlear hair cell activities (otoacoustic emissions, cochlear microphonics [CMs]). Transtympanic electrocochleography (ECochG) has recently been proposed for defining the details of potentials arising in both the cochlea and auditory nerve in this disorder, and with a view to shedding light on the pathophysiological mechanisms underlying auditory dysfunction. We review the audiological and electrophysiological findings in children with congenital profound deafness carrying two mutant alleles of the OTOF gene. We show that cochlear microphonic (CM) amplitude and summating potential (SP) amplitude and latency are normal, consistent with preserved outer and inner hair cell function. In the majority of OTOF children, the SP component is followed by a markedly prolonged low-amplitude negative potential replacing the compound action potential (CAP) recorded in normal-hearing children. This potential is identified at intensities as low as 90 dB below the behavioral threshold. In some ears, a synchronized CAP is superimposed on the prolonged responses at high intensity. Stimulation at high rates reduces the amplitude and duration of the prolonged potentials, consistent with their neural generation. In some children, however, the ECochG response only consists of the SP, with no prolonged potential. Cochlear implants restore hearing sensitivity, speech perception and neural CAP by electrically stimulating the auditory nerve fibers. These findings indicate that an impaired multivesicular glutamate release in OTOF-related disorders leads to abnormal auditory nerve fiber activation and a consequent impairment of spike generation. The magnitude of these effects seems to vary, ranging from
Sjerps, Matthias J; Reinisch, Eva
Listeners have to overcome variability of the speech signal that can arise, for example, because of differences in room acoustics, differences in speakers' vocal tract properties, or idiosyncrasies in pronunciation. Two mechanisms that are involved in resolving such variation are perceptually contrastive effects that arise from surrounding acoustic context and lexically guided perceptual learning. Although both processes have been studied in great detail, little attention has been paid to how they operate relative to each other in speech perception. The present study set out to address this issue. The carrier parts of exposure stimuli of a classical perceptual learning experiment were spectrally filtered such that the acoustically ambiguous final fricatives sounded relatively more like the lexically intended sound (Experiment 1) or the alternative (Experiment 2). Perceptual learning was found only in the latter case. The findings show that perceptual contrast effects precede lexically guided perceptual learning, at least in terms of temporal order, and potentially in terms of cognitive processing levels as well.
Baese-Berk, Melissa M.; Dilley, Laura C.; Schmidt, Stephanie; Morrill, Tuuli H.; Pitt, Mark A.
Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, both in terms of temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209
Students exhibiting speech deficits may lack the skills or support structures needed for adequate literacy development, although past research has produced mixed results, indicating that some students with speech impairments are able to acquire appropriate literacy skills. The purpose of the qualitative holistic…
The current dissertation investigated clear speech production of Korean stops to examine the proposal that the phonetic targets of phonological categories are more closely approximated in hyperarticulated speech. The investigation also considered a sound change currently underway in Korean stops: younger speakers of the Seoul dialect produce the…
Cornell, Sonia A.; Lahiri, Aditi; Eulitz, Carsten
The precise structure of speech sound representations is still a matter of debate. In the present neurobiological study, we compared predictions about differential sensitivity to speech contrasts between models that assume full specification of all phonological information in the mental lexicon with those assuming sparse representations (only…
Greenwood, Nan; Wright, Jannet A.; Bithell, Christine
Background: Communication disorders affect both sexes and people from all ethnic groups, but members of minority ethnic groups and males in the UK are underrepresented in the speech and language therapy profession. Research in the area of recruitment is limited, but a possible explanation is poor awareness and understanding of speech and language…
Casini, Laurence; Burle, Boris; Nguyen, Noel
Time is essential to speech. The duration of speech segments plays a critical role in the perceptual identification of these segments, and therefore in that of spoken words. Here, using a French word identification task, we show that vowels are perceived as shorter when attention is divided between two tasks, as compared to a single task control…
Gow, David W., Jr.; Segawa, Jennifer A.
The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causation analysis of high spatiotemporal resolution…
Schiavetti, Nicholas; Whitehead, Robert L.; Metz, Dale Evan
This article reviews experiments completed over the past decade at the National Technical Institute for the Deaf and the State University of New York at Geneseo concerning speech produced during simultaneous communication (SC) and synthesizes the empirical evidence concerning the acoustical and perceptual characteristics of speech in SC.…
Robertson, Susie; von Hapsburg, Deborah; Hay, Jessica S.
Purpose: Infant-directed speech (IDS) facilitates language learning in infants with normal hearing (NH), compared to adult-directed speech (ADS). It is well established that infants with NH prefer to listen to IDS over ADS. The purpose of this study was to determine whether infants with hearing impairment (HI), like their NH peers, show a…
Adams, Tuuli Morrill
Listeners segment words from the continuous speech stream in their native language by using rhythmic structure, phrasal structure, and phonotactics (e.g., Christophe et al., 2003; McQueen, 1998). One challenging aspect of second language acquisition is the extraction of words from fluent speech, possibly because learners apply a native language…
Kennedy, Sara; Blanchet, Josée
To be effective second or additional language (L2) listeners, learners should be aware of typical processes in connected L2 speech (e.g. linking). This longitudinal study explored how learners' developing ability to perceive connected L2 speech was related to the quality of their language awareness. Thirty-two learners of L2 French at a university…
A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…
Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.
"Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…
Vatakis, Argiro; Spence, Charles
This study investigated people's sensitivity to audiovisual asynchrony in briefly-presented speech and musical videos. A series of speech (letters and syllables) and guitar and piano music (single and double notes) video clips were presented randomly at a range of stimulus onset asynchronies (SOAs) using the method of constant stimuli. Participants made unspeeded temporal order judgments (TOJs) regarding which stream (auditory or visual) appeared to have been presented first. The accuracy of participants' TOJ performance (measured in terms of the just noticeable difference; JND) was significantly better for the speech than for either the guitar or piano music video clips, suggesting that people are more sensitive to asynchrony for speech than for music stimuli. The visual stream had to lead the auditory stream for the point of subjective simultaneity (PSS) to be achieved in the piano music clips while auditory leads were typically required for the guitar music clips. The PSS values obtained for the speech stimuli varied substantially as a function of the particular speech sound presented. These results provide the first empirical evidence regarding people's sensitivity to audiovisual asynchrony for musical stimuli. Our results also demonstrate that people's sensitivity to asynchrony in speech stimuli is better than has been suggested on the basis of previous research using continuous speech streams as stimuli.
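As a rough illustration of how a PSS and a JND are typically derived from temporal-order judgments collected with the method of constant stimuli, the sketch below fits a cumulative Gaussian to the proportion of "visual first" responses. It assumes that positive SOAs mean the visual stream led, takes the JND as half the distance between the 25% and 75% points (one common definition), and fits by least squares on probit-transformed proportions rather than by maximum likelihood. The function name `pss_jnd` and the clipping constant are illustrative, not taken from the study.

```python
from statistics import NormalDist


def pss_jnd(soas_ms, p_visual_first, eps=0.01):
    """Estimate PSS and JND (both in ms) from temporal-order judgments.

    soas_ms: stimulus onset asynchronies; positive = visual stream led (assumed convention)
    p_visual_first: proportion of "visual first" responses at each SOA
    Fits a cumulative Gaussian by least squares on probit (z) scores,
    a quick approximation to a maximum-likelihood psychometric fit.
    """
    std = NormalDist()
    # Clip proportions so that inv_cdf stays finite at 0 and 1.
    zs = [std.inv_cdf(min(max(p, eps), 1 - eps)) for p in p_visual_first]
    xs = list(soas_ms)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    # Linear regression z = (x - mu) / sigma, i.e. slope = 1/sigma, intercept = -mu/sigma.
    slope = (sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_z - slope * mean_x
    pss = -intercept / slope          # SOA at which p = 0.5
    sigma = 1.0 / slope               # SD of the fitted cumulative Gaussian
    jnd = sigma * std.inv_cdf(0.75)   # half the distance between the 25% and 75% points
    return pss, jnd
```

With this sign convention, a positive PSS means the visual stream had to lead for the two streams to be judged simultaneous, as reported above for the piano clips.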
Eskridge, Elizabeth N.; Galvin, John J., III; Aronoff, Justin M.; Li, Tianhao; Fu, Qian-Jie
Purpose: The goal of this study was to investigate how the spectral and temporal properties of background music may interfere with speech understanding in cochlear implant (CI) and normal-hearing (NH) listeners. Method: Speech-recognition thresholds (SRTs) were adaptively measured in 11 CI and 9 NH subjects. CI subjects were tested while using their…
Du, Yi; Buchsbaum, Bradley R.; Grady, Cheryl L.; Alain, Claude
Understanding speech in noisy environments is challenging, especially for seniors. Although evidence suggests that older adults increasingly recruit prefrontal cortices to offset reduced peripheral and central auditory processing, the brain mechanisms underlying such compensation remain elusive. Here we show that relative to young adults, older adults show higher activation of frontal speech motor areas as measured by functional MRI during a syllable identification task at varying signal-to-noise ratios. This increased activity correlates with improved speech discrimination performance in older adults. Multivoxel pattern classification reveals that despite an overall phoneme dedifferentiation, older adults show greater specificity of phoneme representations in frontal articulatory regions than auditory regions. Moreover, older adults with stronger frontal activity have higher phoneme specificity in frontal and auditory regions. Thus, preserved phoneme specificity and upregulation of activity in speech motor regions provide a means of compensation in older adults for decoding impoverished speech representations in adverse listening conditions. PMID:27483187
Wang, Xiaoyue; Wang, Suiping; Fan, Yuebo; Huang, Dan; Zhang, Yang
Recent studies reveal that tonal language speakers with autism have enhanced neural sensitivity to pitch changes in nonspeech stimuli but not to lexical tone contrasts in their native language. The present ERP study investigated whether the distinct pitch processing pattern for speech and nonspeech stimuli in autism was due to a speech-specific deficit in categorical perception of lexical tones. A passive oddball paradigm was adopted to examine two groups (16 in the autism group and 15 in the control group) of Chinese children’s Mismatch Responses (MMRs) to equivalent pitch deviations representing within-category and between-category differences in speech and nonspeech contexts. To further examine group-level differences in the MMRs reflecting categorical perception (or its absence) of speech and nonspeech stimuli, neural oscillatory activity at the single-trial level was quantified with the inter-trial phase coherence (ITPC) measure for the theta and beta frequency bands. The MMR and ITPC data from the children with autism showed evidence for lack of categorical perception in the lexical tone condition. In view of the important role of lexical tones in acquiring a tonal language, the results point to the necessity of early intervention for individuals with autism who show such a speech-specific categorical perception deficit. PMID:28225070
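The ITPC measure mentioned above is commonly computed as the length of the mean unit phase vector across trials: 1 when every trial has the same phase, near 0 when phases are scattered. A minimal sketch, assuming the per-trial phases at one time-frequency point have already been extracted (for example with a wavelet or Hilbert transform of theta- or beta-band EEG); the function name is hypothetical.

```python
import cmath
import math


def itpc(phases):
    """Inter-trial phase coherence at one time-frequency point.

    phases: instantaneous phase (radians) of each trial at that point.
    Returns |mean(exp(i * phase))|, a value between 0 and 1.
    """
    if not phases:
        raise ValueError("need at least one trial")
    # Average the unit phase vectors across trials, then take the vector length.
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)
```

Applying the function separately to every time point and frequency yields the usual ITPC time-frequency map.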
Kislova, O O; Rusalova, M N
The article reviews the general concepts and approaches in research on the recognition of emotions in speech: psychological concepts, principles and methods of study, and physiological data from studies in animals and humans. The concepts of emotional intelligence (the ability to understand and recognize the emotions of other people and to understand and regulate one's own emotions) and emotional hearing (the ability to recognize emotions in speech) are discussed, and a general review of the paradigms is presented. Research on the brain mechanisms underlying the differentiation of emotions in speech is based on the study of local injuries and dysfunctions, along with studies of healthy subjects.
Kim, Jongin; Lee, Suh-Kyung; Lee, Boreom
Objective. The objective of this study is to find components that might be related to phoneme representation in the brain and to discriminate EEG responses for each speech sound on a single-trial basis. Approach. We used multivariate empirical mode decomposition (MEMD) and common spatial pattern for feature extraction. We chose three vowel stimuli, /a/, /i/ and /u/, based on previous findings that the brain can detect changes in the second formant frequency (F2) of vowels. EEG activity was recorded from seven native Korean speakers at Gwangju Institute of Science and Technology. We applied MEMD over EEG channels to extract speech-related brain signal sources, and looked for the intrinsic mode functions that were dominant in the alpha band. After the MEMD procedure, we applied the common spatial pattern algorithm to enhance classification performance, and used linear discriminant analysis (LDA) as a classifier. Main results. The brain responses to the three vowels could be classified as one of the learned phonemes on a single-trial basis with our approach. Significance. The results of our study show that brain responses to vowels can be classified for single trials using MEMD and LDA. This approach may become a useful tool for brain-computer interfaces and could also be used for discriminating the neural correlates of categorical speech perception.
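The common spatial pattern step can be sketched for two classes as below. This is a simplified illustration, not the authors' pipeline: it skips MEMD, assumes each trial is already an array of shape (channels, samples), handles only two classes (three vowels would need, for example, one-vs-rest filters), and leaves out the LDA classifier that would consume the log-variance features. Function names are hypothetical.

```python
import numpy as np


def _avg_cov(trials):
    """Average trace-normalized spatial covariance over trials (n_trials, n_channels, n_samples)."""
    covs = []
    for x in trials:
        c = x @ x.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b):
    """Two-class CSP via whitening of the composite covariance.

    Returns W (n_channels x n_channels). Row 0 maximizes class-A variance
    relative to class B; the last row does the opposite.
    """
    ca, cb = _avg_cov(trials_a), _avg_cov(trials_b)
    evals, evecs = np.linalg.eigh(ca + cb)
    whitening = np.diag(evals ** -0.5) @ evecs.T   # makes ca + cb the identity
    s_a = whitening @ ca @ whitening.T
    d, b = np.linalg.eigh(s_a)                     # eigenvalues in ascending order
    order = np.argsort(d)[::-1]                    # class-A-dominant components first
    return b[:, order].T @ whitening


def log_var_features(w, trials):
    """Normalized log-variance of each CSP-filtered signal, one row per trial."""
    feats = []
    for x in trials:
        var = np.var(w @ x, axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)
```

In practice only the first and last few rows of W are usually kept, since they carry the most discriminative variance.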
Conboy, Barbara T; Kuhl, Patricia K
Language experience 'narrows' speech perception by the end of infants' first year, reducing discrimination of non-native phoneme contrasts while improving native-contrast discrimination. Previous research showed that declines in non-native discrimination were reversed by second-language experience provided at 9-10 months, but it is not known whether second-language experience affects first-language speech sound processing. Using event-related potentials (ERPs), we examined learning-related changes in brain activity to Spanish and English phoneme contrasts in monolingual English-learning infants pre- and post-exposure to Spanish from 9.5-10.5 months of age. Infants showed a significant discriminatory ERP response to the Spanish contrast at 11 months (post-exposure), but not at 9 months (pre-exposure). The English contrast elicited an earlier discriminatory response at 11 months than at 9 months, suggesting improvement in native-language processing. The results show that infants rapidly encode new phonetic information, and that improvement in native speech processing can occur during second-language learning in infancy.
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
Although there is a broad consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme. PMID:25781470
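The core idea behind a classification image can be shown with the classic reverse-correlation estimator: average the noise fields on trials that received one response and subtract the average on trials that received the other. This is a simplified stand-in, not the published Auditory Classification Image procedure, which is more elaborate; the array shapes and the function name are assumptions.

```python
import numpy as np


def classification_image(noise, responses):
    """Reverse-correlation estimate of a classification image.

    noise: array (n_trials, n_freq, n_time), the noise spectrogram added on each trial
    responses: array (n_trials,), 1 for a "da" response and 0 for "ga"
    Returns mean noise on "da" trials minus mean noise on "ga" trials.
    """
    noise = np.asarray(noise, dtype=float)
    responses = np.asarray(responses).astype(bool)
    if responses.all() or not responses.any():
        raise ValueError("need trials from both response categories")
    return noise[responses].mean(axis=0) - noise[~responses].mean(axis=0)
```

Strongly positive or negative regions of the resulting image mark time-frequency bins whose noise energy pushed listeners toward one response, analogous to the formant-onset regions reported above.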
Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina
Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum; therefore, these findings provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills.
Reiss, Lina A.J.; Perreau, Ann E.; Turner, Christopher W.
Because some users of a Hybrid short-electrode cochlear implant (CI) lose their low-frequency residual hearing after receiving the CI, we tested whether increasing the CI speech processor frequency allocation range to include lower frequencies improves speech perception in these individuals. A secondary goal was to see if pitch perception changed after experience with the new CI frequency allocation. Three subjects who had lost all residual hearing in the implanted ear were recruited to use an experimental CI frequency allocation with a lower frequency cutoff than their current clinical frequency allocation. Speech and pitch perception results were collected at multiple time points throughout the study. In general, subjects showed little or no improvement for speech recognition with the experimental allocation when the CI was worn with a hearing aid in the contralateral ear. However, all three subjects showed changes in pitch perception that followed the changes in frequency allocations over time, consistent with previous studies showing that pitch perception changes upon provision of a CI. PMID:22907151
Kartushina, Natalia; Hervais-Adelman, Alexis; Frauenfelder, Ulrich Hans; Golestani, Narly
Second-language learners often experience major difficulties in producing non-native speech sounds. This paper introduces a training method that uses a real-time analysis of the acoustic properties of vowels produced by non-native speakers to provide them with immediate, trial-by-trial visual feedback about their articulation alongside that of the same vowels produced by native speakers. The Mahalanobis acoustic distance between non-native productions and target native acoustic spaces was used to assess L2 production accuracy. The experiment shows that 1 h of training per vowel improves the production of four non-native Danish vowels: the learners' productions were closer to the corresponding Danish target vowels after training. The production performance of a control group remained unchanged. Comparisons of pre- and post-training vowel discrimination performance in the experimental group showed improvements in perception. Correlational analyses of training-related changes in production and perception revealed no relationship. These results suggest, first, that this training method is effective in improving non-native vowel production. Second, training purely on production improves perception. Finally, it appears that improvements in production and perception do not systematically progress at equal rates within individuals.
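The Mahalanobis distance used above to score L2 productions scales each acoustic dimension by the native speakers' variability and covariance, so a token that is far from the native mean along a direction in which native tokens vary little counts as more deviant. A minimal sketch, assuming each token is described by a small vector of acoustic measures such as F1 and F2 (the exact features used in the study are not stated here) and that there are more native tokens than features, so the covariance matrix is invertible; the function name is hypothetical.

```python
import numpy as np


def mahalanobis_to_category(token, native_tokens):
    """Distance of one learner production from a native vowel category.

    token: acoustic measures of the learner's vowel, e.g. [F1, F2]
    native_tokens: array (n_tokens, n_features) of native productions of the target vowel
    """
    native = np.asarray(native_tokens, dtype=float)
    mean = native.mean(axis=0)
    cov = np.cov(native, rowvar=False)          # feature-by-feature covariance
    diff = np.asarray(token, dtype=float) - mean
    # sqrt(diff^T * cov^-1 * diff), solving rather than explicitly inverting cov
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

A production sitting exactly at the native mean scores 0, and training that moves productions toward the native category should lower the score.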
Audiovisual perception of conflicting stimuli displays a high degree of intersubject variability, generally larger than that of purely auditory or visual data. However, it is not clear whether this actually reflects differences in integration per se or just the consequence of slight differences in unisensory perception. It is argued that the debate has been blurred by methodological problems in the analysis of experimental data, particularly when using the fuzzy-logical model of perception (FLMP) [Massaro, D. W. (1987). Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry (Lawrence Erlbaum Associates, London)], which has been shown to overfit McGurk data [Schwartz, J. L. (2006). J. Acoust. Soc. Am. 120, 1795-1798]. A large corpus of McGurk data is reanalyzed, using a methodology based on (1) comparison of FLMP and a variant with subject-dependent weights of the auditory and visual inputs in the fusion process, weighted FLMP (WFLMP); (2) use of a Bayesian model selection criterion instead of a root mean square error fit in model assessment; and (3) systematic exploration of the number of useful parameters in the models to compare, attempting to discard poorly explanatory parameters. It is shown that WFLMP performs significantly better than FLMP, suggesting that audiovisual fusion is indeed subject-dependent, some subjects being more "auditory," and others more "visual." Intersubject variability has important consequences for theoretical understanding of the fusion process and for the re-education of hearing-impaired people.
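For reference, the two-alternative FLMP combines auditory and visual support multiplicatively and then normalizes, and competing models can be compared with a penalized likelihood criterion. The sketch below uses BIC, a common large-sample approximation to Bayesian model selection; the criterion used in the study may differ, and the weighted WFLMP variant is not shown because its exact parameterization is not given in the abstract. Function names are illustrative.

```python
import math


def flmp(a, v):
    """Two-alternative FLMP prediction.

    a, v: auditory and visual support (fuzzy truth values in (0, 1)) for one category.
    Returns the predicted probability of choosing that category.
    """
    num = a * v
    return num / (num + (1 - a) * (1 - v))


def bic(predicted, counts, totals, n_params):
    """BIC for binomial response data; lower is better.

    predicted: model probability per condition
    counts: observed choices of the category per condition
    totals: trials per condition (their sum is used as the sample size here)
    The binomial coefficients are omitted because they are identical across models.
    """
    log_lik = 0.0
    for p, k, n in zip(predicted, counts, totals):
        p = min(max(p, 1e-12), 1 - 1e-12)   # guard against log(0)
        log_lik += k * math.log(p) + (n - k) * math.log(1 - p)
    return -2 * log_lik + n_params * math.log(sum(totals))
```

Because each extra parameter adds log(N) to the score, a richer model such as a weighted variant is preferred only if it improves the likelihood by more than that penalty, which is the kind of guard against overfitting the abstract argues for.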
Prodi, Nicola; Visentin, Chiara; Feletti, Alice
It is well documented that the interference of noise in the classroom puts younger pupils at a disadvantage for speech perception tasks. Nevertheless, the dependence of this phenomenon on the type of noise, and the way it is realized for each class by a specific combination of intelligibility and effort, have not been fully investigated. Following on a previous laboratory study on "listening efficiency," which stems from a combination of accuracy and latency measures, this work tackles the problems above to better understand the basic mechanisms governing the speech perception performance of pupils in noisy classrooms. Listening tests were conducted in real classrooms with a substantial number of students, and tests in quiet were also developed. The statistical analysis is based on stochastic ordering and is able to clarify the behavior of the classes and the different impacts of noises on performance. It is found that the joint babble and activity noise has the worst effect on performance, whereas tapping and external traffic noises are less disruptive.
Svirsky, Mario A; Teoh, Su-Wooi; Neuburger, Heidi
Like any other surgery requiring anesthesia, cochlear implantation in the first few years of life carries potential risks, which makes it important to assess the potential benefits. This study introduces a new method to assess the effect of age at implantation on cochlear implant outcomes: developmental trajectory analysis (DTA). DTA compares curves representing change in an outcome measure over time (i.e. developmental trajectories) for two groups of children that differ along a potentially important independent variable (e.g. age at intervention). This method was used to compare language development and speech perception outcomes in children who received cochlear implants in the second, third or fourth year of life. Within this range of age at implantation, it was found that implantation before the age of 2 resulted in speech perception and language advantages that were significant both from a statistical and a practical point of view. Additionally, the present results are consistent with the existence of a 'sensitive period' for language development, that is, a gradual decline in language acquisition skills as a function of age.
Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D
The aim of the present study was to characterize the effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. Second-language (L2) learners of American Sign Language (ASL) performed this task in the fMRI scanner. Results indicated that L2 ASL learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results also indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals.
Mitterer, Holger; McQueen, James M.
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken. PMID:19918371
Berding, Georg; Wilke, Florian; Rode, Thilo; Haense, Cathleen; Joseph, Gert; Meyer, Geerd J; Mamach, Martin; Lenarz, Minoo; Geworski, Lilli; Bengel, Frank M; Lenarz, Thomas; Lim, Hubert H
Considerable progress has been made in the treatment of hearing loss with auditory implants. However, there are still many implanted patients who experience hearing deficiencies, such as limited speech understanding or vanishing perception with continuous stimulation (i.e., abnormal loudness adaptation). The present study aims to identify specific patterns of cerebral cortex activity involved with such deficiencies. We performed O-15-water positron emission tomography (PET) in patients implanted with electrodes within the cochlea, brainstem, or midbrain to investigate the pattern of cortical activation in response to speech or continuous multi-tone stimuli fed directly into the implant processor, which then delivered electrical patterns through those electrodes. Statistical parametric mapping was performed on a single-subject basis. Better speech understanding was correlated with a larger extent of bilateral auditory cortex activation. In contrast to speech, the continuous multi-tone stimulus elicited mainly unilateral auditory cortical activity in which greater loudness adaptation corresponded to weaker activation and even deactivation. Interestingly, greater loudness adaptation was correlated with stronger activity within the ventral prefrontal cortex, which could be up-regulated to suppress the irrelevant or aberrant signals into the auditory cortex. The ability to detect these specific cortical patterns and differences across patients and stimuli demonstrates the potential for using PET to diagnose auditory function or dysfunction in implant patients, which in turn could guide the development of appropriate stimulation strategies for improving hearing rehabilitation. Beyond hearing restoration, our study also reveals a potential role of the frontal cortex in suppressing irrelevant or aberrant activity within the auditory cortex, and thus may be relevant for understanding and treating tinnitus.
Zhang, Ting; Spahr, Anthony J.; Dorman, Michael F.
Objectives: Our aim was to assess, for patients with a cochlear implant in one ear and low-frequency acoustic hearing in the contralateral ear, whether reducing the overlap between the frequencies conveyed in the acoustic signal and those analyzed by the cochlear implant speech processor would improve speech recognition. Design: The recognition of monosyllabic words in quiet and sentences in noise was evaluated in three listening configurations: electric stimulation alone, acoustic stimulation alone, and combined electric and acoustic stimulation. The acoustic stimuli were either unfiltered or low-pass (LP) filtered at 250 Hz, 500 Hz, or 750 Hz. The electric stimuli were either unfiltered or high-pass (HP) filtered at 250 Hz, 500 Hz, or 750 Hz. In the combined condition, the unfiltered acoustic signal was paired with the unfiltered electric signal, and each LP acoustic signal (250, 500, or 750 Hz) was paired with the HP electric signal of the same cutoff frequency. Results: For both acoustic and electric signals, performance increased as bandwidth increased. The highest level of performance in the combined condition was observed in the unfiltered acoustic plus unfiltered electric condition. Conclusions: Reducing the overlap in frequency representation between acoustic and electric stimulation does not increase speech understanding scores for patients who have residual hearing in the ear contralateral to the implant. We find that acoustic information below 250 Hz significantly improves performance for patients who combine electric and acoustic stimulation and accounts for the majority of the speech-perception benefit when acoustic stimulation is combined with electric stimulation. PMID:19915474
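The paired conditions above combine a low-pass acoustic signal with a high-pass electric signal at the same cutoff. As an illustrative sketch only, assuming gentle first-order filters (the abstract does not specify the filter design, so this is not the study's processing), one way to produce such a complementary pair is shown below; the function name `complementary_split` is hypothetical.

```python
import math


def complementary_split(x, fc, fs):
    """Split x into low- and high-frequency parts at crossover fc (Hz).

    Uses a first-order (one-pole) low-pass; the high-pass is the residual
    x - low, so the two parts always sum back to the input.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    low, high = [], []
    y = 0.0
    for sample in x:
        y += alpha * (sample - y)   # one-pole low-pass update
        low.append(y)
        high.append(sample - y)     # complementary high-pass residual
    return low, high


if __name__ == "__main__":
    fs = 16000
    # Placeholder input: a 100 Hz tone plus a 2 kHz tone, 0.1 s long (not speech).
    x = [math.sin(2 * math.pi * 100 * n / fs) + math.sin(2 * math.pi * 2000 * n / fs)
         for n in range(fs // 10)]
    for fc in (250, 500, 750):
        low, high = complementary_split(x, fc, fs)
        # `low` stands in for the acoustic ear, `high` for the implant input.
        print(fc, round(sum(v * v for v in low), 1), round(sum(v * v for v in high), 1))
```

With first-order slopes (6 dB per octave) the two bands overlap considerably around the crossover, so genuinely reducing the overlap, as the study set out to test, would call for much steeper filters.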
Pedersen, Julie H.; Laugesen, Søren; Santurette, Sébastien; Dau, Torsten; MacDonald, Ewen N.
This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear gains to the stimuli, which were presented over headphones. The target and masker streams were lateralized to the same or to opposite sides of the head by introducing 0.7-ms interaural time differences between the ears. TFS coding was assessed by measuring frequency discrimination thresholds and interaural phase difference thresholds at 250 Hz. NH listeners had clearly better SRTs than the HI listeners. However, when maskers were spatially separated from the target, the amount of SRT benefit due to binaural unmasking differed only slightly between the groups. Neither the frequency discrimination threshold nor the interaural phase difference threshold tasks showed a correlation with the SRTs or with the amount of masking release due to binaural unmasking, respectively. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech understanding in spatially complex environments, these limitations were unrelated to TFS coding abilities and were only weakly associated with a reduction in binaural-unmasking benefit for spatially separated competing sources. PMID:27601071
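To make the lateralization manipulation concrete: with headphone presentation, delaying one ear's copy of a signal by a fixed interaural time difference shifts the perceived source toward the leading ear. The sketch below assumes integer-sample delays (0.7 ms is about 31 samples at 44.1 kHz), whereas a real stimulus pipeline might use fractional-delay filtering; the function name `lateralize` and the placeholder sample values are hypothetical and not taken from the study.

```python
def lateralize(mono, fs, itd_s=0.0007, lead="left"):
    """Return (left, right) channels with an interaural time difference.

    The leading ear gets the signal immediately; the lagging ear gets it
    delayed by round(itd_s * fs) whole samples. Both channels are padded
    to the same length.
    """
    delay = round(itd_s * fs)               # integer-sample approximation of the ITD
    leading = list(mono) + [0.0] * delay    # undelayed channel, zero-padded at the end
    lagging = [0.0] * delay + list(mono)    # delayed channel, zero-padded at the start
    if lead == "left":
        return leading, lagging
    if lead == "right":
        return lagging, leading
    raise ValueError("lead must be 'left' or 'right'")


if __name__ == "__main__":
    fs = 44100
    target = [1.0, 0.5, 0.25]   # placeholder samples, not real speech
    masker = [0.2, -0.2, 0.1]
    # Opposite-side condition: target leads in the left ear, masker in the right.
    t_left, t_right = lateralize(target, fs, lead="left")
    m_left, m_right = lateralize(masker, fs, lead="right")
    left = [a + b for a, b in zip(t_left, m_left)]
    right = [a + b for a, b in zip(t_right, m_right)]
    print("delay in samples:", round(0.0007 * fs), "channel length:", len(left))
```

For the same-side condition, both streams would simply be lateralized with the same `lead` value before mixing.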
Deschamps, Isabelle; Tremblay, Pascale
The processing of fluent speech involves complex computational steps that begin with the segmentation of the continuous flow of speech sounds into syllables and words. One question that naturally arises pertains to the type of syllabic information that speech processes act upon. Here, we used functional magnetic resonance imaging, combining whole-brain and exploratory anatomical region-of-interest (ROI) approaches, to profile regions that were sensitive to syllabic information during speech perception by parametrically manipulating syllabic complexity along two dimensions: (1) individual syllable complexity, and (2) sequence (supra-syllabic) complexity. We manipulated syllable complexity by taking the simplest syllable template, a consonant and vowel (CV), and inserting an additional consonant to create a complex onset (CCV). The supra-syllabic complexity was manipulated by creating sequences composed of the same syllable repeated six times (e.g., /pa-pa-pa-pa-pa-pa/) and sequences of three different syllables each repeated twice (e.g., /pa-ta-ka-pa-ta-ka/). This parametric design allowed us to identify brain regions sensitive to (1) syllabic complexity independent of supra-syllabic complexity, (2) supra-syllabic complexity independent of syllabic complexity, and (3) both syllabic and supra-syllabic complexity. High-resolution scans were acquired for 15 healthy adults. An exploratory anatomical ROI analysis of the supratemporal plane (STP) identified bilateral regions within the anterior two-thirds of the planum temporale, the primary auditory cortices, and the anterior two-thirds of the superior temporal gyrus that showed different patterns of sensitivity to syllabic and supra-syllabic information. These findings demonstrate that during passive listening to syllable sequences, sublexical information is processed automatically, and sensitivity to syllabic and supra-syllabic information is localized almost exclusively within the STP.
Barlow, Nathan; Purdy, Suzanne C.; Sharma, Mridula; Giles, Ellen; Narne, Vijay
This study investigated whether a short intensive psychophysical auditory training program is associated with speech perception benefits and changes in cortical auditory evoked potentials (CAEPs) in adult cochlear implant (CI) users. Ten adult implant recipients trained for approximately 7 hours on psychophysical tasks (Gap-in-Noise Detection, Frequency Discrimination, Spectral Rippled Noise [SRN], Iterated Rippled Noise, Temporal Modulation). Speech performance was assessed before and after training using Lexical Neighborhood Test (LNT) words in quiet and in eight-speaker babble. CAEPs evoked by a natural speech stimulus /baba/ with varying syllable stress were assessed pre- and post-training, in quiet and in noise. SRN psychophysical thresholds showed a significant improvement (78% on average) over the training period, but performance on other psychophysical tasks did not change. LNT scores in noise improved significantly post-training by 11% on average compared with three pretraining baseline measures. N1P2 amplitude changed post-training for /baba/ in quiet (p = 0.005, visit 3 pretraining versus visit 4 post-training). CAEP changes did not correlate with behavioral measures. CI recipients' clinical records indicated a plateau in speech perception performance prior to participation in the study. A short period of intensive psychophysical training produced small but significant gains in speech perception in noise and spectral discrimination ability. There remain questions about the most appropriate type of training and the duration or dosage of training that provides the most robust outcomes for adults with CIs. PMID:27587925
Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M; Barnes, Lisa; Fosker, Tim
Here we use two filtered speech tasks to investigate children's processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their amplitude modulations were either low-pass filtered (<4 Hz) or band-pass filtered (22-40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.
Mesgarani, Nima; Chang, Edward F
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
Schuerman, William L; Meyer, Antje; McQueen, James M
In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
The current study investigated the impact of recasts together with form-focused instruction (FFI) on the development of second language speech perception and production of English /ɹ/ by Japanese learners. Forty-five learners were randomly assigned to three groups (FFI recasts, FFI only, and Control) and exposed to four hours of communicatively…
Murakami, Takenobu; Restle, Julia; Ziemann, Ulf
A left-hemispheric cortico-cortical network involving areas of the temporoparietal junction (Tpj) and the posterior inferior frontal gyrus (pIFG) is thought to support sensorimotor integration of speech perception into articulatory motor activation, but how this network links with the lip area of the primary motor cortex (M1) during speech…
Loucas, Tom; Riches, Nick Greatorex; Charman, Tony; Pickles, Andrew; Simonoff, Emily; Chandler, Susie; Baird, Gillian
Background: The cognitive bases of language impairment in specific language impairment (SLI) and autism spectrum disorders (ASD) were investigated in a novel non-word comparison task which manipulated phonological short-term memory (PSTM) and speech perception, both implicated in poor non-word repetition. Aims: This study aimed to investigate the…
Boothroyd, Arthur; Eisenberg, Laurie S.; Martinez, Amy S.
Purpose: The goal was to assess the effects of maturation and phonological development on performance, by normally hearing children, on an imitative test of auditory capacity (On-Line Imitative Test of Speech-Pattern Contrast Perception [OlimSpac]; Boothroyd, Eisenberg, & Martinez, 2006; Eisenberg, Martinez, & Boothroyd, 2003, 2007). Method:…
Boets, Bart; Wouters, Jan; van Wieringen, Astrid; Ghesquiere, Pol
This study investigates whether the core bottleneck of literacy impairment should be situated at the phonological level or at a more basic sensory level, as postulated by supporters of the auditory temporal processing theory. Phonological ability, speech perception and low-level auditory processing were assessed in a group of 5-year-old pre-school…
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic fricatives.
Whitmal, Nathaniel A.; Poissant, Sarah F.
Two experiments examined the effects of source-to-listener distance (SLD) on sentence recognition in simulations of cochlear implant usage in noisy, reverberant rooms. Experiment 1 tested sentence recognition for three locations in the reverberant field of a small classroom (volume = 79.2 m³). Subjects listened to sentences mixed with speech-spectrum noise that were processed with simulated reverberation followed by either vocoding (6, 12, or 24 spectral channels) or no further processing. Results indicated that changes in SLD within a small room produced only minor changes in recognition performance, a finding likely related to the listener remaining in the reverberant field. Experiment 2 tested sentence recognition for a simulated six-channel implant in a larger classroom (volume = 175.9 m³) with varying levels of reverberation that could place the three listening locations in either the direct or reverberant field of the room. Results indicated that reducing SLD did improve performance, particularly when direct sound dominated the signal, but did not completely eliminate the effects of reverberation. Scores for both experiments were predicted accurately from speech transmission index values that modeled the effects of SLD, reverberation, and noise in terms of their effects on modulations of the speech envelope. Such models may prove to be a useful predictive tool for evaluating the quality of listening environments for cochlear implant users. PMID:19894835
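The speech transmission index (STI) predictions rest on how reverberation and noise reduce the modulation depth of the speech envelope. A minimal single-band sketch follows, assuming exponential reverberant decay, stationary noise, and the 14 modulation frequencies conventionally used from 0.63 to 12.5 Hz. It omits the octave-band weighting of the full standard (IEC 60268-16) and does not model source-to-listener distance, so it is not the model used in the study; the function names are hypothetical.

```python
import math

# One-third-octave modulation frequencies (Hz) conventionally used in STI.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_transfer(f_mod, rt60, snr_db):
    """Modulation reduction factor m(F) for exponential reverberation plus noise."""
    # Reverberation: m = [1 + (2*pi*F*T/13.8)^2]^(-1/2), T = reverberation time (s).
    m_rev = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    # Stationary noise: m = 1 / (1 + 10^(-SNR/10)).
    m_noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return m_rev * m_noise


def simple_sti(rt60, snr_db):
    """Single-band STI sketch: mean transmission index over modulation frequencies."""
    indices = []
    for f in MOD_FREQS:
        m = modulation_transfer(f, rt60, snr_db)
        # Apparent SNR, guarded against m = 0 or 1 and clipped to +/-15 dB.
        if m >= 1.0:
            snr_app = 15.0
        elif m <= 0.0:
            snr_app = -15.0
        else:
            snr_app = 10.0 * math.log10(m / (1.0 - m))
        snr_app = max(-15.0, min(15.0, snr_app))
        indices.append((snr_app + 15.0) / 30.0)   # map [-15, 15] dB onto [0, 1]
    return sum(indices) / len(indices)


if __name__ == "__main__":
    for rt60 in (0.0, 0.5, 1.0):
        print(f"RT60 = {rt60:.1f} s, SNR = 5 dB -> STI = {simple_sti(rt60, 5.0):.2f}")
```

Longer reverberation times smear the envelope and lower the index at a fixed SNR; with no reverberation, the apparent SNR equals the physical SNR, so 5 dB maps to an index of about 0.67.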
Heikkinen, Jenna; Jansson-Verkasalo, Eira; Toivanen, Juhani; Suominen, Kalervo; Väyrynen, Eero; Moilanen, Irma; Seppänen, Tapio
Asperger's syndrome (AS) belongs to the group of autism spectrum disorders and is characterized by deficits in social interaction, as manifested, e.g., by the lack of social or emotional reciprocity. The disturbance causes clinically significant impairment in social interaction. Abnormal prosody has been frequently identified as a core feature of AS. There are virtually no studies on recognition of basic emotions from speech. This study focuses on how adolescents with AS (n=12) and their typically developed controls (n=15) recognize the basic emotions happy, sad, angry, and 'neutral' from speech prosody. Adolescents with AS recognized basic emotions from speech prosody as well as their typically developed controls did. Recognition of basic emotions may develop during childhood.
Dreisbach, Laura E.; Leek, Marjorie R.; Lentz, Jennifer J.
The ability to discriminate the spectral shapes of complex sounds is critical to accurate speech perception. Part of the difficulty experienced by listeners with hearing loss in understanding speech sounds in noise may be related to a smearing of the internal representation of the spectral peaks and valleys because of the loss of sensitivity and…
Schmid, Gabriele; Thielmann, Anke; Ziegler, Wolfram
Patients with lesions of the left hemisphere often suffer from oral-facial apraxia, apraxia of speech, and aphasia. In these patients, visual features often play a critical role in speech and language therapy, when pictured lip shapes or the therapist's visible mouth movements are used to facilitate speech production and articulation. This demands…
Weidema, Joey L.; Roncaglia-Denissen, M. P.; Honing, Henkjan
Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aspired to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top–down influences from language and music. Three groups of participants (Mandarin speakers, Dutch-speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours, and tested on their ability to identify these contours in a language and musical context. Stimuli consisted of disyllabic words spoken in Mandarin, and melodic tonal analogs, embedded in a linguistic and melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top–down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch-speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top–down influences from language and music. PMID:27313552
Polka, Linda; Rvachew, Susan; Molnar, Monika
The role of selective attention in infant phonetic perception was examined using a distraction masker paradigm. We compared perception of /bu/ versus /gu/ in 6- to 8-month-olds using a visual fixation procedure. Infants were habituated to multiple natural productions of 1 syllable type and then presented 4 test trials (old-new-old-new). Perception…
Bione, Tiago; Grimshaw, Jennica; Cardoso, Walcir
As stated in Cardoso, Smith, and Garcia Fuentes (2015), second language researchers and practitioners have explored the pedagogical capabilities of Text-To-Speech synthesizers (TTS) for their potential to enhance the acquisition of writing (e.g. Kirstein, 2006), vocabulary and reading (e.g. Proctor, Dalton, & Grisham, 2007), and pronunciation…
Meltzner, Geoffrey S.; Hillman, Robert E.
A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…
Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.
Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…
Utianski, Rene L; Caviness, John N; Liss, Julie M
High-density electroencephalography was used to evaluate cortical activity during speech comprehension via a sentence verification task. Twenty-four participants assigned true or false to sentences produced with 3 noise-vocoded channel levels (1: unintelligible; 6: decipherable; 16: intelligible), during simultaneous EEG recording. Participant data were sorted into higher- (HP) and lower-performing (LP) groups. The identification of a late event-related potential for LP listeners in the intelligible condition and in all listeners when challenged with a 6-channel signal supports the notion that this induced potential may be related to either processing degraded speech, or degraded processing of intelligible speech. Different cortical locations are identified as neural generators responsible for this activity; HP listeners are engaging motor aspects of their language system, utilizing an acoustic-phonetic based strategy to help resolve the sentence, while LP listeners do not. This study presents evidence for neurophysiological indices associated with more or less successful speech comprehension performance across listening conditions.
Clarke, Jeanne; Pals, Carina; Benard, Michel R.; Bhargava, Pranesh; Saija, Jefta; Sarampalis, Anastasios; Wagner, Anita; Gaudrain, Etienne
External degradations in incoming speech reduce understanding, and hearing impairment further compounds the problem. While cognitive mechanisms alleviate some of the difficulties, their effectiveness may change with age. In our research, reviewed here, we investigated cognitive compensation with hearing impairment, cochlear implants, and aging, via (a) phonemic restoration as a measure of top-down filling of missing speech, (b) listening effort and response times as a measure of increased cognitive processing, and (c) the visual world paradigm with eye-gaze tracking as a measure of the use of context and its time course. Our results indicate that between speech degradations and their cognitive compensation, there is a fine balance that seems to vary greatly across individuals. Hearing impairment or inadequate hearing device settings may limit compensation benefits. Cochlear implants seem to allow the effective use of sentential context, but likely at the cost of delayed processing. Linguistic and lexical knowledge, which play an important role in compensation, may be successfully employed in advanced age, as some compensatory mechanisms seem to be preserved. These findings indicate that cognitive compensation in hearing impairment can be highly complicated: not always absent, but also not easily predicted by speech intelligibility tests alone.