audiovisual speech processing: Topics by Science.gov

Sample records for audiovisual speech processing

A General Audiovisual Temporal Processing Deficit in Adult Readers With Dyslexia.

PubMed

Francisco, Ana A; Jesse, Alexandra; Groen, Margriet A; McQueen, James M

2017-01-01

Because reading is an audiovisual process, reading impairment may reflect an audiovisual processing deficit. The aim of the present study was to test the existence and scope of such a deficit in adult readers with dyslexia. We tested 39 typical readers and 51 adult readers with dyslexia on their sensitivity to the simultaneity of audiovisual speech and nonspeech stimuli, their time window of audiovisual integration for speech (using incongruent /aCa/ syllables), and their audiovisual perception of phonetic categories. Adult readers with dyslexia showed less sensitivity to audiovisual simultaneity than typical readers for both speech and nonspeech events. We found no differences between readers with dyslexia and typical readers in the temporal window of integration for audiovisual speech or in the audiovisual perception of phonetic categories. The results suggest an audiovisual temporal deficit in dyslexia that is not specific to speech-related events. But the differences found for audiovisual temporal sensitivity did not translate into a deficit in audiovisual speech perception. Hence, there seems to be a hiatus between simultaneity judgment and perception, suggesting a multisensory system that uses different mechanisms across tasks. Alternatively, it is possible that the audiovisual deficit in dyslexia is only observable when explicit judgments about audiovisual simultaneity are required.
Multi-sensory learning and learning to read.

PubMed

Blomert, Leo; Froyen, Dries

2010-09-01

The basis of literacy acquisition in alphabetic orthographies is the learning of the associations between the letters and the corresponding speech sounds. In spite of this primacy in learning to read, there is only scarce knowledge on how this audiovisual integration process works and which mechanisms are involved. Recent electrophysiological studies of letter-speech sound processing have revealed that normally developing readers take years to automate these associations and dyslexic readers hardly exhibit automation of these associations. It is argued that the reason for this effortful learning may reside in the nature of the audiovisual process that is recruited for the integration of in principle arbitrarily linked elements. It is shown that letter-speech sound integration does not resemble the processes involved in the integration of natural audiovisual objects such as audiovisual speech. The automatic symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in audiovisual speech integration does not occur for letter and speech sound integration. It is also argued that letter-speech sound integration only partly resembles the integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound integration and artificial audiovisual objects share the necessity of a narrow time window for integration to occur. However, they differ from these artificial objects, because they constitute an integration of partly familiar elements which acquire meaning through the learning of an orthography. Although letter-speech sound pairs share similarities with audiovisual speech processing as well as with unfamiliar, arbitrary objects, it seems that letter-speech sound pairs develop into unique audiovisual objects that furthermore have to be processed in a unique way in order to enable fluent reading and thus very likely recruit other neurobiological learning mechanisms than the ones involved in learning natural or arbitrary unfamiliar audiovisual associations. Copyright 2010 Elsevier B.V. All rights reserved.
The Role of Audiovisual Speech in the Early Stages of Lexical Processing as Revealed by the ERP Word Repetition Effect

ERIC Educational Resources Information Center

Basirat, Anahita; Brunellière, Angèle; Hartsuiker, Robert

2018-01-01

Numerous studies suggest that audiovisual speech influences lexical processing. However, it is not clear which stages of lexical processing are modulated by audiovisual speech. In this study, we examined the time course of the access to word representations in long-term memory when they were presented in auditory-only and audiovisual modalities.…
Audio-visual speech processing in age-related hearing loss: Stronger integration and increased frontal lobe recruitment.

PubMed

Rosemann, Stephanie; Thiel, Christiane M

2018-07-15

Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input effects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that already mild to moderate hearing loss impacts audio-visual speech processing accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss. Copyright © 2018 Elsevier Inc. All rights reserved.
Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model

ERIC Educational Resources Information Center

Loh, Marco; Schmid, Gabriele; Deco, Gustavo; Ziegler, Wolfram

2010-01-01

Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult…
Multistage audiovisual integration of speech: dissociating identification and detection.

PubMed

Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

2011-02-01

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.
The Development of Face Perception in Infancy: Intersensory Interference and Unimodal Visual Facilitation

PubMed Central

Bahrick, Lorraine E.; Lickliter, Robert; Castellanos, Irina

2014-01-01

Although research has demonstrated impressive face perception skills of young infants, little attention has focused on conditions that enhance versus impair infant face perception. The present studies tested the prediction, generated from the Intersensory Redundancy Hypothesis (IRH), that face discrimination, which relies on detection of visual featural information, would be impaired in the context of intersensory redundancy provided by audiovisual speech, and enhanced in the absence of intersensory redundancy (unimodal visual and asynchronous audiovisual speech) in early development. Later in development, following improvements in attention, faces should be discriminated in both redundant audiovisual and nonredundant stimulation. Results supported these predictions. Two-month-old infants discriminated a novel face in unimodal visual and asynchronous audiovisual speech but not in synchronous audiovisual speech. By 3 months, face discrimination was evident even during synchronous audiovisual speech. These findings indicate that infant face perception is enhanced and emerges developmentally earlier following unimodal visual than synchronous audiovisual exposure and that intersensory redundancy generated by naturalistic audiovisual speech can interfere with face processing. PMID:23244407
Electrophysiological evidence for speech-specific audiovisual integration.

PubMed

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.
Spatio-temporal Dynamics of Audiovisual Speech Processing

PubMed Central

Bernstein, Lynne E.; Auer, Edward T.; Wagner, Michael; Ponton, Curtis W.

2007-01-01

The cortical processing of auditory-alone, visual-alone, and audiovisual speech information is temporally and spatially distributed, and functional magnetic resonance imaging (fMRI) cannot adequately resolve its temporal dynamics. In order to investigate a hypothesized spatio-temporal organization for audiovisual speech processing circuits, event-related potentials (ERPs) were recorded using electroencephalography (EEG). Stimuli were congruent audiovisual /bα/, incongruent auditory /bα/ synchronized with visual /gα/, auditory-only /bα/, and visual-only /bα/ and /gα/. Current density reconstructions (CDRs) of the ERP data were computed across the latency interval of 50-250 milliseconds. The CDRs demonstrated complex spatio-temporal activation patterns that differed across stimulus conditions. The hypothesized circuit that was investigated here comprised initial integration of audiovisual speech by the middle superior temporal sulcus (STS), followed by recruitment of the intraparietal sulcus (IPS), followed by activation of Broca's area (Miller and d'Esposito, 2005). The importance of spatio-temporally sensitive measures in evaluating processing pathways was demonstrated. Results showed, strikingly, early (< 100 msec) and simultaneous activations in areas of the supramarginal and angular gyrus (SMG/AG), the IPS, the inferior frontal gyrus, and the dorsolateral prefrontal cortex. Also, emergent left hemisphere SMG/AG activation, not predicted based on the unisensory stimulus conditions was observed at approximately 160 to 220 msec. The STS was neither the earliest nor most prominent activation site, although it is frequently considered the sine qua non of audiovisual speech integration. As discussed here, the relatively late activity of the SMG/AG solely under audiovisual conditions is a possible candidate audiovisual speech integration response. PMID:17920933
The contribution of dynamic visual cues to audiovisual speech perception.

PubMed

Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

2015-08-01

Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas, some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.
Influences of selective adaptation on perception of audiovisual speech

PubMed Central

Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

2016-01-01

Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781
A measure for assessing the effects of audiovisual speech integration.

PubMed

Altieri, Nicholas; Townsend, James T; Wenger, Michael J

2014-06-01

We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it relates to normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

PubMed

Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

2016-10-13

Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.
Processing of Audiovisually Congruent and Incongruent Speech in School-Age Children with a History of Specific Language Impairment: A Behavioral and Event-Related Potentials Study

ERIC Educational Resources Information Center

Kaganovich, Natalya; Schumaker, Jennifer; Macias, Danielle; Gustafson, Dana

2015-01-01

Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we…
A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

PubMed

Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

2017-06-01

Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
Cortical Integration of Audio-Visual Information

PubMed Central

Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

2013-01-01

We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

PubMed

Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

2015-05-01

The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
Neural correlates of audiovisual speech processing in a second language.

PubMed

Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador

2013-09-01

Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance. Copyright © 2013 Elsevier Inc. All rights reserved.
Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour.

PubMed

Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Ribeiro, Helena; Potton, Anita; Axelsson, Emma L; Murphy, Elizabeth; Moore, Derek G

2013-11-01

Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months with an increase in the time spent looking to articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift. (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that they are not strictly related to chronological age but instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Robot Command Interface Using an Audio-Visual Speech Recognition System

NASA Astrophysics Data System (ADS)

Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.

A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography

PubMed Central

Ozker, Muge; Schepers, Inga M.; Magnotti, John F.; Yoshor, Daniel; Beauchamp, Michael S.

2017-01-01

Human speech can be comprehended using only auditory information from the talker’s voice. However, comprehension is improved if the talker’s face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl’s gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech. PMID:28253074
Effect of attentional load on audiovisual speech perception: evidence from ERPs.

PubMed

Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

2014-01-01

Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.
Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

PubMed

Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

2016-02-15

The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learner's rating on co-sign speech use and lipreading ability was correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.
Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

PubMed Central

Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

2012-01-01

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081
Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.

PubMed

Altieri, Nicholas; Pisoni, David B; Townsend, James T

2011-01-01

Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.
Visual and Auditory Components in the Perception of Asynchronous Audiovisual Speech

PubMed Central

Alcalá-Quintana, Rocío

2015-01-01

Research on asynchronous audiovisual speech perception manipulates experimental conditions to observe their effects on synchrony judgments. Probabilistic models establish a link between the sensory and decisional processes underlying such judgments and the observed data, via interpretable parameters that allow testing hypotheses and making inferences about how experimental manipulations affect such processes. Two models of this type have recently been proposed, one based on independent channels and the other using a Bayesian approach. Both models are fitted here to a common data set, with a subsequent analysis of the interpretation they provide about how experimental manipulations affected the processes underlying perceived synchrony. The data consist of synchrony judgments as a function of audiovisual offset in a speech stimulus, under four within-subjects manipulations of the quality of the visual component. The Bayesian model could not accommodate asymmetric data, was rejected by goodness-of-fit statistics for 8/16 observers, and was found to be nonidentifiable, which renders uninterpretable parameter estimates. The independent-channels model captured asymmetric data, was rejected for only 1/16 observers, and identified how sensory and decisional processes mediating asynchronous audiovisual speech perception are affected by manipulations that only alter the quality of the visual component of the speech signal. PMID:27551361
Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception.

PubMed

Baart, Martijn; Lindborg, Alma; Andersen, Tobias S

2017-11-01

Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli. © 2017 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
An Acquired Deficit of Audiovisual Speech Processing

ERIC Educational Resources Information Center

Hamilton, Roy H.; Shenton, Jeffrey T.; Coslett, H. Branch

2006-01-01

We report a 53-year-old patient (AWF) who has an acquired deficit of audiovisual speech integration, characterized by a perceived temporal mismatch between speech sounds and the sight of moving lips. AWF was less accurate on an auditory digit span task with vision of a speaker's face as compared to a condition in which no visual information from…
Effect of attentional load on audiovisual speech perception: evidence from ERPs

PubMed Central

Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E.; Soto-Faraco, Salvador; Tiippana, Kaisa

2014-01-01

Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech. PMID:25076922
Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment

PubMed Central

PONS, FERRAN; ANDREU, LLORENC.; SANZ-TORRENT, MONICA; BUIL-LEGAZ, LUCIA; LEWKOWICZ, DAVID J.

2014-01-01

Speech perception involves the integration of auditory and visual articulatory information and, thus, requires the perception of temporal synchrony between this information. There is evidence that children with Specific Language Impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component followed the visual component. None of the groups perceived an audiovisual asynchrony of 366ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception. PMID:22874648
Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

PubMed

Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

2013-06-01

Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.
Audiovisual integration of speech in a patient with Broca's Aphasia

PubMed Central

Andersen, Tobias S.; Starrfelt, Randi

2015-01-01

Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia. PMID:25972819
Audiovisual integration for speech during mid-childhood: electrophysiological evidence.

PubMed

Kaganovich, Natalya; Schumaker, Jennifer

2014-12-01

Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. Copyright © 2014 Elsevier Inc. All rights reserved.
Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

PubMed Central

Kaganovich, Natalya; Schumaker, Jennifer

2014-01-01

Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815
Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.

PubMed

I Karipidis, Iliana; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia

2017-02-01

Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we acquired simultaneously event-related potentials (ERP) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. In correspondence, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short (<30 min) letter-speech sound training initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

PubMed

Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

2017-09-01

Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
The role of left inferior frontal cortex during audiovisual speech perception in infants.

PubMed

Altvater-Mackensen, Nicole; Grossmann, Tobias

2016-06-01

In the first year of life, infants' speech perception attunes to their native language. While the behavioral changes associated with native language attunement are fairly well mapped, the underlying mechanisms and neural processes are still only poorly understood. Using fNIRS and eye tracking, the current study investigated 6-month-old infants' processing of audiovisual speech that contained matching or mismatching auditory and visual speech cues. Our results revealed that infants' speech-sensitive brain responses in inferior frontal brain regions were lateralized to the left hemisphere. Critically, our results further revealed that speech-sensitive left inferior frontal regions showed enhanced responses to matching when compared to mismatching audiovisual speech, and that infants with a preference to look at the speaker's mouth showed an enhanced left inferior frontal response to speech compared to infants with a preference to look at the speaker's eyes. These results suggest that left inferior frontal regions play a crucial role in associating information from different modalities during native language attunement, fostering the formation of multimodal phonological categories. Copyright © 2016 Elsevier Inc. All rights reserved.
Neural Correlates of Intersensory Processing in Five-Month-Old Infants

PubMed Central

Reynolds, Greg D.; Bahrick, Lorraine E.; Lickliter, Robert; Guy, Maggie W.

2014-01-01

Two experiments assessing event-related potentials in 5-month-old infants were conducted to examine neural correlates of attentional salience and efficiency of processing of a visual event (woman speaking) paired with redundant (synchronous) speech, nonredundant (asynchronous) speech, or no speech. In Experiment 1, the Nc component associated with attentional salience was greater in amplitude following synchronous audiovisual as compared with asynchronous audiovisual and unimodal visual presentations. A block design was utilized in Experiment 2 to examine efficiency of processing of a visual event. Only infants exposed to synchronous audiovisual speech demonstrated a significant reduction in amplitude of the late slow wave associated with successful stimulus processing and recognition memory from early to late blocks of trials. These findings indicate that events that provide intersensory redundancy are associated with enhanced neural responsiveness indicative of greater attentional salience and more efficient stimulus processing as compared with the same events when they provide no intersensory redundancy in 5-month-old infants. PMID:23423948
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

PubMed

Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

2018-05-01

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
Modulations of 'late' event-related brain potentials in humans by dynamic audiovisual speech stimuli.

PubMed

Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie

2004-11-30

Lipreading reliably improve speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies and lipreading, then, can give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated in amplitude an endogenous ERP component, the N300, we compared to a 'N400-like effect' reflecting the difficulty to integrate these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics.

The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

PubMed

Crosse, Michael J; Lalor, Edmund C

2014-04-01

Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
Modeling the Development of Audiovisual Cue Integration in Speech Perception

PubMed Central

Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

2017-01-01

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
Modeling the Development of Audiovisual Cue Integration in Speech Perception.

PubMed

Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

2017-03-21

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

PubMed

Magnotti, John F; Beauchamp, Michael S

2017-02-01

Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
Simulation of talking faces in the human brain improves auditory speech recognition

PubMed Central

von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.

2008-01-01

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
Reduced efficiency of audiovisual integration for nonnative speech.

PubMed

Yi, Han-Gyol; Phelps, Jasmine E B; Smiljanic, Rajka; Chandrasekaran, Bharath

2013-11-01

The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.
Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

NASA Astrophysics Data System (ADS)

Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

We investigated whether audiovisual synchrony perception for speech could change after observation of the audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is re-calibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measurement of the audiovisual synchrony perception) in terms of the speech signal after observation of the speech stimuli which had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measurement of the audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to the response strategy using stimuli identical to those of Experiment 1. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag both in direct and indirect measurement. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.
Mismatch Negativity with Visual-only and Audiovisual Speech

PubMed Central

Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.

2009-01-01

The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730
Processing of audiovisually congruent and incongruent speech in school-age children with a history of Specific Language Impairment: a behavioral and event-related potentials study

PubMed Central

Kaganovich, Natalya; Schumaker, Jennifer; Macias, Danielle; Gustafson, Dana

2014-01-01

Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables ‘ba,’ ‘da,’ and ‘ga’ under three conditions – audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audivisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory ‘pa’ dubbed onto a visual articulation of ‘ka,’ and the expectant perception being that of 'ta' if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as ‘pa’ (in agreement with its auditory component) than as ‘ka’ (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. Taken together, the results of the two experiments argue against global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing. PMID:25440407
Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

PubMed

Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

2017-04-01

There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with NH listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. Their findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Atypical audio-visual speech perception and McGurk effects in children with specific language impairment

PubMed Central

Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

2014-01-01

Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed similar auditory masking release effect than children with TLD. Children with SLI also had less correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supra modal deficit of phonemic categorization in children with SLI. Clinical implications are discussed. PMID:24904454
Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

PubMed

Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

2014-01-01

Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed similar auditory masking release effect than children with TLD. Children with SLI also had less correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supra modal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.
Audiovisual speech perception development at varying levels of perceptual processing

PubMed Central

Lalonde, Kaylah; Holt, Rachael Frush

2016-01-01

This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318
Audiovisual speech perception development at varying levels of perceptual processing.

PubMed

Lalonde, Kaylah; Holt, Rachael Frush

2016-04-01

This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.
Visual information can hinder working memory processing of speech.

PubMed

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

2013-08-01

The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of cognitive tests. In the CSCT, they listened to and recalled 2-digit numbers according to instructions inducing executive processing at 2 different memory loads. In Experiment 2, 10 participants performed a less executively demanding free recall task using the same stimuli. CSCT performance demonstrated an effect of memory load and was associated with independent measures of executive function and inference making but not with general working memory capacity. Audiovisual presentation was associated with lower CSCT scores but higher free recall performance scores. CSCT is an executively challenging test of the ability to process heard speech. It captures cognitive aspects of listening related to sentence comprehension that are quantitatively and qualitatively different from working memory capacity. Visual information provided in the audiovisual modality of presentation can hinder executive processing in working memory of nondegraded speech material.
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

PubMed Central

Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

2017-01-01

Purpose Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824
Quantified acoustic-optical speech signal incongruity identifies cortical sites of audiovisual speech processing

PubMed Central

Bernstein, Lynne E.; Lu, Zhong-Lin; Jiang, Jintao

2008-01-01

A fundamental question about human perception is how the speech perceiving brain combines auditory and visual phonetic stimulus information. We assumed that perceivers learn the normal relationship between acoustic and optical signals. We hypothesized that when the normal relationship is perturbed by mismatching the acoustic and optical signals, cortical areas responsible for audiovisual stimulus integration respond as a function of the magnitude of the mismatch. To test this hypothesis, in a previous study, we developed quantitative measures of acoustic-optical speech stimulus incongruity that correlate with perceptual measures. In the current study, we presented low incongruity (LI, matched), medium incongruity (MI, moderately mismatched), and high incongruity (HI, highly mismatched) audiovisual nonsense syllable stimuli during fMRI scanning. Perceptual responses differed as a function of the incongruity level, and BOLD measures were found to vary regionally and quantitatively with perceptual and quantitative incongruity levels. Each increase in level of incongruity resulted in an increase in overall levels of cortical activity and in additional activations. However, the only cortical region that demonstrated differential sensitivity to the three stimulus incongruity levels (HI > MI > LI) was a subarea of the left supramarginal gyrus (SMG). The left SMG might support a fine-grained analysis of the relationship between audiovisual phonetic input in comparison with stored knowledge, as hypothesized here. The methods here show that quantitative manipulation of stimulus incongruity is a new and powerful tool for disclosing the system that processes audiovisual speech stimuli. PMID:18495091
Prediction and constraint in audiovisual speech perception

PubMed Central

Peelle, Jonathan E.; Sommers, Mitchell S.

2015-01-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390
Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

PubMed Central

Fava, Eswen; Hull, Rachel; Bortfeld, Heather

2014-01-01

Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14-months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572
Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special

ERIC Educational Resources Information Center

Vroomen, Jean; Stekelenburg, Jeroen J.

2011-01-01

Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

ERIC Educational Resources Information Center

Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

2012-01-01

In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…
Early and late beta-band power reflect audiovisual perception in the McGurk illusion

PubMed Central

Senkowski, Daniel; Keil, Julian

2015-01-01

The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13–30 Hz) at short (0–500 ms) and long (500–800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept. PMID:25568160
Early and late beta-band power reflect audiovisual perception in the McGurk illusion.

PubMed

Roa Romero, Yadira; Senkowski, Daniel; Keil, Julian

2015-04-01

The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13-30 Hz) at short (0-500 ms) and long (500-800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept. Copyright © 2015 the American Physiological Society.
Rapid, generalized adaptation to asynchronous audiovisual speech

PubMed Central

Van der Burg, Erik; Goodbourn, Patrick T.

2015-01-01

The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. PMID:25716790
Audiovisual training is better than auditory-only training for auditory-only speech-in-noise identification.

PubMed

Lidestam, Björn; Moradi, Shahram; Pettersson, Rasmus; Ricklefs, Theodor

2014-08-01

The effects of audiovisual versus auditory training for speech-in-noise identification were examined in 60 young participants. The training conditions were audiovisual training, auditory-only training, and no training (n = 20 each). In the training groups, gated consonants and words were presented at 0 dB signal-to-noise ratio; stimuli were either audiovisual or auditory-only. The no-training group watched a movie clip without performing a speech identification task. Speech-in-noise identification was measured before and after the training (or control activity). Results showed that only audiovisual training improved speech-in-noise identification, demonstrating superiority over auditory-only training.
Prediction and constraint in audiovisual speech perception.

PubMed

Peelle, Jonathan E; Sommers, Mitchell S

2015-07-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect.

PubMed

Van Engen, Kristin J; Xie, Zilong; Chandrasekaran, Bharath

2017-02-01

In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

PubMed

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2016-06-17

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.
Language familiarity modulates relative attention to the eyes and mouth of a talker.

PubMed

Barenholtz, Elan; Mavica, Lauren; Lewkowicz, David J

2016-02-01

We investigated whether the audiovisual speech cues available in a talker's mouth elicit greater attention when adults have to process speech in an unfamiliar language vs. a familiar language. Participants performed a speech-encoding task while watching and listening to videos of a talker in a familiar language (English) or an unfamiliar language (Spanish or Icelandic). Attention to the mouth increased in monolingual subjects in response to an unfamiliar language condition but did not in bilingual subjects when the task required speech processing. In the absence of an explicit speech-processing task, subjects attended equally to the eyes and mouth in response to both familiar and unfamiliar languages. Overall, these results demonstrate that language familiarity modulates selective attention to the redundant audiovisual speech cues in a talker's mouth in adults. When our findings are considered together with similar findings from infants, they suggest that this attentional strategy emerges very early in life. Copyright © 2015 Elsevier B.V. All rights reserved.
The organization and reorganization of audiovisual speech perception in the first year of life.

PubMed

Danielson, D Kyle; Bruderer, Alison G; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F

2017-04-01

The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six- and nine-, but not 11-months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.
The organization and reorganization of audiovisual speech perception in the first year of life

PubMed Central

Danielson, D. Kyle; Bruderer, Alison G.; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F.

2017-01-01

The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six- and nine-, but not 11-months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone. PMID:28970650
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

PubMed Central

Lidestam, Björn; Rönnberg, Jerker

2016-01-01

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

PubMed

Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

2017-07-25

Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underly poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
Rapid, generalized adaptation to asynchronous audiovisual speech.

PubMed

Van der Burg, Erik; Goodbourn, Patrick T

2015-04-07

The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Speech Cues Contribute to Audiovisual Spatial Integration

PubMed Central

Bishop, Christopher W.; Miller, Lee M.

2011-01-01

Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral ‘what’ and dorsal ‘where’ pathways. PMID:21909378
Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study.

PubMed

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2011-01-01

During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables--disambiguated to /pa/ or /ta/ by the visual channel (speaking face)--served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.
Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

ERIC Educational Resources Information Center

Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

2016-01-01

Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
Role of Visual Speech in Phonological Processing by Children with Hearing Loss

ERIC Educational Resources Information Center

Jerger, Susan; Tye-Murray, Nancy; Abdi, Herve

2009-01-01

Purpose: This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method: Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in…
Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis

PubMed Central

Altieri, Nicholas; Wenger, Michael J.

2013-01-01

Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of −12 dB, and S/N ratio of −18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity. PMID:24058358
Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

PubMed

Altieri, Nicholas; Wenger, Michael J

2013-01-01

Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.

Brain responses and looking behavior during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life

PubMed Central

Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Potton, Anita; Birtles, Deidre; Frostick, Caroline; Moore, Derek G.

2013-01-01

The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed-up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants. PMID:23882240
Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

PubMed

Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

2015-03-01

While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H 2 (15) O-positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed more by visual distractors when confronted with incongruent audiovisual stimuli. To cope with this multimodal conflict, CI users activate the left inferior frontal gyrus to adopt a top-down cognitive modulation pathway, whereas normal hearing individuals primarily adopt a bottom-up strategy.
Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

ERIC Educational Resources Information Center

Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

2012-01-01

Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…
The influence of visual and auditory information on the perception of speech and non-speech oral movements in patients with left hemisphere lesions.

PubMed

Schmid, Gabriele; Thielmann, Anke; Ziegler, Wolfram

2009-03-01

Patients with lesions of the left hemisphere often suffer from oral-facial apraxia, apraxia of speech, and aphasia. In these patients, visual features often play a critical role in speech and language therapy, when pictured lip shapes or the therapist's visible mouth movements are used to facilitate speech production and articulation. This demands audiovisual processing both in speech and language treatment and in the diagnosis of oral-facial apraxia. The purpose of this study was to investigate differences in audiovisual perception of speech as compared to non-speech oral gestures. Bimodal and unimodal speech and non-speech items were used and additionally discordant stimuli constructed, which were presented for imitation. This study examined a group of healthy volunteers and a group of patients with lesions of the left hemisphere. Patients made substantially more errors than controls, but the factors influencing imitation accuracy were more or less the same in both groups. Error analyses in both groups suggested different types of representations for speech as compared to the non-speech domain, with speech having a stronger weight on the auditory modality and non-speech processing on the visual modality. Additionally, this study was able to show that the McGurk effect is not limited to speech.
Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.

PubMed

Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko

2017-08-15

During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific type of visual information shapes neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performances of the classifiers were tested by leaving one participant at a time for testing and training the model using the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas were associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

ERIC Educational Resources Information Center

Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.

2016-01-01

Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

ERIC Educational Resources Information Center

Pilling, Michael; Thomas, Sharon

2011-01-01

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Multisensory and modality specific processing of visual speech in different regions of the premotor cortex

PubMed Central

Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko

2014-01-01

Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action (“Mirror System” properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the functional magnetic resonance imaging (fMRI) analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with articulatory speech gestures. PMID:24860526
No, There Is No 150 ms Lead of Visual Speech on Auditory Speech, but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag

PubMed Central

Schwartz, Jean-Luc; Savariaux, Christophe

2014-01-01

An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. PMID:25079216
Audiovisual materials are effective for enhancing the correction of articulation disorders in children with cleft palate.

PubMed

Pamplona, María Del Carmen; Ysunza, Pablo Antonio; Morales, Santiago

2017-02-01

Children with cleft palate frequently show speech disorders known as compensatory articulation. Compensatory articulation requires a prolonged period of speech intervention that should include reinforcement at home. However, frequently relatives do not know how to work with their children at home. To study whether the use of audiovisual materials especially designed for complementing speech pathology treatment in children with compensatory articulation can be effective for stimulating articulation practice at home and consequently enhancing speech normalization in children with cleft palate. Eighty-two patients with compensatory articulation were studied. Patients were randomly divided into two groups. Both groups received speech pathology treatment aimed to correct articulation placement. In addition, patients from the active group received a set of audiovisual materials to be used at home. Parents were instructed about strategies and ideas about how to use the materials with their children. Severity of compensatory articulation was compared at the onset and at the end of the speech intervention. After the speech therapy period, the group of patients using audiovisual materials at home demonstrated significantly greater improvement in articulation, as compared with the patients receiving speech pathology treatment on - site without audiovisual supporting materials. The results of this study suggest that audiovisual materials especially designed for practicing adequate articulation placement at home can be effective for reinforcing and enhancing speech pathology treatment of patients with cleft palate and compensatory articulation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

PubMed

Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

2015-02-01

To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.
Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation

PubMed Central

Lusk, Laina G.; Mitchel, Aaron D.

2016-01-01

Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation. PMID:26869959
Common variation in the autism risk gene CNTNAP2, brain structural connectivity and multisensory speech integration.

PubMed

Ross, Lars A; Del Bene, Victor A; Molholm, Sophie; Jae Woo, Young; Andrade, Gizely N; Abrahams, Brett S; Foxe, John J

2017-11-01

Three lines of evidence motivated this study. 1) CNTNAP2 variation is associated with autism risk and speech-language development. 2) CNTNAP2 variations are associated with differences in white matter (WM) tracts comprising the speech-language circuitry. 3) Children with autism show impairment in multisensory speech perception. Here, we asked whether an autism risk-associated CNTNAP2 single nucleotide polymorphism in neurotypical adults was associated with multisensory speech perception performance, and whether such a genotype-phenotype association was mediated through white matter tract integrity in speech-language circuitry. Risk genotype at rs7794745 was associated with decreased benefit from visual speech and lower fractional anisotropy (FA) in several WM tracts (right precentral gyrus, left anterior corona radiata, right retrolenticular internal capsule). These structural connectivity differences were found to mediate the effect of genotype on audiovisual speech perception, shedding light on possible pathogenic pathways in autism and biological sources of inter-individual variation in audiovisual speech processing in neurotypicals. Copyright © 2017 Elsevier Inc. All rights reserved.
Audio-Visual Speech Perception Is Special

ERIC Educational Resources Information Center

Tuomainen, J.; Andersen, T.S.; Tiippana, K.; Sams, M.

2005-01-01

In face-to-face conversation speech is perceived by ear and eye. We studied the prerequisites of audio-visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and…
Lip movements affect infants' audiovisual speech perception.

PubMed

Yeung, H Henny; Werker, Janet F

2013-05-01

Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.
Audio-Visual Speech in Noise Perception in Dyslexia

ERIC Educational Resources Information Center

van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

2018-01-01

Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…
The shadow of a doubt? Evidence for perceptuo-motor linkage during auditory and audiovisual close-shadowing

PubMed Central

Scarbel, Lucie; Beautemps, Denis; Schwartz, Jean-Luc; Sato, Marc

2014-01-01

One classical argument in favor of a functional role of the motor system in speech perception comes from the close-shadowing task in which a subject has to identify and to repeat as quickly as possible an auditory speech stimulus. The fact that close-shadowing can occur very rapidly and much faster than manual identification of the speech target is taken to suggest that perceptually induced speech representations are already shaped in a motor-compatible format. Another argument is provided by audiovisual interactions often interpreted as referring to a multisensory-motor framework. In this study, we attempted to combine these two paradigms by testing whether the visual modality could speed motor response in a close-shadowing task. To this aim, both oral and manual responses were evaluated during the perception of auditory and audiovisual speech stimuli, clear or embedded in white noise. Overall, oral responses were faster than manual ones, but it also appeared that they were less accurate in noise, which suggests that motor representations evoked by the speech input could be rough at a first processing stage. In the presence of acoustic noise, the audiovisual modality led to both faster and more accurate responses than the auditory modality. No interaction was however, observed between modality and response. Altogether, these results are interpreted within a two-stage sensory-motor framework, in which the auditory and visual streams are integrated together and with internally generated motor representations before a final decision may be available. PMID:25009512
Auditory perception in the aging brain: the role of inhibition and facilitation in early processing.

PubMed

Stothart, George; Kazanina, Nina

2016-11-01

Aging affects the interplay between peripheral and cortical auditory processing. Previous studies have demonstrated that older adults are less able to regulate afferent sensory information and are more sensitive to distracting information. Using auditory event-related potentials we investigated the role of cortical inhibition on auditory and audiovisual processing in younger and older adults. Across puretone, auditory and audiovisual speech paradigms older adults showed a consistent pattern of inhibitory deficits, manifested as increased P50 and/or N1 amplitudes and an absent or significantly reduced N2. Older adults were still able to use congruent visual articulatory information to aid auditory processing but appeared to require greater neural effort to resolve conflicts generated by incongruent visual information. In combination, the results provide support for the Inhibitory Deficit Hypothesis of aging. They extend previous findings into the audiovisual domain and highlight older adults' ability to benefit from congruent visual information during speech processing. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Audiovisual speech facilitates voice learning.

PubMed

Sheffert, Sonya M; Olson, Elizabeth

2004-02-01

In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Audiovisual Asynchrony Detection in Human Speech

ERIC Educational Resources Information Center

Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

2011-01-01

Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

PubMed

Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found with no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

ERIC Educational Resources Information Center

Megnin-Viggars, Odette; Goswami, Usha

2013-01-01

Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…
Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

PubMed

Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

2018-04-01

Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group, were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD. Autism Res 2018, 11: 645-653. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to explore whether children with ASD process audio-visual synchrony in ways comparable to their typically developing peers, and the relationship between preference for synchrony and language ability. Results showed that there are differences in attention to audiovisual synchrony between typically developing children and children with ASD. Preference for synchrony was related to the language abilities of children across groups. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
Audiovisual speech perception in infancy: The influence of vowel identity and infants' productive abilities on sensitivity to (mis)matches between auditory and visual speech cues.

PubMed

Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

2016-02-01

Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds on their ability to detect mismatches between concurrently presented auditory and visual vowels and related their performance to their productive abilities and later vocabulary size. Results show that infants' ability to detect mismatches between auditory and visually presented vowels differs depending on the vowels involved. Furthermore, infants' sensitivity to mismatches is modulated by their current articulatory knowledge and correlates with their vocabulary size at 12 months of age. This suggests that-aside from infants' ability to match nonnative audiovisual cues (Pons et al., 2009)-their ability to match native auditory and visual cues continues to develop during the first year of life. Our findings point to a potential role of salient vowel cues and productive abilities in the development of audiovisual speech perception, and further indicate a relation between infants' early sensitivity to audiovisual speech cues and their later language development. PsycINFO Database Record (c) 2016 APA, all rights reserved.
Normative Data on Audiovisual Speech Integration Using Sentence Recognition and Capacity Measures

PubMed Central

Altieri, Nicholas; Hudock, Daniel

2016-01-01

Objective The ability to use visual speech cues and integrate them with auditory information is important, especially in noisy environments and for hearing-impaired (HI) listeners. Providing data on measures of integration skills that encompass accuracy and processing speed will benefit researchers and clinicians. Design The study consisted of two experiments: First, accuracy scores were obtained using CUNY sentences, and capacity measures that assessed reaction-time distributions were obtained from a monosyllabic word recognition task. Study Sample We report data on two measures of integration obtained from a sample comprised of 86 young and middle-age adult listeners: Results To summarize our results, capacity showed a positive correlation with accuracy measures of audiovisual benefit obtained from sentence recognition. More relevant, factor analysis indicated that a single-factor model captured audiovisual speech integration better than models containing more factors. Capacity exhibited strong loadings on the factor, while the accuracy-based measures from sentence recognition exhibited weaker loadings. Conclusions Results suggest that a listener’s integration skills may be assessed optimally using a measure that incorporates both processing speed and accuracy. PMID:26853446
Normative data on audiovisual speech integration using sentence recognition and capacity measures.

PubMed

Altieri, Nicholas; Hudock, Daniel

2016-01-01

The ability to use visual speech cues and integrate them with auditory information is important, especially in noisy environments and for hearing-impaired (HI) listeners. Providing data on measures of integration skills that encompass accuracy and processing speed will benefit researchers and clinicians. The study consisted of two experiments: First, accuracy scores were obtained using City University of New York (CUNY) sentences, and capacity measures that assessed reaction-time distributions were obtained from a monosyllabic word recognition task. We report data on two measures of integration obtained from a sample comprised of 86 young and middle-age adult listeners: To summarize our results, capacity showed a positive correlation with accuracy measures of audiovisual benefit obtained from sentence recognition. More relevant, factor analysis indicated that a single-factor model captured audiovisual speech integration better than models containing more factors. Capacity exhibited strong loadings on the factor, while the accuracy-based measures from sentence recognition exhibited weaker loadings. Results suggest that a listener's integration skills may be assessed optimally using a measure that incorporates both processing speed and accuracy.
Mismatch negativity evoked by the McGurk-MacDonald effect: a phonetic representation within short-term memory.

PubMed

Colin, C; Radeau, M; Soquet, A; Demolin, D; Colin, F; Deltenre, P

2002-04-01

The McGurk-MacDonald illusory percept is obtained by dubbing an incongruent articulatory movement on an auditory phoneme. This type of audiovisual speech perception contributes to the assessment of theories of speech perception. The mismatch negativity (MMN) reflects the detection of a deviant stimulus within the auditory short-term memory and besides an acoustic component, possesses, under certain conditions, a phonetic one. The present study assessed the existence of an MMN evoked by McGurk-MacDonald percepts elicited by audiovisual stimuli with constant auditory components. Cortical evoked potentials were recorded using the oddball paradigm on 8 adults in 3 experimental conditions: auditory alone, visual alone and audiovisual stimulation. Obtaining illusory percepts was confirmed in an additional psychophysical condition. The auditory deviant syllables and the audiovisual incongruent syllables elicited a significant MMN at Fz. In the visual condition, no negativity was observed either at Fz, or at O(z). An MMN can be evoked by visual articulatory deviants, provided they are presented in a suitable auditory context leading to a phonetically significant interaction. The recording of an MMN elicited by illusory McGurk percepts suggests that audiovisual integration mechanisms in speech take place rather early during the perceptual processes.
Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

PubMed

François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

2017-04-01

Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.
Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG

PubMed Central

Hagan, Cindy C.; Woods, Will; Johnson, Sam; Green, Gary G. R.; Young, Andrew W.

2013-01-01

Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals. PMID:23950977
Involvement of right STS in audio-visual integration for affective speech demonstrated using MEG.

PubMed

Hagan, Cindy C; Woods, Will; Johnson, Sam; Green, Gary G R; Young, Andrew W

2013-01-01

Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals.
Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

PubMed Central

Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

2014-01-01

To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038
Immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention: a mismatch negativity study.

PubMed

Li, X; Yang, Y; Ren, G

2009-06-16

Language is often perceived together with visual information. Recent experimental evidences indicated that, during spoken language comprehension, the brain can immediately integrate visual information with semantic or syntactic information from speech. Here we used the mismatch negativity to further investigate whether prosodic information from speech could be immediately integrated into a visual scene context or not, and especially the time course and automaticity of this integration process. Sixteen Chinese native speakers participated in the study. The materials included Chinese spoken sentences and picture pairs. In the audiovisual situation, relative to the concomitant pictures, the spoken sentence was appropriately accented in the standard stimuli, but inappropriately accented in the two kinds of deviant stimuli. In the purely auditory situation, the speech sentences were presented without pictures. It was found that the deviants evoked mismatch responses in both audiovisual and purely auditory situations; the mismatch negativity in the purely auditory situation peaked at the same time as, but was weaker than that evoked by the same deviant speech sounds in the audiovisual situation. This pattern of results suggested immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention.
Temporal event structure and timing in schizophrenia: preserved binding in a longer "now".

PubMed

Martin, Brice; Giersch, Anne; Huron, Caroline; van Wassenhove, Virginie

2013-01-01

Patients with schizophrenia experience a loss of temporal continuity or subjective fragmentation along the temporal dimension. Here, we develop the hypothesis that impaired temporal awareness results from a perturbed structuring of events in time-i.e., canonical neural dynamics. To address this, 26 patients and their matched controls took part in two psychophysical studies using desynchronized audiovisual speech. Two tasks were used and compared: first, an identification task testing for multisensory binding impairments in which participants reported what they heard while looking at a speaker's face; in a second task, we tested the perceived simultaneity of the same audiovisual speech stimuli. In both tasks, we used McGurk fusion and combination that are classic ecologically valid multisensory illusions. First, and contrary to previous reports, our results show that patients do not significantly differ from controls in their rate of illusory reports. Second, the illusory reports of patients in the identification task were more sensitive to audiovisual speech desynchronies than those of controls. Third, and surprisingly, patients considered audiovisual speech to be synchronized for longer delays than controls. As such, the temporal tolerance profile observed in a temporal judgement task was less of a predictor for sensory binding in schizophrenia than for that obtained in controls. We interpret our results as an impairment of temporal event structuring in schizophrenia which does not specifically affect sensory binding operations but rather, the explicit access to timing information associated here with audiovisual speech processing. Our findings are discussed in the context of curent neurophysiological frameworks for the binding and the structuring of sensory events in time. Copyright © 2012 Elsevier Ltd. All rights reserved.
Electrophysiological evidence for Audio-visuo-lingual speech integration.

PubMed

Treille, Avril; Vilain, Coriandre; Schwartz, Jean-Luc; Hueber, Thomas; Sato, Marc

2018-01-31

Recent neurophysiological studies demonstrate that audio-visual speech integration partly operates through temporal expectations and speech-specific predictions. From these results, one common view is that the binding of auditory and visual, lipread, speech cues relies on their joint probability and prior associative audio-visual experience. The present EEG study examined whether visual tongue movements integrate with relevant speech sounds, despite little associative audio-visual experience between the two modalities. A second objective was to determine possible similarities and differences of audio-visual speech integration between unusual audio-visuo-lingual and classical audio-visuo-labial modalities. To this aim, participants were presented with auditory, visual, and audio-visual isolated syllables, with the visual presentation related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, with lingual and facial movements previously recorded by an ultrasound imaging system and a video camera. In line with previous EEG studies, our results revealed an amplitude decrease and a latency facilitation of P2 auditory evoked potentials in both audio-visual-lingual and audio-visuo-labial conditions compared to the sum of unimodal conditions. These results argue against the view that auditory and visual speech cues solely integrate based on prior associative audio-visual perceptual experience. Rather, they suggest that dynamic and phonetic informational cues are sharable across sensory modalities, possibly through a cross-modal transfer of implicit articulatory motor knowledge. Copyright © 2017 Elsevier Ltd. All rights reserved.
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

PubMed Central

Banks, Briony; Gowen, Emma; Munro, Kevin J.; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation. PMID:26283946
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

PubMed

Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.
An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception

PubMed Central

Altieri, Nicholas; Townsend, James T.

2011-01-01

Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio. PMID:21980314
Bilingualism affects audiovisual phoneme identification

PubMed Central

Burfin, Sabine; Pascalis, Olivier; Ruiz Tada, Elisa; Costa, Albert; Savariaux, Christophe; Kandel, Sonia

2014-01-01

We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience—i.e., the exposure to a double phonological code during childhood—affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition monolinguals and bilinguals had difficulties in discriminating the retroflex non-native phoneme. They were phonologically “deaf” and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation instead, both groups could overcome the phonological deafness for the retroflex non-native phoneme and identify both Bengali phonemes. However, monolinguals were more accurate and responded quicker than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech. PMID:25374551
Infant Perception of Audio-Visual Speech Synchrony

ERIC Educational Resources Information Center

Lewkowicz, David J.

2010-01-01

Three experiments investigated perception of audio-visual (A-V) speech synchrony in 4- to 10-month-old infants. Experiments 1 and 2 used a convergent-operations approach by habituating infants to an audiovisually synchronous syllable (Experiment 1) and then testing for detection of increasing degrees of A-V asynchrony (366, 500, and 666 ms) or by…
Audio Visual Integration with Competing Sources in the Framework of Audio Visual Speech Scene Analysis.

PubMed

Ganesh, Attigodu Chandrashekara; Berthommier, Frédéric; Schwartz, Jean-Luc

2016-01-01

We introduce "Audio-Visual Speech Scene Analysis" (AVSSA) as an extension of the two-stage Auditory Scene Analysis model towards audiovisual scenes made of mixtures of speakers. AVSSA assumes that a coherence index between the auditory and the visual input is computed prior to audiovisual fusion, enabling to determine whether the sensory inputs should be bound together. Previous experiments on the modulation of the McGurk effect by audiovisual coherent vs. incoherent contexts presented before the McGurk target have provided experimental evidence supporting AVSSA. Indeed, incoherent contexts appear to decrease the McGurk effect, suggesting that they produce lower audiovisual coherence hence less audiovisual fusion. The present experiments extend the AVSSA paradigm by creating contexts made of competing audiovisual sources and measuring their effect on McGurk targets. The competing audiovisual sources have respectively a high and a low audiovisual coherence (that is, large vs. small audiovisual comodulations in time). The first experiment involves contexts made of two auditory sources and one video source associated to either the first or the second audio source. It appears that the McGurk effect is smaller after the context made of the visual source associated to the auditory source with less audiovisual coherence. In the second experiment with the same stimuli, the participants are asked to attend to either one or the other source. The data show that the modulation of fusion depends on the attentional focus. Altogether, these two experiments shed light on audiovisual binding, the AVSSA process and the role of attention.

Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

PubMed

Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

2013-01-01

The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between groups were not only because of differences in unisensory perception but also because of differences in the process of audiovisual integration per se. Visual reduction led to an increase in the weight of audition, even in cochlear-implanted children, whose perception is generally dominated by vision. This result suggests that the natural bias in favor of vision is not immutable. Audiovisual speech integration partly depends on the experimental situation, which modulates the informational content of the sensory channels and the weight that is awarded to each of them. Consequently, participants, whether deaf with cochlear implants or having normal hearing, not only base their perception on the most reliable modality but also award it an additional weight.
Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity.

PubMed

Stekelenburg, Jeroen J; Keetels, Mirjam; Vroomen, Jean

2018-05-01

Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

ERIC Educational Resources Information Center

Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

2013-01-01

Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…
Neural Development of Networks for Audiovisual Speech Comprehension

ERIC Educational Resources Information Center

Dick, Anthony Steven; Solodkin, Ana; Small, Steven L.

2010-01-01

Everyday conversation is both an auditory and a visual phenomenon. While visual speech information enhances comprehension for the listener, evidence suggests that the ability to benefit from this information improves with development. A number of brain regions have been implicated in audiovisual speech comprehension, but the extent to which the…
How Children and Adults Produce and Perceive Uncertainty in Audiovisual Speech

ERIC Educational Resources Information Center

Krahmer, Emiel; Swerts, Marc

2005-01-01

We describe two experiments on signaling and detecting uncertainty in audiovisual speech by adults and children. In the first study, utterances from adult speakers and child speakers (aged 7-8) were elicited and annotated with a set of six audiovisual features. It was found that when adult speakers were uncertain they were more likely to produce…
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

PubMed

Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

2018-06-01

Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Audiovisual perception in amblyopia: A review and synthesis.

PubMed

Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

2018-05-17

Amblyopia is a common developmental sensory disorder that has been extensively and systematically investigated as a unisensory visual impairment. However, its effects are increasingly recognized to extend beyond vision to the multisensory domain. Indeed, amblyopia is associated with altered cross-modal interactions in audiovisual temporal perception, audiovisual spatial perception, and audiovisual speech perception. Furthermore, although the visual impairment in amblyopia is typically unilateral, the multisensory abnormalities tend to persist even when viewing with both eyes. Knowledge of the extent and mechanisms of the audiovisual impairments in amblyopia, however, remains in its infancy. This work aims to review our current understanding of audiovisual processing and integration deficits in amblyopia, and considers the possible mechanisms underlying these abnormalities. Copyright © 2018. Published by Elsevier Ltd.
Neural Processing of Congruent and Incongruent Audiovisual Speech in School-Age Children and Adults

ERIC Educational Resources Information Center

Heikkilä, Jenni; Tiippana, Kaisa; Loberg, Otto; Leppänen, Paavo H. T.

2018-01-01

Seeing articulatory gestures enhances speech perception. Perception of auditory speech can even be changed by incongruent visual gestures, which is known as the McGurk effect (e.g., dubbing a voice saying /mi/ onto a face articulating /ni/, observers often hear /ni/). In children, the McGurk effect is weaker than in adults, but no previous…
Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy

PubMed Central

Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun

2014-01-01

The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants’ audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life. PMID:24586651
Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech.

PubMed

Venezia, Jonathan H; Fillmore, Paul; Matchin, William; Isenberg, A Lisette; Hickok, Gregory; Fridriksson, Julius

2016-02-01

Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. Copyright © 2015 Elsevier Inc. All rights reserved.
Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech

PubMed Central

Venezia, Jonathan H.; Fillmore, Paul; Matchin, William; Isenberg, A. Lisette; Hickok, Gregory; Fridriksson, Julius

2015-01-01

Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. PMID:26608242
The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

PubMed

Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

2009-04-01

The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
Keeping time in the brain: Autism spectrum disorder and audiovisual temporal processing.

PubMed

Stevenson, Ryan A; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Camarata, Stephen; Wallace, Mark T

2016-07-01

A growing area of interest and relevance in the study of autism spectrum disorder (ASD) focuses on the relationship between multisensory temporal function and the behavioral, perceptual, and cognitive impairments observed in ASD. Atypical sensory processing is becoming increasingly recognized as a core component of autism, with evidence of atypical processing across a number of sensory modalities. These deviations from typical processing underscore the value of interpreting ASD within a multisensory framework. Furthermore, converging evidence illustrates that these differences in audiovisual processing may be specifically related to temporal processing. This review seeks to bridge the connection between temporal processing and audiovisual perception, and to elaborate on emerging data showing differences in audiovisual temporal function in autism. We also discuss the consequence of such changes, the specific impact on the processing of different classes of audiovisual stimuli (e.g. speech vs. nonspeech, etc.), and the presumptive brain processes and networks underlying audiovisual temporal integration. Finally, possible downstream behavioral implications, and possible remediation strategies are outlined. Autism Res 2016, 9: 720-738. © 2015 International Society for Autism Research, Wiley Periodicals, Inc. © 2015 International Society for Autism Research, Wiley Periodicals, Inc.
Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

ERIC Educational Resources Information Center

Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

2010-01-01

Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…
Audiovisual Speech Recalibration in Children

ERIC Educational Resources Information Center

van Linden, Sabine; Vroomen, Jean

2008-01-01

In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…
An ALE meta-analysis on the audiovisual integration of speech signals.

PubMed

Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

2014-11-01

The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.
The Influence of Phonetic Dimensions on Aphasic Speech Perception

ERIC Educational Resources Information Center

Hessler, Dorte; Jonkers, Roel; Bastiaanse, Roelien

2010-01-01

Individuals with aphasia have more problems detecting small differences between speech sounds than larger ones. This paper reports how phonemic processing is impaired and how this is influenced by speechreading. A non-word discrimination task was carried out with "audiovisual", "auditory only" and "visual only" stimulus display. Subjects had to…
[Ventriloquism and audio-visual integration of voice and face].

PubMed

Yokosawa, Kazuhiko; Kanaya, Shoko

2012-07-01

Presenting synchronous auditory and visual stimuli in separate locations creates the illusion that the sound originates from the direction of the visual stimulus. Participants' auditory localization bias, called the ventriloquism effect, has revealed factors affecting the perceptual integration of audio-visual stimuli. However, many studies on audio-visual processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. These results cannot necessarily explain our perceptual behavior in natural scenes, where various signals exist within a single sensory modality. In the present study we report the contributions of a cognitive factor, that is, the audio-visual congruency of speech, although this factor has often been underestimated in previous ventriloquism research. Thus, we investigated the contribution of speech congruency on the ventriloquism effect using a spoken utterance and two videos of a talking face. The salience of facial movements was also manipulated. As a result, when bilateral visual stimuli are presented in synchrony with a single voice, cross-modal speech congruency was found to have a significant impact on the ventriloquism effect. This result also indicated that more salient visual utterances attracted participants' auditory localization. The congruent pairing of audio-visual utterances elicited greater localization bias than did incongruent pairing, whereas previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference to auditory localization. This suggests that a greater flexibility in responding to multi-sensory environments exists than has been previously considered.
Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech.

PubMed

Alm, Magnus; Behne, Dawn

2013-10-01

Previous research indicates that perception of audio-visual (AV) synchrony changes in adulthood. Possible explanations for these age differences include a decline in hearing acuity, a decline in cognitive processing speed, and increased experience with AV binding. The current study aims to isolate the effect of AV experience by comparing synchrony judgments from 20 young adults (20 to 30 yrs) and 20 normal-hearing middle-aged adults (50 to 60 yrs), an age range for which a decline of cognitive processing speed is expected to be minimal. When presented with AV stop consonant syllables with asynchronies ranging from 440 ms audio-lead to 440 ms visual-lead, middle-aged adults showed significantly less tolerance for audio-lead than young adults. Middle-aged adults also showed a greater shift in their point of subjective simultaneity than young adults. Natural audio-lead asynchronies are arguably more predictable than natural visual-lead asynchronies, and this predictability may render audio-lead thresholds more prone to experience-related fine-tuning.
Dissociable Effects of Aging and Mild Cognitive Impairment on Bottom-Up Audiovisual Integration.

PubMed

Festa, Elena K; Katz, Andrew P; Ott, Brian R; Tremont, Geoffrey; Heindel, William C

2017-01-01

Effective audiovisual sensory integration involves dynamic changes in functional connectivity between superior temporal sulcus and primary sensory areas. This study examined whether disrupted connectivity in early Alzheimer's disease (AD) produces impaired audiovisual integration under conditions requiring greater corticocortical interactions. Audiovisual speech integration was examined in healthy young adult controls (YC), healthy elderly controls (EC), and patients with amnestic mild cognitive impairment (MCI) using McGurk-type stimuli (providing either congruent or incongruent audiovisual speech information) under conditions differing in the strength of bottom-up support and the degree of top-down lexical asymmetry. All groups accurately identified auditory speech under congruent audiovisual conditions, and displayed high levels of visual bias under strong bottom-up incongruent conditions. Under weak bottom-up incongruent conditions, however, EC and amnestic MCI groups displayed opposite patterns of performance, with enhanced visual bias in the EC group and reduced visual bias in the MCI group relative to the YC group. Moreover, there was no overlap between the EC and MCI groups in individual visual bias scores reflecting the change in audiovisual integration from the strong to the weak stimulus conditions. Top-down lexicality influences on visual biasing were observed only in the MCI patients under weaker bottom-up conditions. Results support a deficit in bottom-up audiovisual integration in early AD attributable to disruptions in corticocortical connectivity. Given that this deficit is not simply an exacerbation of changes associated with healthy aging, tests of audiovisual speech integration may serve as sensitive and specific markers of the earliest cognitive change associated with AD.

Auditory Event-Related Potentials (ERPs) in Audiovisual Speech Perception

ERIC Educational Resources Information Center

Pilling, Michael

2009-01-01

Purpose: It has recently been reported (e.g., V. van Wassenhove, K. W. Grant, & D. Poeppel, 2005) that audiovisual (AV) presented speech is associated with an N1/P2 auditory event-related potential (ERP) response that is lower in peak amplitude compared with the responses associated with auditory only (AO) speech. This effect was replicated.…
Narrative Processing in Typically Developing Children and Children with Early Unilateral Brain Injury: Seeing Gesture Matters

PubMed Central

Demir, Özlem Ece; Fisher, Joan A.; Goldin-Meadow, Susan; Levine, Susan C.

2014-01-01

Narrative skill in kindergarteners has been shown to be a reliable predictor of later reading comprehension and school achievement. However, we know little about how to scaffold children’s narrative skill. Here we examine whether the quality of kindergarten children’s narrative retellings depends on the kind of narrative elicitation they are given. We asked this question in typically developing (TD) kindergarten children and in children with pre- or perinatal unilateral brain injury (PL), a group that has been shown to have difficulty with narrative production. We compared children’s skill in story retellings under four different elicitation formats: (1) wordless cartoons, (2) stories told by a narrator through the auditory modality, (3) stories told by a narrator through the audiovisual modality without co-speech gestures, and (4) stories told by a narrator in the audiovisual modality with co-speech gestures. We found that children told better structured narratives in the fourth, audiovisual + gesture elicitation format than in the other three elicitation formats, consistent with findings that co-speech gestures can scaffold other aspects of language and memory. The audiovisual + gesture elicitation format was particularly beneficial to children who had the most difficulty telling a well-structured narrative, a group that included children with larger lesions associated with cerebrovascular infarcts. PMID:24127729
Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

PubMed

Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

2017-03-01

Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or/-B/az). The items started with an easy-to-speechread/B/or difficult-to-speechread/G/onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same-as opposed to different-responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g.,/-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz-as opposed to az- responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same responses for the intact vs. non-intact different pairs (e.g., Baa:/-B/aa) and more intact onset responses for nonword repetition (Baz for/-B/az). Thus visual speech altered both discrimination and identification in the CHL-to a large extent for the/B/onsets but only minimally for the/G/onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children's discrimination skills (i.e., d' analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets-even after variation due to the other variables was controlled. These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

PubMed Central

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

2017-01-01

Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results Performance in the audiovisual mode showed more same responses for the intact vs. non-intact different pairs (e.g., Baa:/–B/aa) and more intact onset responses for nonword repetition (Baz for/–B/az). Thus visual speech altered both discrimination and identification in the CHL—to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children’s discrimination skills (i.e., d’ analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets—even after variation due to the other variables was controlled. Conclusions These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL. PMID:28167003
The McGurk effect in children with autism and Asperger syndrome.

PubMed

Bebko, James M; Schroeder, Jessica H; Weiss, Jonathan A

2014-02-01

Children with autism may have difficulties in audiovisual speech perception, which has been linked to speech perception and language development. However, little has been done to examine children with Asperger syndrome as a group on tasks assessing audiovisual speech perception, despite this group's often greater language skills. Samples of children with autism, Asperger syndrome, and Down syndrome, as well as a typically developing sample, were presented with an auditory-only condition, a speech-reading condition, and an audiovisual condition designed to elicit the McGurk effect. Children with autism demonstrated unimodal performance at the same level as the other groups, yet showed a lower rate of the McGurk effect compared with the Asperger, Down and typical samples. These results suggest that children with autism may have unique intermodal speech perception difficulties linked to their representations of speech sounds. © 2013 International Society for Autism Research, Wiley Periodicals, Inc.
Learning to match auditory and visual speech cues: social influences on acquisition of phonological categories.

PubMed

Altvater-Mackensen, Nicole; Grossmann, Tobias

2015-01-01

Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential looking paradigm, 44 German 6-month olds' ability to detect mismatches between concurrently presented auditory and visual native vowels was tested. Outcomes were related to mothers' speech style and interactive behavior assessed during free play with their infant, and to infant-specific factors assessed through a questionnaire. Results show that mothers' and infants' social behavior modulated infants' preference for matching audiovisual speech. Moreover, infants' audiovisual speech perception correlated with later vocabulary size, suggesting a lasting effect on language development. © 2014 The Authors. Child Development © 2014 Society for Research in Child Development, Inc.
Visual Information Can Hinder Working Memory Processing of Speech

ERIC Educational Resources Information Center

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Ronnberg, Jerker; Rudner, Mary

2013-01-01

Purpose: The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. Method: In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of…
Cross-Modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study

ERIC Educational Resources Information Center

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2011-01-01

During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich,…
[Intermodal timing cues for audio-visual speech recognition].

PubMed

Hashimoto, Masahiro; Kumashiro, Masaharu

2004-06-01

The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia.

PubMed

Keetels, Mirjam; Bonte, Milene; Vroomen, Jean

2018-01-01

Upon hearing an ambiguous speech sound, listeners may adjust their perceptual interpretation of the speech input in accordance with contextual information, like accompanying text or lipread speech (i.e., phonetic recalibration; Bertelson et al., 2003). As developmental dyslexia (DD) has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce this type of audiovisual learning. Adults with DD and normal readers were exposed to ambiguous consonants halfway between /aba/ and /ada/ together with text or lipread speech. After this audiovisual exposure phase, they categorized auditory-only ambiguous test sounds. Results showed that individuals with DD, unlike normal readers, did not use text to recalibrate their phoneme categories, whereas their recalibration by lipread speech was spared. Individuals with DD demonstrated similar deficits when ambiguous vowels (halfway between /wIt/ and /wet/) were recalibrated by text. These findings indicate that DD is related to a specific letter-speech sound association deficit that extends over phoneme classes (vowels and consonants), but - as lipreading was spared - does not extend to a more general audio-visual integration deficit. In particular, these results highlight diminished reading-related audiovisual learning in addition to the commonly reported phonological problems in developmental dyslexia.
The level of audiovisual print-speech integration deficits in dyslexia.

PubMed

Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

2014-09-01

The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No significant differences in basic measures of audiovisual integration emerged. Convergence of hemodynamic and electrophysiological signals appeared to be limited and mainly occurred for highly significant and large effects in visual cortices. The findings suggest efficient superior temporal tuning to audiovisual congruency in controls. In impaired readers, however, grapho-phonological conversion is effortful and inefficient, although basic audiovisual mechanisms seem intact. This unprecedented demonstration of audiovisual deficits in adolescent dyslexics provides critical evidence that the phonological deficit might be explained by impaired audiovisual integration at a phonetic level, especially for naturalistic and word-like stimulation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

PubMed

Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

2017-01-01

We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that only varied in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4-, 8-, 16-, and 32-channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync . Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration promoting the fusion of temporally distant AV percepts.
Timing in audiovisual speech perception: A mini review and new psychophysical data.

PubMed

Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

2016-02-01

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

PubMed Central

Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

2015-01-01

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
Audio-visual speech perception: a developmental ERP investigation

PubMed Central

Knowland, Victoria CP; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael SC

2014-01-01

Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development. PMID:24176002
Audio-visual speech perception in adult readers with dyslexia: an fMRI study.

PubMed

Rüsseler, Jascha; Ye, Zheng; Gerth, Ivonne; Szycik, Gregor R; Münte, Thomas F

2018-04-01

Developmental dyslexia is a specific deficit in reading and spelling that often persists into adulthood. In the present study, we used slow event-related fMRI and independent component analysis to identify brain networks involved in perception of audio-visual speech in a group of adult readers with dyslexia (RD) and a group of fluent readers (FR). Participants saw a video of a female speaker saying a disyllabic word. In the congruent condition, audio and video input were identical whereas in the incongruent condition, the two inputs differed. Participants had to respond to occasionally occurring animal names. The independent components analysis (ICA) identified several components that were differently modulated in FR and RD. Two of these components including fusiform gyrus and occipital gyrus showed less activation in RD compared to FR possibly indicating a deficit to extract face information that is needed to integrate auditory and visual information in natural speech perception. A further component centered on the superior temporal sulcus (STS) also exhibited less activation in RD compared to FR. This finding is corroborated in the univariate analysis that shows less activation in STS for RD compared to FR. These findings suggest a general impairment in recruitment of audiovisual processing areas in dyslexia during the perception of natural speech.
Text-to-audiovisual speech synthesizer for children with learning disabilities.

PubMed

Mendi, Engin; Bayrak, Coskun

2013-01-01

Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.
Developmental Shifts in Children's Sensitivity to Visual Speech: A New Multimodal Picture-Word Task

ERIC Educational Resources Information Center

Jerger, Susan; Damian, Markus F.; Spence, Melanie J.; Tye-Murray, Nancy; Abdi, Herve

2009-01-01

This research developed a multimodal picture-word task for assessing the influence of visual speech on phonological processing by 100 children between 4 and 14 years of age. We assessed how manipulation of seemingly to-be-ignored auditory (A) and audiovisual (AV) phonological distractors affected picture naming without participants consciously…
Mandarin Visual Speech Information

ERIC Educational Resources Information Center

Chen, Trevor H.

2010-01-01

While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…
Speech Entrainment Compensates for Broca's Area Damage

PubMed Central

Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

2015-01-01

Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443

Narrative processing in typically developing children and children with early unilateral brain injury: seeing gesture matters.

PubMed

Demir, Özlem Ece; Fisher, Joan A; Goldin-Meadow, Susan; Levine, Susan C

2014-03-01

Narrative skill in kindergarteners has been shown to be a reliable predictor of later reading comprehension and school achievement. However, we know little about how to scaffold children's narrative skill. Here we examine whether the quality of kindergarten children's narrative retellings depends on the kind of narrative elicitation they are given. We asked this question with respect to typically developing (TD) kindergarten children and children with pre- or perinatal unilateral brain injury (PL), a group that has been shown to have difficulty with narrative production. We compared children's skill in retelling stories originally presented to them in 4 different elicitation formats: (a) wordless cartoons, (b) stories told by a narrator through the auditory modality, (c) stories told by a narrator through the audiovisual modality without co-speech gestures, and (e) stories told by a narrator in the audiovisual modality with co-speech gestures. We found that children told better structured narratives in response to the audiovisual + gesture elicitation format than in response to the other 3 elicitation formats, consistent with findings that co-speech gestures can scaffold other aspects of language and memory. The audiovisual + gesture elicitation format was particularly beneficial for children who had the most difficulty telling a well-structured narrative, a group that included children with larger lesions associated with cerebrovascular infarcts. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Evaluation of an Audiovisual-FM System: Investigating the Interaction between Illumination Level and a Talker's Skin Color on Speech-Reading Performance

ERIC Educational Resources Information Center

Gagne, Jean-Pierre; Laplante-Levesque, Ariane; Labelle, Maude; Doucet, Katrine; Potvin, Marie-Christine

2006-01-01

A program designed to evaluate the benefits of an audiovisual-frequency modulated (FM) system led to some questions concerning the effects of illumination level and a talker's skin color on speech-reading performance. To address those issues, the speech of a Caucasian female was videotaped under 2 conditions: a light skin color condition and a…
Brief Report: Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

ERIC Educational Resources Information Center

Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

2014-01-01

Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…
Multisensory Speech Perception in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.

2013-01-01

This study examined unisensory and multisensory speech perception in 8-17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual only, auditory only, matched audiovisual, and mismatched audiovisual ("McGurk")…
Visual contribution to the multistable perception of speech.

PubMed

Sato, Marc; Basirat, Anahita; Schwartz, Jean-Luc

2007-11-01

The multistable perception of speech, or verbal transformation effect, refers to perceptual changes experienced while listening to a speech form that is repeated rapidly and continuously. In order to test whether visual information from the speaker's articulatory gestures may modify the emergence and stability of verbal auditory percepts, subjects were instructed to report any perceptual changes during unimodal, audiovisual, and incongruent audiovisual presentations of distinct repeated syllables. In a first experiment, the perceptual stability of reported auditory percepts was significantly modulated by the modality of presentation. In a second experiment, when audiovisual stimuli consisting of a stable audio track dubbed with a video track that alternated between congruent and incongruent stimuli were presented, a strong correlation between the timing of perceptual transitions and the timing of video switches was found. Finally, a third experiment showed that the vocal tract opening onset event provided by the visual input could play the role of a bootstrap mechanism in the search for transformations. Altogether, these results demonstrate the capacity of visual information to control the multistable perception of speech in its phonetic content and temporal course. The verbal transformation effect thus provides a useful experimental paradigm to explore audiovisual interactions in speech perception.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

PubMed

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Bilingualism modulates infants' selective attention to the mouth of a talking face.

PubMed

Pons, Ferran; Bosch, Laura; Lewkowicz, David J

2015-04-01

Infants growing up in bilingual environments succeed at learning two languages. What adaptive processes enable them to master the more complex nature of bilingual input? One possibility is that bilingual infants take greater advantage of the redundancy of the audiovisual speech that they usually experience during social interactions. Thus, we investigated whether bilingual infants' need to keep languages apart increases their attention to the mouth as a source of redundant and reliable speech cues. We measured selective attention to talking faces in 4-, 8-, and 12-month-old Catalan and Spanish monolingual and bilingual infants. Monolinguals looked more at the eyes than the mouth at 4 months and more at the mouth than the eyes at 8 months in response to both native and nonnative speech, but they looked more at the mouth than the eyes at 12 months only in response to nonnative speech. In contrast, bilinguals looked equally at the eyes and mouth at 4 months, more at the mouth than the eyes at 8 months, and more at the mouth than the eyes at 12 months, and these patterns of responses were found for both native and nonnative speech at all ages. Thus, to support their dual-language acquisition processes, bilingual infants exploit the greater perceptual salience of redundant audiovisual speech cues at an earlier age and for a longer time than monolingual infants. © The Author(s) 2015.
Audio-visual speech intelligibility benefits with bilateral cochlear implants when talker location varies.

PubMed

van Hoesel, Richard J M

2015-04-01

One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligibility in noise using a novel dynamic spatial audio-visual test paradigm. For each trial conducted in spatially distributed noise, first, an auditory-only cueing phrase that was spoken by one of four talkers was selected and presented from one of four locations. Shortly afterward, a target sentence was presented that was either audio-visual or, in another test configuration, audio-only and was spoken by the same talker and from the same location as the cueing phrase. During the target presentation, visual distractors were added at other spatial locations. Results showed that in terms of speech reception thresholds (SRTs), the average improvement for bilateral listening over the better performing ear alone was 9 dB for the audio-visual mode, and 3 dB for audition-alone. Comparison of bilateral performance for audio-visual and audition-alone showed that inclusion of visual cues led to an average SRT improvement of 5 dB. For unilateral device use, no such benefit arose, presumably due to the greatly reduced ability to localize the target talker to acquire visual information. The bilateral CI speech intelligibility advantage over the better ear in the present study is much larger than that previously reported for static talker locations and indicates greater everyday speech benefits and improved cost-benefit than estimated to date.
High visual resolution matters in audiovisual speech perception, but only for some.

PubMed

Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G

2016-07-01

The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech in noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.
Atypical rapid audio-visual temporal recalibration in autism spectrum disorders.

PubMed

Noel, Jean-Paul; De Niear, Matthew A; Stevenson, Ryan; Alais, David; Wallace, Mark T

2017-01-01

Changes in sensory and multisensory function are increasingly recognized as a common phenotypic characteristic of Autism Spectrum Disorders (ASD). Furthermore, much recent evidence suggests that sensory disturbances likely play an important role in contributing to social communication weaknesses-one of the core diagnostic features of ASD. An established sensory disturbance observed in ASD is reduced audiovisual temporal acuity. In the current study, we substantially extend these explorations of multisensory temporal function within the framework that an inability to rapidly recalibrate to changes in audiovisual temporal relations may play an important and under-recognized role in ASD. In the paradigm, we present ASD and typically developing (TD) children and adolescents with asynchronous audiovisual stimuli of varying levels of complexity and ask them to perform a simultaneity judgment (SJ). In the critical analysis, we test audiovisual temporal processing on trial t as a condition of trial t - 1. The results demonstrate that individuals with ASD fail to rapidly recalibrate to audiovisual asynchronies in an equivalent manner to their TD counterparts for simple and non-linguistic stimuli (i.e., flashes and beeps, hand-held tools), but exhibit comparable rapid recalibration for speech stimuli. These results are discussed in terms of prior work showing a speech-specific deficit in audiovisual temporal function in ASD, and in light of current theories of autism focusing on sensory noise and stability of perceptual representations. Autism Res 2017, 10: 121-129. © 2016 International Society for Autism Research, Wiley Periodicals, Inc. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Structuring Broadcast Audio for Information Access

NASA Astrophysics Data System (ADS)

Gauvain, Jean-Luc; Lamel, Lori

2003-12-01

One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data such as needing to deal with the continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.
Speech entrainment compensates for Broca's area damage.

PubMed

Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

2015-08-01

Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to SE. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during SE versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of SE to improve speech production and may help select patients for SE treatment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Superior temporal sulcus--It's my area: or is it?

PubMed

Hein, Grit; Knight, Robert T

2008-12-01

The superior temporal sulcus (STS) is the chameleon of the human brain. Several research areas claim the STS as the host brain region for their particular behavior of interest. Some see it as one of the core structures for theory of mind. For others, it is the main region for audiovisual integration. It plays an important role in biological motion perception, but is also claimed to be essential for speech processing and processing of faces. We review the foci of activations in the STS from multiple functional magnetic resonance imaging studies, focusing on theory of mind, audiovisual integration, motion processing, speech processing, and face processing. The results indicate a differentiation of the STS region in an anterior portion, mainly involved in speech processing, and a posterior portion recruited by cognitive demands of all these different research areas. The latter finding argues against a strict functional subdivision of the STS. In line with anatomical evidence from tracer studies, we propose that the function of the STS varies depending on the nature of network coactivations with different regions in the frontal cortex and medial-temporal lobe. This view is more in keeping with the notion that the same brain region can support different cognitive operations depending on task-dependent network connections, emphasizing the role of network connectivity analysis in neuroimaging.
Audio visual speech source separation via improved context dependent association model

NASA Astrophysics Data System (ADS)

Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz

2014-12-01

In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on mean square error (MSE) measure between estimated and target visual parameters. This function is minimized for estimation of the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to existing GMM-based model and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.
Audiovisual integration in children listening to spectrally degraded speech.

PubMed

Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

2015-02-01

The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
Visual Feedback of Tongue Movement for Novel Speech Sound Learning

PubMed Central

Katz, William F.; Mehta, Sonya

2015-01-01

Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy

PubMed Central

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2013-01-01

This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation point (IPs, the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding. PMID:23801980
Evidence of a visual-to-auditory cross-modal sensory gating phenomenon as reflected by the human P50 event-related brain potential modulation.

PubMed

Lebib, Riadh; Papo, David; de Bode, Stella; Baudonnière, Pierre Marie

2003-05-08

We investigated the existence of a cross-modal sensory gating reflected by the modulation of an early electrophysiological index, the P50 component. We analyzed event-related brain potentials elicited by audiovisual speech stimuli manipulated along two dimensions: congruency and discriminability. The results showed that the P50 was attenuated when visual and auditory speech information were redundant (i.e. congruent), in comparison with this same event-related potential component elicited with discrepant audiovisual dubbing. When hard to discriminate, however, bimodal incongruent speech stimuli elicited a similar pattern of P50 attenuation. We concluded to the existence of a visual-to-auditory cross-modal sensory gating phenomenon. These results corroborate previous findings revealing a very early audiovisual interaction during speech perception. Finally, we postulated that the sensory gating system included a cross-modal dimension.
Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech

ERIC Educational Resources Information Center

Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

2010-01-01

In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…
Word segmentation in phonemically identical and prosodically different sequences using cochlear implants: A case study.

PubMed

Basirat, Anahita

2017-01-01

Cochlear implant (CI) users frequently achieve good speech understanding based on phoneme and word recognition. However, there is a significant variability between CI users in processing prosody. The aim of this study was to examine the abilities of an excellent CI user to segment continuous speech using intonational cues. A post-lingually deafened adult CI user and 22 normal hearing (NH) subjects segmented phonemically identical and prosodically different sequences in French such as 'l'affiche' (the poster) versus 'la fiche' (the sheet), both [lafiʃ]. All participants also completed a minimal pair discrimination task. Stimuli were presented in auditory-only and audiovisual presentation modalities. The performance of the CI user in the minimal pair discrimination task was 97% in the auditory-only and 100% in the audiovisual condition. In the segmentation task, contrary to the NH participants, the performance of the CI user did not differ from the chance level. Visual speech did not improve word segmentation. This result suggests that word segmentation based on intonational cues is challenging when using CIs even when phoneme/word recognition is very well rehabilitated. This finding points to the importance of the assessment of CI users' skills in prosody processing and the need for specific interventions focusing on this aspect of speech communication.

Audiovisual cues and perceptual learning of spectrally distorted speech.

PubMed

Pilling, Michael; Thomas, Sharon

2011-12-01

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch: Experiment I found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues with training which gave written feedback. These two methods did not significantly differ in the pattern of training they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
Children perceive speech onsets by ear and eye*

PubMed Central

JERGER, SUSAN; DAMIAN, MARKUS F.; TYE-MURRAY, NANCY; ABDI, HERVÉ

2016-01-01

Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: −b/ag) vs. auditorily (see still face; hear exactly same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children – like adults – perceive speech onsets multimodally. Findings are critical for incorporating visual speech into developmental theories of speech perception. PMID:26752548
The Bilingual Language Interaction Network for Comprehension of Speech*

PubMed Central

Marian, Viorica

2013-01-01

During speech comprehension, bilinguals co-activate both of their languages, resulting in cross-linguistic interaction at various levels of processing. This interaction has important consequences for both the structure of the language system and the mechanisms by which the system processes spoken language. Using computational modeling, we can examine how cross-linguistic interaction affects language processing in a controlled, simulated environment. Here we present a connectionist model of bilingual language processing, the Bilingual Language Interaction Network for Comprehension of Speech (BLINCS), wherein interconnected levels of processing are created using dynamic, self-organizing maps. BLINCS can account for a variety of psycholinguistic phenomena, including cross-linguistic interaction at and across multiple levels of processing, cognate facilitation effects, and audio-visual integration during speech comprehension. The model also provides a way to separate two languages without requiring a global language-identification system. We conclude that BLINCS serves as a promising new model of bilingual spoken language comprehension. PMID:24363602
Audiovisual Integration in High Functioning Adults with Autism

ERIC Educational Resources Information Center

Keane, Brian P.; Rosenthal, Orna; Chun, Nicole H.; Shams, Ladan

2010-01-01

Autism involves various perceptual benefits and deficits, but it is unclear if the disorder also involves anomalous audiovisual integration. To address this issue, we compared the performance of high-functioning adults with autism and matched controls on experiments investigating the audiovisual integration of speech, spatiotemporal relations, and…
Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

PubMed Central

Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles

2012-01-01

We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
The time course of auditory-visual processing of speech and body actions: evidence for the simultaneous activation of an extended neural network for semantic processing.

PubMed

Meyer, Georg F; Harrison, Neil R; Wuerger, Sophie M

2013-08-01

An extensive network of cortical areas is involved in multisensory object and action recognition. This network draws on inferior frontal, posterior temporal, and parietal areas; activity is modulated by familiarity and the semantic congruency of auditory and visual component signals even if semantic incongruences are created by combining visual and auditory signals representing very different signal categories, such as speech and whole body actions. Here we present results from a high-density ERP study designed to examine the time-course and source location of responses to semantically congruent and incongruent audiovisual speech and body actions to explore whether the network involved in action recognition consists of a hierarchy of sequentially activated processing modules or a network of simultaneously active processing sites. We report two main results:1) There are no significant early differences in the processing of congruent and incongruent audiovisual action sequences. The earliest difference between congruent and incongruent audiovisual stimuli occurs between 240 and 280 ms after stimulus onset in the left temporal region. Between 340 and 420 ms, semantic congruence modulates responses in central and right frontal areas. Late differences (after 460 ms) occur bilaterally in frontal areas.2) Source localisation (dipole modelling and LORETA) reveals that an extended network encompassing inferior frontal, temporal, parasaggital, and superior parietal sites are simultaneously active between 180 and 420 ms to process auditory–visual action sequences. Early activation (before 120 ms) can be explained by activity in mainly sensory cortices. . The simultaneous activation of an extended network between 180 and 420 ms is consistent with models that posit parallel processing of complex action sequences in frontal, temporal and parietal areas rather than models that postulate hierarchical processing in a sequence of brain regions. Copyright © 2013 Elsevier Ltd. All rights reserved.
Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception

ERIC Educational Resources Information Center

Stekelenburg, Jeroen J.; Vroomen, Jean

2012-01-01

We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…
Neurophysiology underlying influence of stimulus reliability on audiovisual integration.

PubMed

Shatzer, Hannah; Shen, Stanley; Kerlin, Jess R; Pitt, Mark A; Shahin, Antoine J

2018-01-24

We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than blurred visual speech and for less than more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

PubMed

Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath

2016-01-01

Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults

PubMed Central

Smayda, Kirsten E.; Van Engen, Kristin J.; Maddox, W. Todd; Chandrasekaran, Bharath

2016-01-01

Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18–35) and thirty-three older adults (ages 60–90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener. PMID:27031343
The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users.

PubMed

Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn

2017-01-01

This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants' auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, neither promptly after the training nor at the one-month follow-up. However, no significant between-groups difference nor an interaction between groups and session was observed. Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research.
Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients.

PubMed

Rouger, Julien; Lagleyre, Sébastien; Démonet, Jean-François; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

2012-08-01

Psychophysical and neuroimaging studies in both animal and human subjects have clearly demonstrated that cortical plasticity following sensory deprivation leads to a brain functional reorganization that favors the spared modalities. In postlingually deaf patients, the use of a cochlear implant (CI) allows a recovery of the auditory function, which will probably counteract the cortical crossmodal reorganization induced by hearing loss. To study the dynamics of such reversed crossmodal plasticity, we designed a longitudinal neuroimaging study involving the follow-up of 10 postlingually deaf adult CI users engaged in a visual speechreading task. While speechreading activates Broca's area in normally hearing subjects (NHS), the activity level elicited in this region in CI patients is abnormally low and increases progressively with post-implantation time. Furthermore, speechreading in CI patients induces abnormal crossmodal activations in right anterior regions of the superior temporal cortex normally devoted to processing human voice stimuli (temporal voice-sensitive areas-TVA). These abnormal activity levels diminish with post-implantation time and tend towards the levels observed in NHS. First, our study revealed that the neuroplasticity after cochlear implantation involves not only auditory but also visual and audiovisual speech processing networks. Second, our results suggest that during deafness, the functional links between cortical regions specialized in face and voice processing are reallocated to support speech-related visual processing through cross-modal reorganization. Such reorganization allows a more efficient audiovisual integration of speech after cochlear implantation. These compensatory sensory strategies are later completed by the progressive restoration of the visuo-audio-motor speech processing loop, including Broca's area. Copyright © 2011 Wiley Periodicals, Inc.
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

PubMed Central

Wilson, Amanda H.; Paré, Martin; Munhall, Kevin G.

2016-01-01

Purpose The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method We presented vowel–consonant–vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task and in Experiment 3 (N = 20) an audiovisual task while having their gaze behavior monitored using eye-tracking equipment. Results In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high spatial information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with accuracy of silent speechreading or the magnitude of the McGurk effect. Conclusions The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the 2 tasks and differences in the pattern of gaze. PMID:27537379
Effects of aging on audio-visual speech integration.

PubMed

Huyse, Aurélie; Leybaert, Jacqueline; Berthommier, Frédéric

2014-10-01

This study investigated the impact of aging on audio-visual speech integration. A syllable identification task was presented in auditory-only, visual-only, and audio-visual congruent and incongruent conditions. Visual cues were either degraded or unmodified. Stimuli were embedded in stationary noise alternating with modulated noise. Fifteen young adults and 15 older adults participated in this study. Results showed that older adults had preserved lipreading abilities when the visual input was clear but not when it was degraded. The impact of aging on audio-visual integration also depended on the quality of the visual cues. In the visual clear condition, the audio-visual gain was similar in both groups and analyses in the framework of the fuzzy-logical model of perception confirmed that older adults did not differ from younger adults in their audio-visual integration abilities. In the visual reduction condition, the audio-visual gain was reduced in the older group, but only when the noise was stationary, suggesting that older participants could compensate for the loss of lipreading abilities by using the auditory information available in the valleys of the noise. The fuzzy-logical model of perception confirmed the significant impact of aging on audio-visual integration by showing an increased weight of audition in the older group.
Talker variability in audio-visual speech perception

PubMed Central

Heald, Shannon L. M.; Nusbaum, Howard C.

2014-01-01

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred. PMID:25076919
Talker variability in audio-visual speech perception.

PubMed

Heald, Shannon L M; Nusbaum, Howard C

2014-01-01

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred.
Development of Trivia Game for speech understanding in background noise.

PubMed

Schwartz, Kathryn; Ringleb, Stacie I; Sandberg, Hilary; Raymer, Anastasia; Watson, Ginger S

2015-01-01

Listening in noise is an everyday activity and poses a challenge for many people. To improve the ability to understand speech in noise, a computerized auditory rehabilitation game was developed. In Trivia Game players are challenged to answer trivia questions spoken aloud. As players progress through the game, the level of background noise increases. A study using Trivia Game was conducted as a proof-of-concept investigation in healthy participants. College students with normal hearing were randomly assigned to a control (n = 13) or a treatment (n = 14) group. Treatment participants played Trivia Game 12 times over a 4-week period. All participants completed objective (auditory-only and audiovisual formats) and subjective listening in noise measures at baseline and 4 weeks later. There were no statistical differences between the groups at baseline. At post-test, the treatment group significantly improved their overall speech understanding in noise in the audiovisual condition and reported significant benefits in their functional listening abilities. Playing Trivia Game improved speech understanding in noise in healthy listeners. Significant findings for the audiovisual condition suggest that participants improved face-reading abilities. Trivia Game may be a platform for investigating changes in speech understanding in individuals with sensory, linguistic and cognitive impairments.
Perceptual congruency of audio-visual speech affects ventriloquism with bilateral visual stimuli.

PubMed

Kanaya, Shoko; Yokosawa, Kazuhiko

2011-02-01

Many studies on multisensory processes have focused on performance in simplified experimental situations, with a single stimulus in each sensory modality. However, these results cannot necessarily be applied to explain our perceptual behavior in natural scenes where various signals exist within one sensory modality. We investigated the role of audio-visual syllable congruency on participants' auditory localization bias or the ventriloquism effect using spoken utterances and two videos of a talking face. Salience of facial movements was also manipulated. Results indicated that more salient visual utterances attracted participants' auditory localization. Congruent pairing of audio-visual utterances elicited greater localization bias than incongruent pairing, while previous studies have reported little dependency on the reality of stimuli in ventriloquism. Moreover, audio-visual illusory congruency, owing to the McGurk effect, caused substantial visual interference on auditory localization. Multisensory performance appears more flexible and adaptive in this complex environment than in previous studies.
Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

PubMed Central

Alm, Magnus; Behne, Dawn

2015-01-01

Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood for cognitive and sensory decline, which may confound positive effects of age-related AV-experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20–30 years) and middle-aged adults (50–60 years) with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle-adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females AV perceptual strategy toward more visually dominated responses. PMID:26236274
Monkey Lipsmacking Develops Like the Human Speech Rhythm

ERIC Educational Resources Information Center

Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

2012-01-01

Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

Audio-visual integration during speech perception in prelingually deafened Japanese children revealed by the McGurk effect.

PubMed

Tona, Risa; Naito, Yasushi; Moroto, Saburo; Yamamoto, Rinko; Fujiwara, Keizo; Yamazaki, Hiroshi; Shinohara, Shogo; Kikuchi, Masahiro

2015-12-01

To investigate the McGurk effect in profoundly deafened Japanese children with cochlear implants (CI) and in normal-hearing children. This was done to identify how children with profound deafness using CI established audiovisual integration during the speech acquisition period. Twenty-four prelingually deafened children with CI and 12 age-matched normal-hearing children participated in this study. Responses to audiovisual stimuli were compared between deafened and normal-hearing controls. Additionally, responses of the children with CI younger than 6 years of age were compared with those of the children with CI at least 6 years of age at the time of the test. Responses to stimuli combining auditory labials and visual non-labials were significantly different between deafened children with CI and normal-hearing controls (p<0.05). Additionally, the McGurk effect tended to be more induced in deafened children older than 6 years of age than in their younger counterparts. The McGurk effect was more significantly induced in prelingually deafened Japanese children with CI than in normal-hearing, age-matched Japanese children. Despite having good speech-perception skills and auditory input through their CI, from early childhood, deafened children may use more visual information in speech perception than normal-hearing children. As children using CI need to communicate based on insufficient speech signals coded by CI, additional activities of higher-order brain function may be necessary to compensate for the incomplete auditory input. This study provided information on the influence of deafness on the development of audiovisual integration related to speech, which could contribute to our further understanding of the strategies used in spoken language communication by prelingually deafened children. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Computationally Efficient Clustering of Audio-Visual Meeting Data

NASA Astrophysics Data System (ADS)

Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users

PubMed Central

Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn

2017-01-01

This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants’ auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, neither promptly after the training nor at the one-month follow-up. However, no significant between-groups difference nor an interaction between groups and session was observed. Conclusion: Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research. PMID:28348542
A Neural Basis for Interindividual Differences in the McGurk Effect, a Multisensory Speech Illusion

PubMed Central

Nath, Audrey R.; Beauchamp, Michael S.

2011-01-01

The McGurk effect is a compelling illusion in which humans perceive mismatched audiovisual speech as a completely different syllable. However, some normal individuals do not experience the illusion, reporting that the stimulus sounds the same with or without visual input. Converging evidence suggests that the left superior temporal sulcus (STS) is critical for audiovisual integration during speech perception. We used blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) to measure brain activity as McGurk perceivers and non-perceivers were presented with congruent audiovisual syllables, McGurk audiovisual syllables, and non-McGurk incongruent syllables. The inferior frontal gyrus showed an effect of stimulus condition (greater responses for incongruent stimuli) but not susceptibility group, while the left auditory cortex showed an effect of susceptibility group (greater response in susceptible individuals) but not stimulus condition. Only one brain region, the left STS, showed a significant effect of both susceptibility and stimulus condition. The amplitude of the response in the left STS was significantly correlated with the likelihood of perceiving the McGurk effect: a weak STS response meant that a subject was less likely to perceive the McGurk effect, while a strong response meant that a subject was more likely to perceive it. These results suggest that the left STS is a key locus for interindividual differences in speech perception. PMID:21787869
The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

PubMed Central

Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

2017-01-01

We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework of the multimodal nature of human communication. PMID:28424636
Lip movements entrain the observers’ low-frequency brain oscillations to facilitate speech intelligibility

PubMed Central

Park, Hyojin; Kayser, Christoph; Thut, Gregor; Gross, Joachim

2016-01-01

During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and speaker’s lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending visual speech enhances the coherence between activity in visual cortex and the speaker’s lips. Further, we identified a significant partial coherence between left motor cortex and lip movements and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing. DOI: http://dx.doi.org/10.7554/eLife.14521.001 PMID:27146891
Linguistic experience and audio-visual perception of non-native fricatives.

PubMed

Wang, Yue; Behne, Dawn M; Jiang, Haisheng

2008-09-01

This study examined the effects of linguistic experience on audio-visual (AV) perception of non-native (L2) speech. Canadian English natives and Mandarin Chinese natives differing in degree of English exposure [long and short length of residence (LOR) in Canada] were presented with English fricatives of three visually distinct places of articulation: interdentals nonexistent in Mandarin and labiodentals and alveolars common in both languages. Stimuli were presented in quiet and in a cafe-noise background in four ways: audio only (A), visual only (V), congruent AV (AVc), and incongruent AV (AVi). Identification results showed that overall performance was better in the AVc than in the A or V condition and better in quiet than in cafe noise. While the Mandarin long LOR group approximated the native English patterns, the short LOR group showed poorer interdental identification, more reliance on visual information, and greater AV-fusion with the AVi materials, indicating the failure of L2 visual speech category formation with the short LOR non-natives and the positive effects of linguistic experience with the long LOR non-natives. These results point to an integrated network in AV speech processing as a function of linguistic background and provide evidence to extend auditory-based L2 speech learning theories to the visual domain.
Neural Correlates of Temporal Complexity and Synchrony during Audiovisual Correspondence Detection.

PubMed

Baumann, Oliver; Vromen, Joyce M G; Cheung, Allen; McFadyen, Jessica; Ren, Yudan; Guo, Christine C

2018-01-01

We often perceive real-life objects as multisensory cues through space and time. A key challenge for audiovisual integration is to match neural signals that not only originate from different sensory modalities but also that typically reach the observer at slightly different times. In humans, complex, unpredictable audiovisual streams lead to higher levels of perceptual coherence than predictable, rhythmic streams. In addition, perceptual coherence for complex signals seems less affected by increased asynchrony between visual and auditory modalities than for simple signals. Here, we used functional magnetic resonance imaging to determine the human neural correlates of audiovisual signals with different levels of temporal complexity and synchrony. Our study demonstrated that greater perceptual asynchrony and lower signal complexity impaired performance in an audiovisual coherence-matching task. Differences in asynchrony and complexity were also underpinned by a partially different set of brain regions. In particular, our results suggest that, while regions in the dorsolateral prefrontal cortex (DLPFC) were modulated by differences in memory load due to stimulus asynchrony, areas traditionally thought to be involved in speech production and recognition, such as the inferior frontal and superior temporal cortex, were modulated by the temporal complexity of the audiovisual signals. Our results, therefore, indicate specific processing roles for different subregions of the fronto-temporal cortex during audiovisual coherence detection.
Neural Correlates of Temporal Complexity and Synchrony during Audiovisual Correspondence Detection

PubMed Central

Ren, Yudan

2018-01-01

Abstract We often perceive real-life objects as multisensory cues through space and time. A key challenge for audiovisual integration is to match neural signals that not only originate from different sensory modalities but also that typically reach the observer at slightly different times. In humans, complex, unpredictable audiovisual streams lead to higher levels of perceptual coherence than predictable, rhythmic streams. In addition, perceptual coherence for complex signals seems less affected by increased asynchrony between visual and auditory modalities than for simple signals. Here, we used functional magnetic resonance imaging to determine the human neural correlates of audiovisual signals with different levels of temporal complexity and synchrony. Our study demonstrated that greater perceptual asynchrony and lower signal complexity impaired performance in an audiovisual coherence-matching task. Differences in asynchrony and complexity were also underpinned by a partially different set of brain regions. In particular, our results suggest that, while regions in the dorsolateral prefrontal cortex (DLPFC) were modulated by differences in memory load due to stimulus asynchrony, areas traditionally thought to be involved in speech production and recognition, such as the inferior frontal and superior temporal cortex, were modulated by the temporal complexity of the audiovisual signals. Our results, therefore, indicate specific processing roles for different subregions of the fronto-temporal cortex during audiovisual coherence detection. PMID:29354682
[Virtual audiovisual talking heads: articulatory data and models--applications].

PubMed

Badin, P; Elisei, F; Bailly, G; Savariaux, C; Serrurier, A; Tarabalka, Y

2007-01-01

In the framework of experimental phonetics, our approach to the study of speech production is based on the measurement, the analysis and the modeling of orofacial articulators such as the jaw, the face and the lips, the tongue or the velum. Therefore, we present in this article experimental techniques that allow characterising the shape and movement of speech articulators (static and dynamic MRI, computed tomodensitometry, electromagnetic articulography, video recording). We then describe the linear models of the various organs that we can elaborate from speaker-specific articulatory data. We show that these models, that exhibit a good geometrical resolution, can be controlled from articulatory data with a good temporal resolution and can thus permit the reconstruction of high quality animation of the articulators. These models, that we have integrated in a virtual talking head, can produce augmented audiovisual speech. In this framework, we have assessed the natural tongue reading capabilities of human subjects by means of audiovisual perception tests. We conclude by suggesting a number of other applications of talking heads.
Audiovisual integration of emotional signals in voice and face: an event-related fMRI study.

PubMed

Kreifelts, Benjamin; Ethofer, Thomas; Grodd, Wolfgang; Erb, Michael; Wildgruber, Dirk

2007-10-01

In a natural environment, non-verbal emotional communication is multimodal (i.e. speech melody, facial expression) and multifaceted concerning the variety of expressed emotions. Understanding these communicative signals and integrating them into a common percept is paramount to successful social behaviour. While many previous studies have focused on the neurobiology of emotional communication in the auditory or visual modality alone, far less is known about multimodal integration of auditory and visual non-verbal emotional information. The present study investigated this process using event-related fMRI. Behavioural data revealed that audiovisual presentation of non-verbal emotional information resulted in a significant increase in correctly classified stimuli when compared with visual and auditory stimulation. This behavioural gain was paralleled by enhanced activation in bilateral posterior superior temporal gyrus (pSTG) and right thalamus, when contrasting audiovisual to auditory and visual conditions. Further, a characteristic of these brain regions, substantiating their role in the emotional integration process, is a linear relationship between the gain in classification accuracy and the strength of the BOLD response during the bimodal condition. Additionally, enhanced effective connectivity between audiovisual integration areas and associative auditory and visual cortices was observed during audiovisual stimulation, offering further insight into the neural process accomplishing multimodal integration. Finally, we were able to document an enhanced sensitivity of the putative integration sites to stimuli with emotional non-verbal content as compared to neutral stimuli.
Effect of hearing loss on semantic access by auditory and audiovisual speech in children.

PubMed

Jerger, Susan; Tye-Murray, Nancy; Damian, Markus F; Abdi, Hervé

2013-01-01

This research studied whether the mode of input (auditory versus audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI). Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with the authors' new multimodal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related versus unrelated to the pictures (e.g., picture distractor of dog-bear versus dog-cheese, respectively). In children with NH, picture-naming times were slower in the presence of semantically related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response (the lexical selection by competition hypothesis). Recently, a modification of the lexical selection by competition hypothesis, called the competition threshold (CT) hypothesis, proposed that (1) the competition between the picture-distractor entries is determined by a threshold, and (2) distractors with experimentally reduced fidelity cannot reach the CT. Thus, semantically related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically related versus unrelated distractors). Facilitation occurs because the activation level of the semantically related distractor with reduced fidelity (1) is not sufficient to exceed the CT and produce interference but (2) is sufficient to activate its concept, which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children. Our multimodal picture word task allowed us to (1) quantify picture naming results in the presence of auditory speech distractors and (2) probe whether the addition of visual speech enriched the fidelity of the auditory input sufficiently to influence results. In the HI group, the auditory distractors produced no effect or a facilitative effect, in agreement with proposals of the CT hypothesis. In contrast, the audiovisual distractors produced the normal semantic interference effect. Results in the HI versus NH groups differed significantly for the auditory mode, but not for the audiovisual mode. This research indicates that the lower fidelity auditory speech associated with HI affects the normalcy of semantic access by children. Further, adding visual speech enriches the lower fidelity auditory input sufficiently to produce the semantic interference effect typical of children with NH.
Shifts in Audiovisual Processing in Healthy Aging.

PubMed

Baum, Sarah H; Stevenson, Ryan

2017-09-01

The integration of information across sensory modalities into unified percepts is a fundamental sensory process upon which a multitude of cognitive processes are based. We review the body of literature exploring aging-related changes in audiovisual integration published over the last five years. Specifically, we review the impact of changes in temporal processing, the influence of the effectiveness of sensory inputs, the role of working memory, and the newer studies of intra-individual variability during these processes. Work in the last five years on bottom-up influences of sensory perception has garnered significant attention. Temporal processing, a driving factors of multisensory integration, has now been shown to decouple with multisensory integration in aging, despite their co-decline with aging. The impact of stimulus effectiveness also changes with age, where older adults show maximal benefit from multisensory gain at high signal-to-noise ratios. Following sensory decline, high working memory capacities have now been shown to be somewhat of a protective factor against age-related declines in audiovisual speech perception, particularly in noise. Finally, newer research is emerging focusing on the general intra-individual variability observed with aging. Overall, the studies of the past five years have replicated and expanded on previous work that highlights the role of bottom-up sensory changes with aging and their influence on audiovisual integration, as well as the top-down influence of working memory.
The early maximum likelihood estimation model of audiovisual integration in speech perception.

PubMed

Andersen, Tobias S

2015-05-01

Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk-MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but also has been criticized for being too flexible, post hoc and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures favored more complex models. This difference between conventional error measures and cross-validation was found to be indicative of over-fitting in more complex models such as the FLMP.
Simulating reading acquisition: The link between reading outcome and multimodal brain signatures of letter-speech sound learning in prereaders.

PubMed

Karipidis, Iliana I; Pleisch, Georgette; Brandeis, Daniel; Roth, Alexander; Röthlisberger, Martina; Schneebeli, Maya; Walitza, Susanne; Brem, Silvia

2018-05-08

During reading acquisition, neural reorganization of the human brain facilitates the integration of letters and speech sounds, which enables successful reading. Neuroimaging and behavioural studies have established that impaired audiovisual integration of letters and speech sounds is a core deficit in individuals with developmental dyslexia. This longitudinal study aimed to identify neural and behavioural markers of audiovisual integration that are related to future reading fluency. We simulated the first step of reading acquisition by performing artificial-letter training with prereading children at risk for dyslexia. Multiple logistic regressions revealed that our training provides new precursors of reading fluency at the beginning of reading acquisition. In addition, an event-related potential around 400 ms and functional magnetic resonance imaging activation patterns in the left planum temporale to audiovisual correspondences improved cross-validated prediction of future poor readers. Finally, an exploratory analysis combining simultaneously acquired electroencephalography and hemodynamic data suggested that modulation of temporoparietal brain regions depended on future reading skills. The multimodal approach demonstrates neural adaptations to audiovisual integration in the developing brain that are related to reading outcome. Despite potential limitations arising from the restricted sample size, our results may have promising implications both for identifying poor-reading children and for monitoring early interventions.
Hearing disability and communication handicap for compensation purposes based on self-assessment and audiometric testing.

PubMed

Salomon, G; Parving, A

1985-01-01

It is reasoned that for compensation or epidemiological studies an evaluation of hearing disability and the concomitant handicap must include the ability to perceive visual cues. A scaling procedure for hearing- and audiovisual communication handicap is presented. The procedure deviates in two ways from previous handicap assessments: (1) It is based on individual self-assessment of semantic speech perception but can be implemented by means of professional audiological test procedures. (2) The system does not make use of pure-tone auditory thresholds as a predominant audiological principle, but is based on speech perception. The interrelationship between auditory and audiovisual handicap is evaluated. A total score including audio- and audiovisual perception handicap is proposed and a suggestion for disability percentages is presented.
Selective attention to a talker's mouth in infancy: role of audiovisual temporal synchrony and linguistic experience.

PubMed

Hillairet de Boisferon, Anne; Tift, Amy H; Minar, Nicholas J; Lewkowicz, David J

2017-05-01

Previous studies have found that infants shift their attention from the eyes to the mouth of a talker when they enter the canonical babbling phase after 6 months of age. Here, we investigated whether this increased attentional focus on the mouth is mediated by audio-visual synchrony and linguistic experience. To do so, we tracked eye gaze in 4-, 6-, 8-, 10-, and 12-month-old infants while they were exposed either to desynchronized native or desynchronized non-native audiovisual fluent speech. Results indicated that, regardless of language, desynchronization disrupted the usual pattern of relative attention to the eyes and mouth found in response to synchronized speech at 10 months but not at any other age. These findings show that audio-visual synchrony mediates selective attention to a talker's mouth just prior to the emergence of initial language expertise and that it declines in importance once infants become native-language experts. © 2016 John Wiley & Sons Ltd.
Visual-Auditory Integration during Speech Imitation in Autism

ERIC Educational Resources Information Center

Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

2004-01-01

Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…
Neural entrainment to rhythmic speech in children with developmental dyslexia

PubMed Central

Power, Alan J.; Mead, Natasha; Barnes, Lisa; Goswami, Usha

2013-01-01

A rhythmic paradigm based on repetition of the syllable “ba” was used to study auditory, visual, and audio-visual oscillatory entrainment to speech in children with and without dyslexia using EEG. Children pressed a button whenever they identified a delay in the isochronous stimulus delivery (500 ms; 2 Hz delta band rate). Response power, strength of entrainment and preferred phase of entrainment in the delta and theta frequency bands were compared between groups. The quality of stimulus representation was also measured using cross-correlation of the stimulus envelope with the neural response. The data showed a significant group difference in the preferred phase of entrainment in the delta band in response to the auditory and audio-visual stimulus streams. A different preferred phase has significant implications for the quality of speech information that is encoded neurally, as it implies enhanced neuronal processing (phase alignment) at less informative temporal points in the incoming signal. Consistent with this possibility, the cross-correlogram analysis revealed superior stimulus representation by the control children, who showed a trend for larger peak r-values and significantly later lags in peak r-values compared to participants with dyslexia. Significant relationships between both peak r-values and peak lags were found with behavioral measures of reading. The data indicate that the auditory temporal reference frame for speech processing is atypical in developmental dyslexia, with low frequency (delta) oscillations entraining to a different phase of the rhythmic syllabic input. This would affect the quality of encoding of speech, and could underlie the cognitive impairments in phonological representation that are the behavioral hallmark of this developmental disorder across languages. PMID:24376407
Role of Visual Speech in Phonological Processing by Children With Hearing Loss

PubMed Central

Jerger, Susan; Tye-Murray, Nancy; Abdi, Hervé

2011-01-01

Purpose This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in place of articulation, or conflicting in voicing—for example, the picture “pizza” coupled with the distractors “peach,” “teacher,” or “beast,” respectively. Speed of picture naming was measured. Results The conflicting conditions slowed naming, and phonological processing by children with HL displayed the age-related shift in sensitivity to visual speech seen in children with NH, although with developmental delay. Younger children with HL exhibited a disproportionately large influence of visual speech and a negligible influence of auditory speech, whereas older children with HL showed a robust influence of auditory speech with no benefit to performance from adding visual speech. The congruent conditions did not speed naming in children with HL, nor did the addition of visual speech influence performance. Unexpectedly, the /∧/-vowel congruent distractors slowed naming in children with HL and decreased articulatory proficiency. Conclusions Results for the conflicting conditions are consistent with the hypothesis that speech representations in children with HL (a) are initially disproportionally structured in terms of visual speech and (b) become better specified with age in terms of auditorily encoded information. PMID:19339701

The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure

PubMed Central

Stacey, Paula C.; Kitterick, Pádraig T.; Morris, Saffron D.; Sumner, Christian J.

2017-01-01

Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797
The neural processing of foreign-accented speech and its relationship to listener bias

PubMed Central

Yi, Han-Gyol; Smiljanic, Rajka; Chandrasekaran, Bharath

2014-01-01

Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts the listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias to foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing. PMID:25339883
Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

ERIC Educational Resources Information Center

Jiang, Jintao; Bernstein, Lynne E.

2011-01-01

When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses, auditory correct, visual correct, fusion (the so-called "McGurk effect"), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for…
AN EXPERIMENTAL EVALUATION OF AUDIO-VISUAL METHODS--CHANGING ATTITUDES TOWARD EDUCATION.

ERIC Educational Resources Information Center

LOWELL, EDGAR L.; AND OTHERS

AUDIOVISUAL PROGRAMS FOR PARENTS OF DEAF CHILDREN WERE DEVELOPED AND EVALUATED. EIGHTEEN SOUND FILMS AND ACCOMPANYING RECORDS PRESENTED INFORMATION ON HEARING, LIPREADING AND SPEECH, AND ATTEMPTED TO CHANGE PARENTAL ATTITUDES TOWARD CHILDREN AND SPOUSES. TWO VERSIONS OF THE FILMS AND RECORDS WERE NARRATED BY (1) "STARS" WHO WERE…
The functional and structural asymmetries of the superior temporal sulcus.

PubMed

Specht, Karsten; Wigglesworth, Philip

2018-02-01

The superior temporal sulcus (STS) is an anatomical structure that increasingly interests researchers. This structure appears to receive multisensory input and is involved in several perceptual and cognitive core functions, such as speech perception, audiovisual integration, (biological) motion processing and theory of mind capacities. In addition, the superior temporal sulcus is not only one of the longest sulci of the brain, but it also shows marked functional and structural asymmetries, some of which have only been found in humans. To explore the functional-structural relationships of these asymmetries in more detail, this study combines functional and structural magnetic resonance imaging. Using a speech perception task, an audiovisual integration task, and a theory of mind task, this study again demonstrated an involvement of the STS in these processes, with an expected strong leftward asymmetry for the speech perception task. Furthermore, this study confirmed the earlier described, human-specific asymmetries, namely that the left STS is longer than the right STS and that the right STS is deeper than the left STS. However, this study did not find any relationship between these structural asymmetries and the detected brain activations or their functional asymmetries. This can, on the other hand, give further support to the notion that the structural asymmetry of the STS is not directly related to the functional asymmetry of the speech perception and the language system as a whole, but that it may have other causes and functions. © 2018 The Authors. Scandinavian Journal of Psychology published by Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

ERIC Educational Resources Information Center

Altvater-Mackensen, Nicole; Grossmann, Tobias

2015-01-01

Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
Audio-Visual Speech Perception: A Developmental ERP Investigation

ERIC Educational Resources Information Center

Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

2014-01-01

Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
An audiovisual emotion recognition system

NASA Astrophysics Data System (ADS)

Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

2007-12-01

Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.
Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

PubMed Central

ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke

2013-01-01

Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110
Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

PubMed

Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

2013-01-01

Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.
Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss.

PubMed

Lewis, Dawna E; Smith, Nicholas A; Spalding, Jody L; Valente, Daniel L

Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task. The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking to a variety of sources during the task, but that children with NH were more likely to look at screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks is not dependent on visual fixation to relevant talkers for children with NH or with MBHL/UHL.
Musicians have enhanced audiovisual multisensory binding: experience-dependent effects in the double-flash illusion.

PubMed

Bidelman, Gavin M

2016-10-01

Musical training is associated with behavioral and neurophysiological enhancements in auditory processing for both musical and nonmusical sounds (e.g., speech). Yet, whether the benefits of musicianship extend beyond enhancements to auditory-specific skills and impact multisensory (e.g., audiovisual) processing has yet to be fully validated. Here, we investigated multisensory integration of auditory and visual information in musicians and nonmusicians using a double-flash illusion, whereby the presentation of multiple auditory stimuli (beeps) concurrent with a single visual object (flash) induces an illusory perception of multiple flashes. We parametrically varied the onset asynchrony between auditory and visual events (leads and lags of ±300 ms) to quantify participants' "temporal window" of integration, i.e., stimuli in which auditory and visual cues were fused into a single percept. Results show that musically trained individuals were both faster and more accurate at processing concurrent audiovisual cues than their nonmusician peers; nonmusicians had a higher susceptibility for responding to audiovisual illusions and perceived double flashes over an extended range of onset asynchronies compared to trained musicians. Moreover, temporal window estimates indicated that musicians' windows (<100 ms) were ~2-3× shorter than nonmusicians' (~200 ms), suggesting more refined multisensory integration and audiovisual binding. Collectively, findings indicate a more refined binding of auditory and visual cues in musically trained individuals. We conclude that experience-dependent plasticity of intensive musical experience extends beyond simple listening skills, improving multimodal processing and the integration of multiple sensory systems in a domain-general manner.
Evaluating the Effort Expended to Understand Speech in Noise Using a Dual-Task Paradigm: The Effects of Providing Visual Speech Cues

ERIC Educational Resources Information Center

Fraser, Sarah; Gagne, Jean-Pierre; Alepins, Majolaine; Dubois, Pascale

2010-01-01

Purpose: Using a dual-task paradigm, 2 experiments (Experiments 1 and 2) were conducted to assess differences in the amount of listening effort expended to understand speech in noise in audiovisual (AV) and audio-only (A-only) modalities. Experiment 1 had equivalent noise levels in both modalities, and Experiment 2 equated speech recognition…
Audiovisual Speech Web-Lab: an Internet teaching and research laboratory.

PubMed

Gordon, M S; Rosenblum, L D

2001-05-01

Internet resources now enable laboratories to make full-length experiments available on line. A handful of existing web sites offer users the ability to participate in experiments and generate usable data. We have integrated this technology into a web site that also provides full discussion of the theoretical and methodological aspects of the experiments using text and simple interactive demonstrations. The content of the web site (http://www.psych.ucr.edu/avspeech/lab) concerns audiovisual speech perception and its relation to face perception. The site is designed to be useful for users of multiple interests and levels of expertise.
Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

NASA Astrophysics Data System (ADS)

Kabir, A.; Barker, J.; Giurgiu, M.

2010-09-01

An automatic time-aligned phone transcription toolbox of English speech corpora has been developed. Especially the toolbox would be very useful to generate robust automatic transcription and able to produce phone level transcription using speaker independent models as well as speaker dependent models without manual intervention. The system is based on standard Hidden Markov Models (HMM) approach and it was successfully experimented over a large audiovisual speech corpus namely GRID corpus. One of the most powerful features of the toolbox is the increased flexibility in speech processing where the speech community would be able to import the automatic transcription generated by HMM Toolkit (HTK) into a popular transcription software, PRAAT, and vice-versa. The toolbox has been evaluated through statistical analysis on GRID data which shows that automatic transcription deviates by an average of 20 ms with respect to manual transcription.
The influence of selective attention to auditory and visual speech on the integration of audiovisual speech information.

PubMed

Buchan, Julie N; Munhall, Kevin G

2011-01-01

Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

PubMed Central

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889
Perception of Audio-Visual Speech Synchrony in Spanish-Speaking Children with and without Specific Language Impairment

ERIC Educational Resources Information Center

Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.

2013-01-01

Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…
Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

ERIC Educational Resources Information Center

Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

2011-01-01

Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…
Comparisons of Audio and Audiovisual Measures of Stuttering Frequency and Severity in Preschool-Age Children

ERIC Educational Resources Information Center

Rousseau, Isabelle; Onslow, Mark; Packman, Ann; Jones, Mark

2008-01-01

Purpose: To determine whether measures of stuttering frequency and measures of overall stuttering severity in preschoolers differ when made from audio-only recordings compared with audiovisual recordings. Method: Four blinded speech-language pathologists who had extensive experience with preschoolers who stutter measured stuttering frequency and…

Multi-microphone adaptive array augmented with visual cueing.

PubMed

Gibson, Paul L; Hedin, Dan S; Davies-Venn, Evelyn E; Nelson, Peggy; Kramer, Kevin

2012-01-01

We present the development of an audiovisual array that enables hearing aid users to converse with multiple speakers in reverberant environments with significant speech babble noise where their hearing aids do not function well. The system concept consists of a smartphone, a smartphone accessory, and a smartphone software application. The smartphone accessory concept is a multi-microphone audiovisual array in a form factor that allows attachment to the back of the smartphone. The accessory will also contain a lower power radio by which it can transmit audio signals to compatible hearing aids. The smartphone software application concept will use the smartphone's built in camera to acquire images and perform real-time face detection using the built-in face detection support of the smartphone. The audiovisual beamforming algorithm uses the location of talking targets to improve the signal to noise ratio and consequently improve the user's speech intelligibility. Since the proposed array system leverages a handheld consumer electronic device, it will be portable and low cost. A PC based experimental system was developed to demonstrate the feasibility of an audiovisual multi-microphone array and these results are presented.
Development of a test battery for evaluating speech perception in complex listening environments.

PubMed

Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R

2014-08-01

In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.
fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex.

PubMed

van Atteveldt, Nienke M; Blau, Vera C; Blomert, Leo; Goebel, Rainer

2010-02-02

Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for the integration of letters and speech sounds and demonstrate that fMR-A is sensitive to multisensory congruency effects that may not be revealed in BOLD amplitude per se.
Audiovisual spoken word recognition as a clinical criterion for sensory aids efficiency in Persian-language children with hearing loss.

PubMed

Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Bazrafkan, Mozhdeh; Haghjoo, Asghar

2015-12-01

The aim of this study was to examine the role of audiovisual speech recognition as a clinical criterion of cochlear implant or hearing aid efficiency in Persian-language children with severe-to-profound hearing loss. This research was administered as a cross-sectional study. The sample size was 60 Persian 5-7 year old children. The assessment tool was one of subtests of Persian version of the Test of Language Development-Primary 3. The study included two experiments: auditory-only and audiovisual presentation conditions. The test was a closed-set including 30 words which were orally presented by a speech-language pathologist. The scores of audiovisual word perception were significantly higher than auditory-only condition in the children with normal hearing (P<0.01) and cochlear implant (P<0.05); however, in the children with hearing aid, there was no significant difference between word perception score in auditory-only and audiovisual presentation conditions (P>0.05). The audiovisual spoken word recognition can be applied as a clinical criterion to assess the children with severe to profound hearing loss in order to find whether cochlear implant or hearing aid has been efficient for them or not; i.e. if a child with hearing impairment who using CI or HA can obtain higher scores in audiovisual spoken word recognition than auditory-only condition, his/her auditory skills have appropriately developed due to effective CI or HA as one of the main factors of auditory habilitation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

PubMed

Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

2015-07-01

It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.
Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony.

PubMed

Bhat, Jyoti; Miller, Lee M; Pitt, Mark A; Shahin, Antoine J

2015-03-01

Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words that were asynchronous in onsets of voice and mouth movement and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs when mouth movement preceded the speech (V-A) stimuli than when the speech preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14-30 Hz) following the AEPs was larger during the in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8-14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that code for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance. Copyright © 2015 the American Physiological Society.
The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

PubMed

Chuen, Lorraine; Schutz, Michael

2016-07-01

An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an un-speeded temporal order judgments task; viewing audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of stimuli at various stimulus onset asynchronies, and were required to indicate which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.
[Test set for the evaluation of hearing and speech development after cochlear implantation in children].

PubMed

Lamprecht-Dinnesen, A; Sick, U; Sandrieser, P; Illg, A; Lesinski-Schiedat, A; Döring, W H; Müller-Deile, J; Kiefer, J; Matthias, K; Wüst, A; Konradi, E; Riebandt, M; Matulat, P; Von Der Haar-Heise, S; Swart, J; Elixmann, K; Neumann, K; Hildmann, A; Coninx, F; Meyer, V; Gross, M; Kruse, E; Lenarz, T

2002-10-01

Since autumn 1998 the multicenter interdisciplinary study group "Test Materials for CI Children" has been compiling a uniform examination tool for evaluation of speech and hearing development after cochlear implantation in childhood. After studying the relevant literature, suitable materials were checked for practical applicability, modified and provided with criteria for execution and break-off. For data acquisition, observation forms for preparation of a PC-version were developed. The evaluation set contains forms for master data with supplements relating to postoperative processes. The hearing tests check supra-threshold hearing with loudness scaling for children, speech comprehension in silence (Mainz and Göttingen Test for Speech Comprehension in Childhood) and phonemic differentiation (Oldenburg Rhyme Test for Children), the central auditory processes of detection, discrimination, identification and recognition (modification of the "Frankfurt Functional Hearing Test for Children") and audiovisual speech perception (Open Paragraph Tracking, Kiel Speech Track Program). The materials for speech and language development comprise phonetics-phonology, lexicon and semantics (LOGO Pronunciation Test), syntax and morphology (analysis of spontaneous speech), language comprehension (Reynell Scales), communication and pragmatics (observation forms). The MAIS and MUSS modified questionnaires are integrated. The evaluation set serves quality assurance and permits factor analysis as well as controls for regularity through the multicenter comparison of long-term developmental trends after cochlear implantation.
Large Vocabulary Audio-Visual Speech Recognition

DTIC Science & Technology

2002-06-12

www.is.cs.cmu.edu Email: waibel(a)cs.cmu~edu Inttractive Systenms Labs ttoctis Ssstms Labs Meeting Browser - -- Interpreting Human Communication "Why did...Speech Interacti Stams Labs t-cive Systms Focus of Attention Tracking Conclusion - Complete Model of Human Communication is Needed - Include all
Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

PubMed

Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C

2018-03-05

People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
Mu Wave Suppression during the Perception of Meaningless Syllables: EEG Evidence of Motor Recruitment

ERIC Educational Resources Information Center

Crawcour, Stephen; Bowers, Andrew; Harkrider, Ashley; Saltuklaroglu, Tim

2009-01-01

Motor involvement in speech perception has been recently studied using a variety of techniques. In the current study, EEG measurements from Cz, C3 and C4 electrodes were used to examine the relative power of the mu rhythm (i.e., 8-13 Hz) in response to various audio-visual speech and non-speech stimuli, as suppression of these rhythms is…
Text as a Supplement to Speech in Young and Older Adults a)

PubMed Central

Krull, Vidya; Humes, Larry E.

2015-01-01

Objective The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, we tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. Our working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that: 1) combining auditory and visual text information will result in improved recognition accuracy compared to auditory or visual text information alone; 2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults; and 3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Design Fifteen young adults with normal hearing, fifteen older adults with normal hearing, and fifteen older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory-only and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures were examined using principal-component factor analyses, followed by regression analyses. Results Both young and older adults performed similarly on nine out of ten perceptual measures (auditory, visual, and combined measures). Combining degraded speech with partially correct text from an automatic speech recognizer improved the understanding of speech in both young and older adults, relative to both auditory- and text-only performance. In all subjects, cognition emerged as a key predictor for a general speech-text integration ability. Conclusions These results suggest that neither age nor hearing loss affected the ability of subjects to benefit from text when used to support speech, after ensuring audibility through spectral shaping. These results also suggest that the benefit obtained by supplementing auditory input with partially accurate text is modulated by cognitive ability, specifically lexical and verbal skills. PMID:26458131
MPEG-7 audio-visual indexing test-bed for video retrieval

NASA Astrophysics Data System (ADS)

Gagnon, Langis; Foucher, Samuel; Gouaillier, Valerie; Brun, Christelle; Brousseau, Julie; Boulianne, Gilles; Osterrath, Frederic; Chapdelaine, Claude; Dutrisac, Julie; St-Onge, Francis; Champagne, Benoit; Lu, Xiaojian

2003-12-01

This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content like face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description decomposition is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval using a proprietary XQuery-based search engine and accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end-user will be able to ask to point on movie shots in the database that have been produced in a specific year, that contain the face of a specific actor who tells a specific word and in which there is no motion activity. Video streaming is performed over the high bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
Audiovisual Integration in Children Listening to Spectrally Degraded Speech

ERIC Educational Resources Information Center

Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

2015-01-01

Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…
Impact of Language on Development of Auditory-Visual Speech Perception

ERIC Educational Resources Information Center

Sekiyama, Kaoru; Burnham, Denis

2008-01-01

The McGurk effect paradigm was used to examine the developmental onset of inter-language differences between Japanese and English in auditory-visual speech perception. Participants were asked to identify syllables in audiovisual (with congruent or discrepant auditory and visual components), audio-only, and video-only presentations at various…
Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training.

PubMed

Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao

2013-01-01

Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training

PubMed Central

Bernstein, Lynne E.; Auer, Edward T.; Eberhardt, Silvio P.; Jiang, Jintao

2013-01-01

Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning. PMID:23515520
How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

PubMed

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2013-01-01

In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck, seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition.
How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

PubMed Central

Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

2013-01-01

In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for “reading” texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the “bottleneck” for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck, seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition. PMID:23966968
Cross-modal enhancement of speech detection in young and older adults: does signal content matter?

PubMed

Tye-Murray, Nancy; Spehar, Brent; Myerson, Joel; Sommers, Mitchell S; Hale, Sandra

2011-01-01

The purpose of the present study was to examine the effects of age and visual content on cross-modal enhancement of auditory speech detection. Visual content consisted of three clearly distinct types of visual information: an unaltered video clip of a talker's face, a low-contrast version of the same clip, and a mouth-like Lissajous figure. It was hypothesized that both young and older adults would exhibit reduced enhancement as visual content diverged from the original clip of the talker's face, but that the decrease would be greater for older participants. Nineteen young adults and 19 older adults were asked to detect a single spoken syllable (/ba/) in speech-shaped noise, and the level of the signal was adaptively varied to establish the signal-to-noise ratio (SNR) at threshold. There was an auditory-only baseline condition and three audiovisual conditions in which the syllable was accompanied by one of the three visual signals (the unaltered clip of the talker's face, the low-contrast version of that clip, or the Lissajous figure). For each audiovisual condition, the SNR at threshold was compared with the SNR at threshold for the auditory-only condition to measure the amount of cross-modal enhancement. Young adults exhibited significant cross-modal enhancement with all three types of visual stimuli, with the greatest amount of enhancement observed for the unaltered clip of the talker's face. Older adults, in contrast, exhibited significant cross-modal enhancement only with the unaltered face. Results of this study suggest that visual signal content affects cross-modal enhancement of speech detection in both young and older adults. They also support a hypothesized age-related deficit in processing low-contrast visual speech stimuli, even in older adults with normal contrast sensitivity.

Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception

PubMed Central

Skipper, Jeremy I.; van Wassenhove, Virginie; Nusbaum, Howard C.; Small, Steven L.

2009-01-01

Observing a speaker’s mouth profoundly influences speech perception. For example, listeners perceive an “illusory” “ta” when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory “ta” percept are more similar to the activity patterns evoked by AV/ta/ than they are to patterns evoked by AV/pa/ or AV/ka/. In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory “ta” in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV/pa/ and AV/ka/, respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV/ta/. Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation. PMID:17218482
Audiovisual Simultaneity Judgment and Rapid Recalibration throughout the Lifespan.

PubMed

Noel, Jean-Paul; De Niear, Matthew; Van der Burg, Erik; Wallace, Mark T

2016-01-01

Multisensory interactions are well established to convey an array of perceptual and behavioral benefits. One of the key features of multisensory interactions is the temporal structure of the stimuli combined. In an effort to better characterize how temporal factors influence multisensory interactions across the lifespan, we examined audiovisual simultaneity judgment and the degree of rapid recalibration to paired audiovisual stimuli (Flash-Beep and Speech) in a sample of 220 participants ranging from 7 to 86 years of age. Results demonstrate a surprisingly protracted developmental time-course for both audiovisual simultaneity judgment and rapid recalibration, with neither reaching maturity until well into adolescence. Interestingly, correlational analyses revealed that audiovisual simultaneity judgments (i.e., the size of the audiovisual temporal window of simultaneity) and rapid recalibration significantly co-varied as a function of age. Together, our results represent the most complete description of age-related changes in audiovisual simultaneity judgments to date, as well as being the first to describe changes in the degree of rapid recalibration as a function of age. We propose that the developmental time-course of rapid recalibration scaffolds the maturation of more durable audiovisual temporal representations.
A Computational Analysis of Neural Mechanisms Underlying the Maturation of Multisensory Speech Integration in Neurotypical Children and Those on the Autism Spectrum

PubMed Central

Cuppini, Cristiano; Ursino, Mauro; Magosso, Elisa; Ross, Lars A.; Foxe, John J.; Molholm, Sophie

2017-01-01

Failure to appropriately develop multisensory integration (MSI) of audiovisual speech may affect a child's ability to attain optimal communication. Studies have shown protracted development of MSI into late-childhood and identified deficits in MSI in children with an autism spectrum disorder (ASD). Currently, the neural basis of acquisition of this ability is not well understood. Here, we developed a computational model informed by neurophysiology to analyze possible mechanisms underlying MSI maturation, and its delayed development in ASD. The model posits that strengthening of feedforward and cross-sensory connections, responsible for the alignment of auditory and visual speech sound representations in posterior superior temporal gyrus/sulcus, can explain behavioral data on the acquisition of MSI. This was simulated by a training phase during which the network was exposed to unisensory and multisensory stimuli, and projections were crafted by Hebbian rules of potentiation and depression. In its mature architecture, the network also reproduced the well-known multisensory McGurk speech effect. Deficits in audiovisual speech perception in ASD were well accounted for by fewer multisensory exposures, compatible with a lack of attention, but not by reduced synaptic connectivity or synaptic plasticity. PMID:29163099
Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

NASA Astrophysics Data System (ADS)

Dov, David; Talmon, Ronen; Cohen, Israel

2016-12-01

In this paper, we address the problem of multiple view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, by relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter, which, as we show, has important implications on the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed to the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

PubMed

Buchan, Julie N; Munhall, Kevin G

2012-01-01

Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect the audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and that this effect is relatively constant across the various temporal offsets. Participant's gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth and more time was spent looking at the eyes, when a concurrent cognitive load task was added to the speech task.
The Function of Consciousness in Multisensory Integration

ERIC Educational Resources Information Center

Palmer, Terry D.; Ramsey, Ashley K.

2012-01-01

The function of consciousness was explored in two contexts of audio-visual speech, cross-modal visual attention guidance and McGurk cross-modal integration. Experiments 1, 2, and 3 utilized a novel cueing paradigm in which two different flash suppressed lip-streams cooccured with speech sounds matching one of these streams. A visual target was…
Effect of Perceptual Load on Semantic Access by Speech in Children

ERIC Educational Resources Information Center

Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Herve

2013-01-01

Purpose: To examine whether semantic access by speech requires attention in children. Method: Children ("N" = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual- dynamic face) picture word task. The cross-modal task had a low load,…
Recognition of Amodal Language Identity Emerges in Infancy

ERIC Educational Resources Information Center

Lewkowicz, David J.; Pons, Ferran

2013-01-01

Audiovisual speech consists of overlapping and invariant patterns of dynamic acoustic and optic articulatory information. Research has shown that infants can perceive a variety of basic auditory-visual (A-V) relations but no studies have investigated whether and when infants begin to perceive higher order A-V relations inherent in speech. Here, we…
Binding and unbinding the auditory and visual streams in the McGurk effect.

PubMed

Nahorna, Olha; Berthommier, Frédéric; Schwartz, Jean-Luc

2012-08-01

Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This results in enhanced intelligibility in noise, or in visual modification of the auditory percept in the McGurk effect. It is classically considered that processing is done independently in the auditory and visual systems before interaction occurs at a certain representational stage, resulting in an integrated percept. However, some behavioral and neurophysiological data suggest the existence of a two-stage process. A first stage would involve binding together the appropriate pieces of audio and video information before fusion per se in a second stage. Then it should be possible to design experiments leading to unbinding. It is shown here that if a given McGurk stimulus is preceded by an incoherent audiovisual context, the amount of McGurk effect is largely reduced. Various kinds of incoherent contexts (acoustic syllables dubbed on video sentences or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage "binding and fusion" model for audiovisual speech perception.
Visual activity predicts auditory recovery from deafness after adult cochlear implantation.

PubMed

Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal

2013-12-01

Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input and patients must use long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the high progress of speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area, which positively correlated with auditory speech recovery, was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the visual modality's functional level is related to the proficiency level of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that visual modality may facilitate the perception of the word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.
Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

PubMed

Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

2015-11-01

Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant, is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early (<200 msec) processing of the consonant was overall prominent in the left hemisphere, except right hemisphere prominence in superior parietal cortex and secondary visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, increased activity around 100 msec which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was only increased to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec) subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then attempted, and if successful, McGurk perception occurs and cortical activity in left hemisphere further increases between 170 and 260 msec.
Matching Heard and Seen Speech: An ERP Study of Audiovisual Word Recognition

PubMed Central

Kaganovich, Natalya; Schumaker, Jennifer; Rowland, Courtney

2016-01-01

Seeing articulatory gestures while listening to speech-in-noise (SIN) significantly improves speech understanding. However, the degree of this improvement varies greatly among individuals. We examined a relationship between two distinct stages of visual articulatory processing and the SIN accuracy by combining a cross-modal repetition priming task with ERP recordings. Participants first heard a word referring to a common object (e.g., pumpkin) and then decided whether the subsequently presented visual silent articulation matched the word they had just heard. Incongruent articulations elicited a significantly enhanced N400, indicative of a mismatch detection at the pre-lexical level. Congruent articulations elicited a significantly larger LPC, indexing articulatory word recognition. Only the N400 difference between incongruent and congruent trials was significantly correlated with individuals’ SIN accuracy improvement in the presence of the talker’s face. PMID:27155219
Reduced frontal theta oscillations indicate altered crossmodal prediction error processing in schizophrenia

PubMed Central

Keil, Julian; Balz, Johanna; Gallinat, Jürgen; Senkowski, Daniel

2016-01-01

Our brain generates predictions about forthcoming stimuli and compares predicted with incoming input. Failures in predicting events might contribute to hallucinations and delusions in schizophrenia (SZ). When a stimulus violates prediction, neural activity that reflects prediction error (PE) processing is found. While PE processing deficits have been reported in unisensory paradigms, it is unknown whether SZ patients (SZP) show altered crossmodal PE processing. We measured high-density electroencephalography and applied source estimation approaches to investigate crossmodal PE processing generated by audiovisual speech. In SZP and healthy control participants (HC), we used an established paradigm in which high- and low-predictive visual syllables were paired with congruent or incongruent auditory syllables. We examined crossmodal PE processing in SZP and HC by comparing differences in event-related potentials and neural oscillations between incongruent and congruent high- and low-predictive audiovisual syllables. In both groups event-related potentials between 206 and 250 ms were larger in high- compared with low-predictive syllables, suggesting intact audiovisual incongruence detection in the auditory cortex of SZP. The analysis of oscillatory responses revealed theta-band (4–7 Hz) power enhancement in high- compared with low-predictive syllables between 230 and 370 ms in the frontal cortex of HC but not SZP. Thus aberrant frontal theta-band oscillations reflect crossmodal PE processing deficits in SZ. The present study suggests a top-down multisensory processing deficit and highlights the role of dysfunctional frontal oscillations for the SZ psychopathology. PMID:27358314
Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners with Hearing Impairment Using Hearing Aids

ERIC Educational Resources Information Center

Moradi, Shahram; Lidestam, Bjorn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Ronnberg, Jerker

2017-01-01

Purpose: We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels--in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands--in listeners with hearing impairment using hearing aids. Method: The study comprised 199…
Visual-auditory integration during speech imitation in autism.

PubMed

Williams, Justin H G; Massaro, Dominic W; Peel, Natalie J; Bosseler, Alexis; Suddendorf, Thomas

2004-01-01

Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training.
Are the Literacy Difficulties That Characterize Developmental Dyslexia Associated with a Failure to Integrate Letters and Speech Sounds?

ERIC Educational Resources Information Center

Nash, Hannah M.; Gooch, Debbie; Hulme, Charles; Mahajan, Yatin; McArthur, Genevieve; Steinmetzger, Kurt; Snowling, Margaret J.

2017-01-01

The "automatic letter-sound integration hypothesis" (Blomert, [Blomert, L., 2011]) proposes that dyslexia results from a failure to fully integrate letters and speech sounds into automated audio-visual objects. We tested this hypothesis in a sample of English-speaking children with dyslexic difficulties (N = 13) and samples of…
Effects of Visual Information on Intelligibility of Open and Closed Class Words in Predictable Sentences Produced by Speakers with Dysarthria

ERIC Educational Resources Information Center

Hustad, Katherine C.; Dardis, Caitlin M.; Mccourt, Kelly A.

2007-01-01

This study examined the independent and interactive effects of visual information and linguistic class of words on intelligibility of dysarthric speech. Seven speakers with dysarthria participated in the study, along with 224 listeners who transcribed speech samples in audiovisual (AV) or audio-only (AO) listening conditions. Orthographic…
The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception.

PubMed

Buchan, Julie N; Paré, Martin; Munhall, Kevin G

2008-11-25

During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.
Non-fluent speech following stroke is caused by impaired efference copy.

PubMed

Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

2017-09-01

Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech: however, the extent to which breakdown of efference copy mechanisms impact speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The Oldenburg Audio Visual Speech Stimuli (OLAVS).

PubMed

Stropahl, Maren; Schellhardt, Sebastian; Debener, Stefan

2017-06-01

The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.

Dynamics of cortico-subcortical cross-modal operations involved in audio-visual object detection in humans.

PubMed

Fort, Alexandra; Delpuech, Claude; Pernier, Jacques; Giard, Marie-Hélène

2002-10-01

Very recently, a number of neuroimaging studies in humans have begun to investigate the question of how the brain integrates information from different sensory modalities to form unified percepts. Already, intermodal neural processing appears to depend on the modalities of inputs or the nature (speech/non-speech) of information to be combined. Yet, the variety of paradigms, stimuli and technics used make it difficult to understand the relationships between the factors operating at the perceptual level and the underlying physiological processes. In a previous experiment, we used event-related potentials to describe the spatio-temporal organization of audio-visual interactions during a bimodal object recognition task. Here we examined the network of cross-modal interactions involved in simple detection of the same objects. The objects were defined either by unimodal auditory or visual features alone, or by the combination of the two features. As expected, subjects detected bimodal stimuli more rapidly than either unimodal stimuli. Combined analysis of potentials, scalp current densities and dipole modeling revealed several interaction patterns within the first 200 micro s post-stimulus: in occipito-parietal visual areas (45-85 micro s), in deep brain structures, possibly the superior colliculus (105-140 micro s), and in right temporo-frontal regions (170-185 micro s). These interactions differed from those found during object identification in sensory-specific areas and possibly in the superior colliculus, indicating that the neural operations governing multisensory integration depend crucially on the nature of the perceptual processes involved.
Neuromodulatory Effects of Auditory Training and Hearing Aid Use on Audiovisual Speech Perception in Elderly Individuals

PubMed Central

Yu, Luodi; Rao, Aparna; Zhang, Yang; Burton, Philip C.; Rishiq, Dania; Abrams, Harvey

2017-01-01

Although audiovisual (AV) training has been shown to improve overall speech perception in hearing-impaired listeners, there has been a lack of direct brain imaging data to help elucidate the neural networks and neural plasticity associated with hearing aid (HA) use and auditory training targeting speechreading. For this purpose, the current clinical case study reports functional magnetic resonance imaging (fMRI) data from two hearing-impaired patients who were first-time HA users. During the study period, both patients used HAs for 8 weeks; only one received a training program named ReadMyQuipsTM (RMQ) targeting speechreading during the second half of the study period for 4 weeks. Identical fMRI tests were administered at pre-fitting and at the end of the 8 weeks. Regions of interest (ROI) including auditory cortex and visual cortex for uni-sensory processing, and superior temporal sulcus (STS) for AV integration, were identified for each person through independent functional localizer task. The results showed experience-dependent changes involving ROIs of auditory cortex, STS and functional connectivity between uni-sensory ROIs and STS from pretest to posttest in both cases. These data provide initial evidence for the malleable experience-driven cortical functionality for AV speech perception in elderly hearing-impaired people and call for further studies with a much larger subject sample and systematic control to fill in the knowledge gap to understand brain plasticity associated with auditory rehabilitation in the aging population. PMID:28270763
Neuromodulatory Effects of Auditory Training and Hearing Aid Use on Audiovisual Speech Perception in Elderly Individuals.

PubMed

Yu, Luodi; Rao, Aparna; Zhang, Yang; Burton, Philip C; Rishiq, Dania; Abrams, Harvey

2017-01-01

Although audiovisual (AV) training has been shown to improve overall speech perception in hearing-impaired listeners, there has been a lack of direct brain imaging data to help elucidate the neural networks and neural plasticity associated with hearing aid (HA) use and auditory training targeting speechreading. For this purpose, the current clinical case study reports functional magnetic resonance imaging (fMRI) data from two hearing-impaired patients who were first-time HA users. During the study period, both patients used HAs for 8 weeks; only one received a training program named ReadMyQuips TM (RMQ) targeting speechreading during the second half of the study period for 4 weeks. Identical fMRI tests were administered at pre-fitting and at the end of the 8 weeks. Regions of interest (ROI) including auditory cortex and visual cortex for uni-sensory processing, and superior temporal sulcus (STS) for AV integration, were identified for each person through independent functional localizer task. The results showed experience-dependent changes involving ROIs of auditory cortex, STS and functional connectivity between uni-sensory ROIs and STS from pretest to posttest in both cases. These data provide initial evidence for the malleable experience-driven cortical functionality for AV speech perception in elderly hearing-impaired people and call for further studies with a much larger subject sample and systematic control to fill in the knowledge gap to understand brain plasticity associated with auditory rehabilitation in the aging population.
Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers.

PubMed

Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

2018-01-01

The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.
Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

PubMed Central

Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

2018-01-01

The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312
Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

PubMed Central

Leybaert, Jacqueline; LaSasso, Carol J.

2010-01-01

Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357
Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

ERIC Educational Resources Information Center

Erdener, Dogu

2016-01-01

Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…
Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap.

PubMed

Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin'ya

2013-01-01

It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.
Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap

PubMed Central

Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin’Ya

2013-01-01

It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap. PMID:23658549
Pre- and Postoperative Binaural Unmasking for Bimodal Cochlear Implant Listeners.

PubMed

Sheffield, Benjamin M; Schuchman, Gerald; Bernstein, Joshua G W

Cochlear implants (CIs) are increasingly recommended to individuals with residual bilateral acoustic hearing. Although new hearing-preserving electrode designs and surgical approaches show great promise, CI recipients are still at risk to lose acoustic hearing in the implanted ear, which could prevent the ability to take advantage of binaural unmasking to aid speech recognition in noise. This study examined the tradeoff between the benefits of a CI for speech understanding in noise and the potential loss of binaural unmasking for CI recipients with some bilateral preoperative acoustic hearing. Binaural unmasking is difficult to evaluate in CI candidates because speech perception in noise is generally too poor to measure reliably in the range of signal to noise ratios (SNRs) where binaural intelligibility level differences (BILDs) are typically observed (<5 dB). Thus, a test of audiovisual speech perception in noise was employed to increase performance to measureable levels. BILDs were measured preoperatively for 11 CI candidates and at least 5 months post-activation for 10 of these individuals (1 individual elected not to receive a CI). Audiovisual sentences were presented in speech-shaped masking noise between -10 and +15 dB SNR. The noise was always correlated between the ears, while the speech signal was either correlated (N0S0) or inversely correlated (N0Sπ). Stimuli were delivered via headphones to the unaided ear(s) and, where applicable, via auxiliary input to the CI speech processor. A z test evaluated performance differences between the N0S0 and N0Sπ conditions for each listener pre- and postoperatively. For listeners showing a significant difference, the magnitude of the BILD was characterized as the difference in SNRs required to achieve 50% correct performance. One listener who underwent hearing-preservation surgery received additional postoperative tests, which presented sound directly to both ears and to the CI speech processor. Five of 11 listeners showed a significant preoperative BILD (range: 2.0 to 7.3 dB). Only 2 of these 5 showed a significant postoperative BILD, but the mean BILD was smaller (1.3 dB) than that observed preoperatively (3.1 dB). Despite the fact that some listeners lost the preoperative binaural benefit, 9 out of 10 listeners tested postoperatively had performance equal to or better than their best pre-CI performance. The listener who retained functional acoustic hearing in the implanted ear also demonstrated a preserved acoustic BILD postoperatively. Approximately half of the CI candidates in this study demonstrated preoperative binaural hearing benefits for audiovisual speech perception in noise. Most of these listeners lost their acoustic hearing in the implanted ear after surgery (using nonhearing-preservation techniques), and therefore lost access to this binaural benefit. In all but one case, any loss of binaural benefit was compensated for or exceeded by an improvement in speech perception with the CI. Evidence of a preoperative BILD suggests that certain CI candidates might further benefit from hearing-preservation surgery to retain acoustic binaural unmasking, as demonstrated for the listener who underwent hearing-preservation surgery. This test of binaural audiovisual speech perception in noise could serve as a diagnostic tool to identify CI candidates who are most likely to receive functional benefits from their bilateral acoustic hearing.
Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense.

PubMed

Cecere, Roberto; Gross, Joachim; Willis, Ashleigh; Thut, Gregor

2017-05-24

In multisensory integration, processing in one sensory modality is enhanced by complementary information from other modalities. Intersensory timing is crucial in this process because only inputs reaching the brain within a restricted temporal window are perceptually bound. Previous research in the audiovisual field has investigated various features of the temporal binding window, revealing asymmetries in its size and plasticity depending on the leading input: auditory-visual (AV) or visual-auditory (VA). Here, we tested whether separate neuronal mechanisms underlie this AV-VA dichotomy in humans. We recorded high-density EEG while participants performed an audiovisual simultaneity judgment task including various AV-VA asynchronies and unisensory control conditions (visual-only, auditory-only) and tested whether AV and VA processing generate different patterns of brain activity. After isolating the multisensory components of AV-VA event-related potentials (ERPs) from the sum of their unisensory constituents, we ran a time-resolved topographical representational similarity analysis (tRSA) comparing the AV and VA ERP maps. Spatial cross-correlation matrices were built from real data to index the similarity between the AV and VA maps at each time point (500 ms window after stimulus) and then correlated with two alternative similarity model matrices: AV maps = VA maps versus AV maps ≠ VA maps The tRSA results favored the AV maps ≠ VA maps model across all time points, suggesting that audiovisual temporal binding (indexed by synchrony perception) engages different neural pathways depending on the leading sense. The existence of such dual route supports recent theoretical accounts proposing that multiple binding mechanisms are implemented in the brain to accommodate different information parsing strategies in auditory and visual sensory systems. SIGNIFICANCE STATEMENT Intersensory timing is a crucial aspect of multisensory integration, determining whether and how inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA). Copyright © 2017 Cecere et al.
Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense

PubMed Central

2017-01-01

In multisensory integration, processing in one sensory modality is enhanced by complementary information from other modalities. Intersensory timing is crucial in this process because only inputs reaching the brain within a restricted temporal window are perceptually bound. Previous research in the audiovisual field has investigated various features of the temporal binding window, revealing asymmetries in its size and plasticity depending on the leading input: auditory–visual (AV) or visual–auditory (VA). Here, we tested whether separate neuronal mechanisms underlie this AV–VA dichotomy in humans. We recorded high-density EEG while participants performed an audiovisual simultaneity judgment task including various AV–VA asynchronies and unisensory control conditions (visual-only, auditory-only) and tested whether AV and VA processing generate different patterns of brain activity. After isolating the multisensory components of AV–VA event-related potentials (ERPs) from the sum of their unisensory constituents, we ran a time-resolved topographical representational similarity analysis (tRSA) comparing the AV and VA ERP maps. Spatial cross-correlation matrices were built from real data to index the similarity between the AV and VA maps at each time point (500 ms window after stimulus) and then correlated with two alternative similarity model matrices: AVmaps = VAmaps versus AVmaps ≠ VAmaps. The tRSA results favored the AVmaps ≠ VAmaps model across all time points, suggesting that audiovisual temporal binding (indexed by synchrony perception) engages different neural pathways depending on the leading sense. The existence of such dual route supports recent theoretical accounts proposing that multiple binding mechanisms are implemented in the brain to accommodate different information parsing strategies in auditory and visual sensory systems. SIGNIFICANCE STATEMENT Intersensory timing is a crucial aspect of multisensory integration, determining whether and how inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA). PMID:28450537
Creating speech-synchronized animation.

PubMed

King, Scott A; Parent, Richard E

2005-01-01

We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the necessary geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized. The system then automatically creates accurate real-time animated speech from the input text. It is capable of cheaply producing tremendous amounts of animated speech with very low resource requirements.
Multisensory speech perception without the left superior temporal sulcus.

PubMed

Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

2012-09-01

Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left tempo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in right STS response to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.
Multisensory Speech Perception Without the Left Superior Temporal Sulcus

PubMed Central

Baum, Sarah H.; Martin, Randi C.; Hamilton, A. Cris; Beauchamp, Michael S.

2012-01-01

Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left tempo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS but in the right STS, more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in right STS response to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. PMID:22634292
Facial expressions and the evolution of the speech rhythm.

PubMed

Ghazanfar, Asif A; Takahashi, Daniel Y

2014-06-01

In primates, different vocalizations are produced, at least in part, by making different facial expressions. Not surprisingly, humans, apes, and monkeys all recognize the correspondence between vocalizations and the facial postures associated with them. However, one major dissimilarity between monkey vocalizations and human speech is that, in the latter, the acoustic output and associated movements of the mouth are both rhythmic (in the 3- to 8-Hz range) and tightly correlated, whereas monkey vocalizations have a similar acoustic rhythmicity but lack the concommitant rhythmic facial motion. This raises the question of how we evolved from a presumptive ancestral acoustic-only vocal rhythm to the one that is audiovisual with improved perceptual sensitivity. According to one hypothesis, this bisensory speech rhythm evolved through the rhythmic facial expressions of ancestral primates. If this hypothesis has any validity, we expect that the extant nonhuman primates produce at least some facial expressions with a speech-like rhythm in the 3- to 8-Hz frequency range. Lip smacking, an affiliative signal observed in many genera of primates, satisfies this criterion. We review a series of studies using developmental, x-ray cineradiographic, EMG, and perceptual approaches with macaque monkeys producing lip smacks to further investigate this hypothesis. We then explore its putative neural basis and remark on important differences between lip smacking and speech production. Overall, the data support the hypothesis that lip smacking may have been an ancestral expression that was linked to vocal output to produce the original rhythmic audiovisual speech-like utterances in the human lineage.
Hearing Faces: How the Infant Brain Matches the Face It Sees with the Speech It Hears

ERIC Educational Resources Information Center

Bristow, Davina; Dehaene-Lambertz, Ghislaine; Mattout, Jeremie; Soares, Catherine; Gliga, Teodora; Baillet, Sylvain; Mangin, Jean-Francois

2009-01-01

Speech is not a purely auditory signal. From around 2 months of age, infants are able to correctly match the vowel they hear with the appropriate articulating face. However, there is no behavioral evidence of integrated audiovisual perception until 4 months of age, at the earliest, when an illusory percept can be created by the fusion of the…
The Impact of Early Bilingualism on Face Recognition Processes.

PubMed

Kandel, Sonia; Burfin, Sabine; Méary, David; Ruiz-Tada, Elisa; Costa, Albert; Pascalis, Olivier

2016-01-01

Early linguistic experience has an impact on the way we decode audiovisual speech in face-to-face communication. The present study examined whether differences in visual speech decoding could be linked to a broader difference in face processing. To identify a phoneme we have to do an analysis of the speaker's face to focus on the relevant cues for speech decoding (e.g., locating the mouth with respect to the eyes). Face recognition processes were investigated through two classic effects in face recognition studies: the Other-Race Effect (ORE) and the Inversion Effect. Bilingual and monolingual participants did a face recognition task with Caucasian faces (own race), Chinese faces (other race), and cars that were presented in an Upright or Inverted position. The results revealed that monolinguals exhibited the classic ORE. Bilinguals did not. Overall, bilinguals were slower than monolinguals. These results suggest that bilinguals' face processing abilities differ from monolinguals'. Early exposure to more than one language may lead to a perceptual organization that goes beyond language processing and could extend to face analysis. We hypothesize that these differences could be due to the fact that bilinguals focus on different parts of the face than monolinguals, making them more efficient in other race face processing but slower. However, more studies using eye-tracking techniques are necessary to confirm this explanation.
Neural Correlates of Interindividual Differences in Children’s Audiovisual Speech Perception

PubMed Central

Nath, Audrey R.; Fava, Eswen E.; Beauchamp, Michael S.

2011-01-01

Children use information from both the auditory and visual modalities to aid in understanding speech. A dramatic illustration of this multisensory integration is the McGurk effect, an illusion in which an auditory syllable is perceived differently when it is paired with an incongruent mouth movement. However, there are significant interindividual differences in McGurk perception: some children never perceive the illusion, while others always do. Because converging evidence suggests that the posterior superior temporal sulcus (STS) is a critical site for multisensory integration, we hypothesized that activity within the STS would predict susceptibility to the McGurk effect. To test this idea, we used blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) in seventeen children aged 6 to 12 years to measure brain responses to three audiovisual stimulus categories: McGurk incongruent, non-McGurk incongruent and congruent syllables. Two separate analysis approaches, one using independent functional localizers and another using whole-brain voxel-based regression, showed differences in the left STS between perceivers and non-perceivers. The STS of McGurk perceivers responded significantly more than non-perceivers to McGurk syllables, but not to other stimuli, and perceivers’ hemodynamic responses in the STS were significantly prolonged. In addition to the STS, weaker differences between perceivers and non-perceivers were observed in the FFA and extrastriate visual cortex. These results suggest that the STS is an important source of interindividual variability in children’s audiovisual speech perception. PMID:21957257
Sight and sound persistently out of synch: stable individual differences in audiovisual synchronisation revealed by implicit measures of lip-voice integration

PubMed Central

Ipser, Alberta; Agolli, Vlera; Bajraktari, Anisa; Al-Alawi, Fatimah; Djaafara, Nurfitriani; Freeman, Elliot D.

2017-01-01

Are sight and sound out of synch? Signs that they are have been dismissed for over two centuries as an artefact of attentional and response bias, to which traditional subjective methods are prone. To avoid such biases, we measured performance on objective tasks that depend implicitly on achieving good lip-synch. We measured the McGurk effect (in which incongruent lip-voice pairs evoke illusory phonemes), and also identification of degraded speech, while manipulating audiovisual asynchrony. Peak performance was found at an average auditory lag of ~100 ms, but this varied widely between individuals. Participants’ individual optimal asynchronies showed trait-like stability when the same task was re-tested one week later, but measures based on different tasks did not correlate. This discounts the possible influence of common biasing factors, suggesting instead that our different tasks probe different brain networks, each subject to their own intrinsic auditory and visual processing latencies. Our findings call for renewed interest in the biological causes and cognitive consequences of individual sensory asynchronies, leading potentially to fresh insights into the neural representation of sensory timing. A concrete implication is that speech comprehension might be enhanced, by first measuring each individual’s optimal asynchrony and then applying a compensatory auditory delay. PMID:28429784

Alpha and Beta Oscillations Index Semantic Congruency between Speech and Gestures in Clear and Degraded Speech.

PubMed

Drijvers, Linda; Özyürek, Asli; Jensen, Ole

2018-06-19

Previous work revealed that visual semantic information conveyed by gestures can enhance degraded speech comprehension, but the mechanisms underlying these integration processes under adverse listening conditions remain poorly understood. We used MEG to investigate how oscillatory dynamics support speech-gesture integration when integration load is manipulated by auditory (e.g., speech degradation) and visual semantic (e.g., gesture congruency) factors. Participants were presented with videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching (mixing gesture + "mixing") or mismatching (drinking gesture + "walking") gesture. In clear speech, alpha/beta power was more suppressed in the left inferior frontal gyrus and motor and visual cortices when integration load increased in response to mismatching versus matching gestures. In degraded speech, beta power was less suppressed over posterior STS and medial temporal lobe for mismatching compared with matching gestures, showing that integration load was lowest when speech was degraded and mismatching gestures could not be integrated and disambiguate the degraded signal. Our results thus provide novel insights on how low-frequency oscillatory modulations in different parts of the cortex support the semantic audiovisual integration of gestures in clear and degraded speech: When speech is clear, the left inferior frontal gyrus and motor and visual cortices engage because higher-level semantic information increases semantic integration load. When speech is degraded, posterior STS/middle temporal gyrus and medial temporal lobe are less engaged because integration load is lowest when visual semantic information does not aid lexical retrieval and speech and gestures cannot be integrated.
Computer-based auditory training (CBAT): benefits for children with language- and reading-related learning difficulties.

PubMed

Loo, Jenny Hooi Yin; Bamiou, Doris-Eva; Campbell, Nicci; Luxon, Linda M

2010-08-01

This article reviews the evidence for computer-based auditory training (CBAT) in children with language, reading, and related learning difficulties, and evaluates the extent it can benefit children with auditory processing disorder (APD). Searches were confined to studies published between 2000 and 2008, and they are rated according to the level of evidence hierarchy proposed by the American Speech-Language Hearing Association (ASHA) in 2004. We identified 16 studies of two commercially available CBAT programs (13 studies of Fast ForWord (FFW) and three studies of Earobics) and five further outcome studies of other non-speech and simple speech sounds training, available for children with language, learning, and reading difficulties. The results suggest that, apart from the phonological awareness skills, the FFW and Earobics programs seem to have little effect on the language, spelling, and reading skills of children. Non-speech and simple speech sounds training may be effective in improving children's reading skills, but only if it is delivered by an audio-visual method. There is some initial evidence to suggest that CBAT may be of benefit for children with APD. Further research is necessary, however, to substantiate these preliminary findings.
Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners With Hearing Impairment Using Hearing Aids.

PubMed

Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker

2017-09-18

We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels-in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands-in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded from their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.
The relationship between level of autistic traits and local bias in the context of the McGurk effect

PubMed Central

Ujiie, Yuta; Asai, Tomohisa; Wakabayashi, Akio

2015-01-01

The McGurk effect is a well-known illustration that demonstrates the influence of visual information on hearing in the context of speech perception. Some studies have reported that individuals with autism spectrum disorder (ASD) display abnormal processing of audio-visual speech integration, while other studies showed contradictory results. Based on the dimensional model of ASD, we administered two analog studies to examine the link between level of autistic traits, as assessed by the Autism Spectrum Quotient (AQ), and the McGurk effect among a sample of university students. In the first experiment, we found that autistic traits correlated negatively with fused (McGurk) responses. Then, we manipulated presentation types of visual stimuli to examine whether the local bias toward visual speech cues modulated individual differences in the McGurk effect. The presentation included four types of visual images, comprising no image, mouth only, mouth and eyes, and full face. The results revealed that global facial information facilitates the influence of visual speech cues on McGurk stimuli. Moreover, individual differences between groups with low and high levels of autistic traits appeared when the full-face visual speech cue with an incongruent voice condition was presented. These results suggest that individual differences in the McGurk effect might be due to a weak ability to process global facial information in individuals with high levels of autistic traits. PMID:26175705
Audiovisual sentence repetition as a clinical criterion for auditory development in Persian-language children with hearing loss.

PubMed

Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Rahimi, Zahra; Mayahi, Anis

2017-02-01

It is important for clinician such as speech-language pathologists and audiologists to develop more efficient procedures to assess the development of auditory, speech and language skills in children using hearing aid and/or cochlear implant compared to their peers with normal hearing. So, the aim of study was the comparison of the performance of 5-to-7-year-old Persian-language children with and without hearing loss in visual-only, auditory-only, and audiovisual presentation of sentence repetition task. The research was administered as a cross-sectional study. The sample size was 92 Persian 5-7 year old children including: 60 with normal hearing and 32 with hearing loss. The children with hearing loss were recruited from Soroush rehabilitation center for Persian-language children with hearing loss in Shiraz, Iran, through consecutive sampling method. All the children had unilateral cochlear implant or bilateral hearing aid. The assessment tool was the Sentence Repetition Test. The study included three computer-based experiments including visual-only, auditory-only, and audiovisual. The scores were compared within and among the three groups through statistical tests in α = 0.05. The score of sentence repetition task between V-only, A-only, and AV presentation was significantly different in the three groups; in other words, the highest to lowest scores belonged respectively to audiovisual, auditory-only, and visual-only format in the children with normal hearing (P < 0.01), cochlear implant (P < 0.01), and hearing aid (P < 0.01). In addition, there was no significant correlationship between the visual-only and audiovisual sentence repetition scores in all the 5-to-7-year-old children (r = 0.179, n = 92, P = 0.088), but audiovisual sentence repetition scores were found to be strongly correlated with auditory-only scores in all the 5-to-7-year-old children (r = 0.943, n = 92, P = 0.000). According to the study's findings, audiovisual integration occurs in the 5-to-7-year-old Persian children using hearing aid or cochlear implant during sentence repetition similar to their peers with normal hearing. Therefore, it is recommended that audiovisual sentence repetition should be used as a clinical criterion for auditory development in Persian-language children with hearing loss. Copyright © 2016. Published by Elsevier B.V.
Integration of faces and vocalizations in ventral prefrontal cortex: Implications for the evolution of audiovisual speech

PubMed Central

Romanski, Lizabeth M.

2012-01-01

The integration of facial gestures and vocal signals is an essential process in human communication and relies on an interconnected circuit of brain regions, including language regions in the inferior frontal gyrus (IFG). Studies have determined that ventral prefrontal cortical regions in macaques [e.g., the ventrolateral prefrontal cortex (VLPFC)] share similar cytoarchitectonic features as cortical areas in the human IFG, suggesting structural homology. Anterograde and retrograde tracing studies show that macaque VLPFC receives afferents from the superior and inferior temporal gyrus, which provide complex auditory and visual information, respectively. Moreover, physiological studies have shown that single neurons in VLPFC integrate species-specific face and vocal stimuli. Although bimodal responses may be found across a wide region of prefrontal cortex, vocalization responsive cells, which also respond to faces, are mainly found in anterior VLPFC. This suggests that VLPFC may be specialized to process and integrate social communication information, just as the IFG is specialized to process and integrate speech and gestures in the human brain. PMID:22723356
Comparison of audio and audiovisual measures of adult stuttering: Implications for clinical trials.

PubMed

O'Brian, Sue; Jones, Mark; Onslow, Mark; Packman, Ann; Menzies, Ross; Lowe, Robyn

2015-04-15

This study investigated whether measures of percentage syllables stuttered (%SS) and stuttering severity ratings with a 9-point scale differ when made from audiovisual compared with audio-only recordings. Four experienced speech-language pathologists measured %SS and assigned stuttering severity ratings to 10-minute audiovisual and audio-only recordings of 36 adults. There was a mean 18% increase in %SS scores when samples were presented in audiovisual compared with audio-only mode. This result was consistent across both higher and lower %SS scores and was found to be directly attributable to counts of stuttered syllables rather than the total number of syllables. There was no significant difference between stuttering severity ratings made from the two modes. In clinical trials research, when using %SS as the primary outcome measure, audiovisual samples would be preferred as long as clear, good quality, front-on images can be easily captured. Alternatively, stuttering severity ratings may be a more valid measure to use as they correlate well with %SS and values are not influenced by the presentation mode.
Audio-guided audiovisual data segmentation, indexing, and retrieval

NASA Astrophysics Data System (ADS)

Zhang, Tong; Kuo, C.-C. Jay

1998-12-01

While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time- frequency analysis of audio signals and the use of the hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
Communication Arts: A Tentative Curriculum Guide for English 7-12, Basic, Regular, STS, [Advanced], Junior High Speech A & B, Speech I and II, Drama I and II, and Journalism I and II.

ERIC Educational Resources Information Center

Irving Independent School District, TX.

This guide is intended to be used for instruction in communication skills from the seventh grade through the twelfth. Each section of the guide is identified by grade level and includes instructional objectives, suggestions for introducing and motivating the unit, required material, suggested activities, audiovisual aids, resource materials, and…
Interactive MPEG-4 low-bit-rate speech/audio transmission over the Internet

NASA Astrophysics Data System (ADS)

Liu, Fang; Kim, JongWon; Kuo, C.-C. Jay

1999-11-01

The recently developed MPEG-4 technology enables the coding and transmission of natural and synthetic audio-visual data in the form of objects. In an effort to extend the object-based functionality of MPEG-4 to real-time Internet applications, architectural prototypes of multiplex layer and transport layer tailored for transmission of MPEG-4 data over IP are under debate among Internet Engineering Task Force (IETF), and MPEG-4 systems Ad Hoc group. In this paper, we present an architecture for interactive MPEG-4 speech/audio transmission system over the Internet. It utilities a framework of Real Time Streaming Protocol (RTSP) over Real-time Transport Protocol (RTP) to provide controlled, on-demand delivery of real time speech/audio data. Based on a client-server model, a couple of low bit-rate bit streams (real-time speech/audio, pre- encoded speech/audio) are multiplexed and transmitted via a single RTP channel to the receiver. The MPEG-4 Scene Description (SD) and Object Descriptor (OD) bit streams are securely sent through the RTSP control channel. Upon receiving, an initial MPEG-4 audio- visual scene is constructed after de-multiplexing, decoding of bit streams, and scene composition. A receiver is allowed to manipulate the initial audio-visual scene presentation locally, or interactively arrange scene changes by sending requests to the server. A server may also choose to update the client with new streams and list of contents for user selection.
Robust audio-visual speech recognition under noisy audio-video conditions.

PubMed

Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

2014-02-01

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Speech-evoked activation in adult temporal cortex measured using functional near-infrared spectroscopy (fNIRS): Are the measurements reliable?

PubMed

Wiggins, Ian M; Anderson, Carly A; Kitterick, Pádraig T; Hartley, Douglas E H

2016-09-01

Functional near-infrared spectroscopy (fNIRS) is a silent, non-invasive neuroimaging technique that is potentially well suited to auditory research. However, the reliability of auditory-evoked activation measured using fNIRS is largely unknown. The present study investigated the test-retest reliability of speech-evoked fNIRS responses in normally-hearing adults. Seventeen participants underwent fNIRS imaging in two sessions separated by three months. In a block design, participants were presented with auditory speech, visual speech (silent speechreading), and audiovisual speech conditions. Optode arrays were placed bilaterally over the temporal lobes, targeting auditory brain regions. A range of established metrics was used to quantify the reproducibility of cortical activation patterns, as well as the amplitude and time course of the haemodynamic response within predefined regions of interest. The use of a signal processing algorithm designed to reduce the influence of systemic physiological signals was found to be crucial to achieving reliable detection of significant activation at the group level. For auditory speech (with or without visual cues), reliability was good to excellent at the group level, but highly variable among individuals. Temporal-lobe activation in response to visual speech was less reliable, especially in the right hemisphere. Consistent with previous reports, fNIRS reliability was improved by averaging across a small number of channels overlying a cortical region of interest. Overall, the present results confirm that fNIRS can measure speech-evoked auditory responses in adults that are highly reliable at the group level, and indicate that signal processing to reduce physiological noise may substantially improve the reliability of fNIRS measurements. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

PubMed

Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

2015-03-01

Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker's face. Copyright © 2014 Elsevier Ltd. All rights reserved.
Individual differences and the effect of face configuration information in the McGurk effect.

PubMed

Ujiie, Yuta; Asai, Tomohisa; Wakabayashi, Akio

2018-04-01

The McGurk effect, which denotes the influence of visual information on audiovisual speech perception, is less frequently observed in individuals with autism spectrum disorder (ASD) compared to those without it; the reason for this remains unclear. Several studies have suggested that facial configuration context might play a role in this difference. More specifically, people with ASD show a local processing bias for faces-that is, they process global face information to a lesser extent. This study examined the role of facial configuration context in the McGurk effect in 46 healthy students. Adopting an analogue approach using the Autism-Spectrum Quotient (AQ), we sought to determine whether this facial configuration context is crucial to previously observed reductions in the McGurk effect in people with ASD. Lip-reading and audiovisual syllable identification tasks were assessed via presentation of upright normal, inverted normal, upright Thatcher-type, and inverted Thatcher-type faces. When the Thatcher-type face was presented, perceivers were found to be sensitive to the misoriented facial characteristics, causing them to perceive a weaker McGurk effect than when the normal face was presented (this is known as the McThatcher effect). Additionally, the McGurk effect was weaker in individuals with high AQ scores than in those with low AQ scores in the incongruent audiovisual condition, regardless of their ability to read lips or process facial configuration contexts. Our findings, therefore, do not support the assumption that individuals with ASD show a weaker McGurk effect due to a difficulty in processing facial configuration context.
Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration During Speech Reception With Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.

PubMed

Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut

Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar that was used, for it is feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI and NH gain from presenting AV over unimodal (auditory or visual) sentences in noise. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration of a German matrix sentence test were assessed. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH benefitted by AV over unimodal speech as indexed by calculations of the measures visual enhancement and auditory enhancement (each p < 0.001). Both groups efficiently integrated complementary auditory and visual speech features as indexed by calculations of the measure integration enhancement (each p < 0.005). Given the good agreement between results from literature and the outcome of supplementing an existing validated auditory test with synthetic visual cues, the introduced method can be considered an interesting candidate for clinical and scientific applications to assess measures important for AV SR in a standardized manner. This could be beneficial for optimizing the diagnosis and treatment of individual listening and communication disorders, such as cochlear implantation.
Crossmodal and incremental perception of audiovisual cues to emotional speech.

PubMed

Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

2010-01-01

In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: 1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as experiment I, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores raise faster for positive than negative conditions. Finally, the gating results suggest that incongruent emotions are perceived as more intense than congruent emotions, as the former get more extreme recognition scores than the latter, already after a short period of exposure.
Visual speech segmentation: using facial cues to locate word boundaries in continuous speech

PubMed Central

Mitchel, Aaron D.; Weiss, Daniel J.

2014-01-01

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition. PMID:25018577
Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence.

PubMed

Hernández-Gutiérrez, David; Abdel Rahman, Rasha; Martín-Loeches, Manuel; Muñoz, Francisco; Schacht, Annekathrin; Sommer, Werner

2018-07-01

Face-to-face interactions characterize communication in social contexts. These situations are typically multimodal, requiring the integration of linguistic auditory input with facial information from the speaker. In particular, eye gaze and visual speech provide the listener with social and linguistic information, respectively. Despite the importance of this context for an ecological study of language, research on audiovisual integration has mainly focused on the phonological level, leaving aside effects on semantic comprehension. Here we used event-related potentials (ERPs) to investigate the influence of facial dynamic information on semantic processing of connected speech. Participants were presented with either a video or a still picture of the speaker, concomitant to auditory sentences. Along three experiments, we manipulated the presence or absence of the speaker's dynamic facial features (mouth and eyes) and compared the amplitudes of the semantic N400 elicited by unexpected words. Contrary to our predictions, the N400 was not modulated by dynamic facial information; therefore, semantic processing seems to be unaffected by the speaker's gaze and visual speech. Even though, during the processing of expected words, dynamic faces elicited a long-lasting late posterior positivity compared to the static condition. This effect was significantly reduced when the mouth of the speaker was covered. Our findings may indicate an increase of attentional processing to richer communicative contexts. The present findings also demonstrate that in natural communicative face-to-face encounters, perceiving the face of a speaker in motion provides supplementary information that is taken into account by the listener, especially when auditory comprehension is non-demanding. Copyright © 2018 Elsevier Ltd. All rights reserved.
Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

PubMed

D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

2016-08-01

Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
The development of co-speech gesture in the communication of children with autism spectrum disorders.

PubMed

Sowden, Hannah; Clegg, Judy; Perkins, Michael

2013-12-01

Co-speech gestures have a close semantic relationship to speech in adult conversation. In typically developing children co-speech gestures which give additional information to speech facilitate the emergence of multi-word speech. A difficulty with integrating audio-visual information is known to exist for individuals with Autism Spectrum Disorder (ASD), which may affect development of the speech-gesture system. A longitudinal observational study was conducted with four children with ASD, aged 2;4 to 3;5 years. Participants were video-recorded for 20 min every 2 weeks during their attendance on an intervention programme. Recording continued for up to 8 months, thus affording a rich analysis of gestural practices from pre-verbal to multi-word speech across the group. All participants combined gesture with either speech or vocalisations. Co-speech gestures providing additional information to speech were observed to be either absent or rare. Findings suggest that children with ASD do not make use of the facilitating communicative effects of gesture in the same way as typically developing children.

Contributions of local speech encoding and functional connectivity to audio-visual speech perception

PubMed Central

Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

2017-01-01

Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. DOI: http://dx.doi.org/10.7554/eLife.24763.001 PMID:28590903
Speaker emotion recognition: from classical classifiers to deep neural networks

NASA Astrophysics Data System (ADS)

Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

2018-04-01

Speaker emotion recognition is considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine or education can be improved when considering the speech affective state. In this paper, a twofold approach for speech emotion classification is proposed. At the first side, a relevant set of features is adopted, and then at the second one, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented. Experimental results indicate that deep architecture can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the SAVEE Dataset Surrey Audio-Visual Expressed Emotion.
MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus.

PubMed

Hagan, Cindy C; Woods, Will; Johnson, Sam; Calder, Andrew J; Green, Gary G R; Young, Andrew W

2009-11-24

An influential neural model of face perception suggests that the posterior superior temporal sulcus (STS) is sensitive to those aspects of faces that produce transient visual changes, including facial expression. Other researchers note that recognition of expression involves multiple sensory modalities and suggest that the STS also may respond to crossmodal facial signals that change transiently. Indeed, many studies of audiovisual (AV) speech perception show STS involvement in AV speech integration. Here we examine whether these findings extend to AV emotion. We used magnetoencephalography to measure the neural responses of participants as they viewed and heard emotionally congruent fear and minimally congruent neutral face and voice stimuli. We demonstrate significant supra-additive responses (i.e., where AV > [unimodal auditory + unimodal visual]) in the posterior STS within the first 250 ms for emotionally congruent AV stimuli. These findings show a role for the STS in processing crossmodal emotive signals.
Multimedia material about velopharynx and primary palatoplasty for orientation of caregivers of children with cleft lip and palate.

PubMed

Costa, Tarcila Lima da; Souza, Olivia Mesquita Vieira de; Carneiro, Homero Aferri; Chiquito Netto, Cristianne; Pegoraro-Krook, Maria Inês; Dutka, Jeniffer de Cássia Rillo

2016-01-01

The objective of this study was to describe the process of elaboration and evaluation of multimedia material for caregivers about velopharynx, speech, and primary palatoplasty in babies with cleft lip and palate. The elaboration of the material involved an interdisciplinary relationship between the fields of Speech Language Pathology and Audiology, Dentistry and Arts. The definition and execution of the following activities were based on the principles of art education involving the following: characterization of audience, characterization of content, identification and elaboration of illustrations, characterization of educational approach, elaboration of text and narratives, definition of audiovisual sequence, and video preparation. The material was evaluated with the participation of 41 caregivers of patients with cleft lip and palate involving the comparison between acquired knowledge using an evaluation script applied before and after presenting the material. An increase was observed in correct responses regarding the role of velopharynx and the importance of primary palatoplasty for speech. The multimedia was effective in optimizing the knowledge of caregivers, suggesting the importance of such material during orientation.
Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration.

PubMed

Stropahl, Maren; Debener, Stefan

2017-01-01

There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extend these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users ( n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls ( n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system even at early stages of hearing loss.
Audiovisual perceptual learning with multiple speakers.

PubMed

Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

2016-05-01

One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "b-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "b-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.
Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants

PubMed Central

Gordon, Karen A.; Papsin, Blake C.; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A.

2017-01-01

Objectives: Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Design: Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Results: Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Conclusions: Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation. PMID:28085739
Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants.

PubMed

Good, Arla; Gordon, Karen A; Papsin, Blake C; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A

Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation.
Audiovisual Speech Integration in Pervasive Developmental Disorder: Evidence from Event-Related Potentials

ERIC Educational Resources Information Center

Magnee, Maurice J. C. M.; de Gelder, Beatrice; van Engeland, Herman; Kemner, Chantal

2008-01-01

Background: Integration of information from multiple sensory sources is an important prerequisite for successful social behavior, especially during face-to-face conversation. It has been suggested that communicative impairments among individuals with pervasive developmental disorders (PDD) might be caused by an inability to integrate synchronously…
Audiovisual signal compression: the 64/P codecs

NASA Astrophysics Data System (ADS)

Jayant, Nikil S.

1996-02-01

Video codecs operating at integral multiples of 64 kbps are well-known in visual communications technology as p * 64 systems (p equals 1 to 24). Originally developed as a class of ITU standards, these codecs have served as core technology for videoconferencing, and they have also influenced the MPEG standards for addressable video. Video compression in the above systems is provided by motion compensation followed by discrete cosine transform -- quantization of the residual signal. Notwithstanding the promise of higher bit rates in emerging generations of networks and storage devices, there is a continuing need for facile audiovisual communications over voice band and wireless modems. Consequently, video compression at bit rates lower than 64 kbps is a widely-sought capability. In particular, video codecs operating at rates in the neighborhood of 64, 32, 16, and 8 kbps seem to have great practical value, being matched respectively to the transmission capacities of basic rate ISDN (64 kbps), and voiceband modems that represent high (32 kbps), medium (16 kbps) and low- end (8 kbps) grades in current modem technology. The purpose of this talk is to describe the state of video technology at these transmission rates, without getting too literal about the specific speeds mentioned above. In other words, we expect codecs designed for non- submultiples of 64 kbps, such as 56 kbps or 19.2 kbps, as well as for sub-multiples of 64 kbps, depending on varying constraints on modem rate and the transmission rate needed for the voice-coding part of the audiovisual communications link. The MPEG-4 video standards process is a natural platform on which to examine current capabilities in sub-ISDN rate video coding, and we shall draw appropriately from this process in describing video codec performance. Inherent in this summary is a reinforcement of motion compensation and DCT as viable building blocks of video compression systems, although there is a need for improving signal quality even in the very best of these systems. In a related part of our talk, we discuss the role of preprocessing and postprocessing subsystems which serve to enhance the performance of an otherwise standard codec. Examples of these (sometimes proprietary) subsystems are automatic face-tracking prior to the coding of a head-and-shoulders scene, and adaptive postfiltering after conventional decoding, to reduce generic classes of artifacts in low bit rate video. The talk concludes with a summary of technology targets and research directions. We discuss targets in terms of four fundamental parameters of coder performance: quality, bit rate, delay and complexity; and we emphasize the need for measuring and maximizing the composite quality of the audiovisual signal. In discussing research directions, we examine progress and opportunities in two fundamental approaches for bit rate reduction: removal of statistical redundancy and reduction of perceptual irrelevancy; we speculate on the value of techniques such as analysis-by-synthesis that have proved to be quite valuable in speech coding, and we examine the prospect of integrating speech and image processing for developing next-generation technology for audiovisual communications.
Seeing the Talker's Face Improves Free Recall of Speech for Young Adults With Normal Hearing but Not Older Adults With Hearing Loss.

PubMed

Rudner, Mary; Mishra, Sushmit; Stenfelt, Stefan; Lunner, Thomas; Rönnberg, Jerker

2016-06-01

Seeing the talker's face improves speech understanding in noise, possibly releasing resources for cognitive processing. We investigated whether it improves free recall of spoken two-digit numbers. Twenty younger adults with normal hearing and 24 older adults with hearing loss listened to and subsequently recalled lists of 13 two-digit numbers, with alternating male and female talkers. Lists were presented in quiet as well as in stationary and speech-like noise at a signal-to-noise ratio giving approximately 90% intelligibility. Amplification compensated for loss of audibility. Seeing the talker's face improved free recall performance for the younger but not the older group. Poorer performance in background noise was contingent on individual differences in working memory capacity. The effect of seeing the talker's face did not differ in quiet and noise. We have argued that the absence of an effect of seeing the talker's face for older adults with hearing loss may be due to modulation of audiovisual integration mechanisms caused by an interaction between task demands and participant characteristics. In particular, we suggest that executive task demands and interindividual executive skills may play a key role in determining the benefit of seeing the talker's face during a speech-based cognitive task.
Preschoolers Benefit From Visually Salient Speech Cues

PubMed Central

Holt, Rachael Frush

2015-01-01

Purpose This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. Results Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. Conclusions Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit. PMID:25322336
Does Language Influence the Accuracy of Judgments of Stuttering in Children?

ERIC Educational Resources Information Center

Einarsdottir, Johanna; Ingham, Roger J.

2009-01-01

Purpose: To determine whether stuttering judgment accuracy is influenced by familiarity with the stuttering speaker's language. Method: Audiovisual 7-min speech samples from nine 3- to 5-year-olds were used. Icelandic children who stutter (CWS), preselected for different levels of stuttering, were subdivided into 5-s intervals. Ten experienced…
Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels

ERIC Educational Resources Information Center

Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Baskent, Deniz

2012-01-01

Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Method:…
Sound Recordings in the Audiovisual Archives Division of the National Archives. Preliminary Draft.

ERIC Educational Resources Information Center

Bray, Mayfield S.; Waffen, Leslie C.

Some 47,000 sound recordings dating from the turn of the century have been collected in the National Archives. These recordings, consisting of recordings of press conferences, panel discussions, interviews, speeches, court and conference proceedings, entertainment programs, and news broadcasts, are listed in this paper by government agency. The…
Going to the MALL: Mobile Assisted Language Learning

ERIC Educational Resources Information Center

Chinnery, George M.

2006-01-01

Practically since their availability, a succession of audiovisual recording devices (e.g., reel-to-reel, VCRs, PCs) has been used to capture language samples, and myriad playback and broadcast devices (e.g., phonographs, radios, televisions) have provided access to authentic speech samples. The espousal of audiolingual theory in the 1950s brought…
Gesture and Metaphor Comprehension: Electrophysiological Evidence of Cross-Modal Coordination by Audiovisual Stimulation

ERIC Educational Resources Information Center

Cornejo, Carlos; Simonetti, Franco; Ibanez, Agustin; Aldunate, Nerea; Ceric, Francisco; Lopez, Vladimir; Nunez, Rafael E.

2009-01-01

In recent years, studies have suggested that gestures influence comprehension of linguistic expressions, for example, eliciting an N400 component in response to a speech/gesture mismatch. In this paper, we investigate the role of gestural information in the understanding of metaphors. Event related potentials (ERPs) were recorded while…
A virtual speaker in noisy classroom conditions: supporting or disrupting children's listening comprehension?

PubMed

Nirme, Jens; Haake, Magnus; Lyberg Åhlander, Viveka; Brännström, Jonas; Sahlén, Birgitta

2018-04-05

Seeing a speaker's face facilitates speech recognition, particularly under noisy conditions. Evidence for how it might affect comprehension of the content of the speech is more sparse. We investigated how children's listening comprehension is affected by multi-talker babble noise, with or without presentation of a digitally animated virtual speaker, and whether successful comprehension is related to performance on a test of executive functioning. We performed a mixed-design experiment with 55 (34 female) participants (8- to 9-year-olds), recruited from Swedish elementary schools. The children were presented with four different narratives, each in one of four conditions: audio-only presentation in a quiet setting, audio-only presentation in noisy setting, audio-visual presentation in a quiet setting, and audio-visual presentation in a noisy setting. After each narrative, the children answered questions on the content and rated their perceived listening effort. Finally, they performed a test of executive functioning. We found significantly fewer correct answers to explicit content questions after listening in noise. This negative effect was only mitigated to a marginally significant degree by audio-visual presentation. Strong executive function only predicted more correct answers in quiet settings. Altogether, our results are inconclusive regarding how seeing a virtual speaker affects listening comprehension. We discuss how methodological adjustments, including modifications to our virtual speaker, can be used to discriminate between possible explanations to our results and contribute to understanding the listening conditions children face in a typical classroom.
Speech comprehension training and auditory and cognitive processing in older adults.

PubMed

Pichora-Fuller, M Kathleen; Levitt, Harry

2012-12-01

To provide a brief history of speech comprehension training systems and an overview of research on auditory and cognitive aging as background to recommendations for future directions for rehabilitation. Two distinct domains were reviewed: one concerning technological and the other concerning psychological aspects of training. Historical trends and advances in these 2 domains were interrelated to highlight converging trends and directions for future practice. Over the last century, technological advances have influenced both the design of hearing aids and training systems. Initially, training focused on children and those with severe loss for whom amplification was insufficient. Now the focus has shifted to older adults with relatively little loss but difficulties listening in noise. Evidence of brain plasticity from auditory and cognitive neuroscience provides new insights into how to facilitate perceptual (re-)learning by older adults. There is a new imperative to complement training to increase bottom-up processing of the signal with more ecologically valid training to boost top-down information processing based on knowledge of language and the world. Advances in digital technologies enable the development of increasingly sophisticated training systems incorporating complex meaningful materials such as music, audiovisual interactive displays, and conversation.
Perceived synchrony for realistic and dynamic audiovisual events.

PubMed

Eg, Ragnhild; Behne, Dawn M

2015-01-01

In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.

Perceived synchrony for realistic and dynamic audiovisual events

PubMed Central

Eg, Ragnhild; Behne, Dawn M.

2015-01-01

In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli. PMID:26082738
Cognitive spare capacity: evaluation data and its association with comprehension of dynamic conversations

PubMed Central

Keidser, Gitte; Best, Virginia; Freeston, Katrina; Boyce, Alexandra

2015-01-01

It is well-established that communication involves the working memory system, which becomes increasingly engaged in understanding speech as the input signal degrades. The more resources allocated to recovering a degraded input signal, the fewer resources, referred to as cognitive spare capacity (CSC), remain for higher-level processing of speech. Using simulated natural listening environments, the aims of this paper were to (1) evaluate an English version of a recently introduced auditory test to measure CSC that targets the updating process of the executive function, (2) investigate if the test predicts speech comprehension better than the reading span test (RST) commonly used to measure working memory capacity, and (3) determine if the test is sensitive to increasing the number of attended locations during listening. In Experiment I, the CSC test was presented using a male and a female talker, in quiet and in spatially separated babble- and cafeteria-noises, in an audio-only and in an audio-visual mode. Data collected on 21 listeners with normal and impaired hearing confirmed that the English version of the CSC test is sensitive to population group, noise condition, and clarity of speech, but not presentation modality. In Experiment II, performance by 27 normal-hearing listeners on a novel speech comprehension test presented in noise was significantly associated with working memory capacity, but not with CSC. Moreover, this group showed no significant difference in CSC as the number of talker locations in the test increased. There was no consistent association between the CSC test and the RST. It is recommended that future studies investigate the psychometric properties of the CSC test, and examine its sensitivity to the complexity of the listening environment in participants with both normal and impaired hearing. PMID:25999904
Impact of language on development of auditory-visual speech perception.

PubMed

Sekiyama, Kaoru; Burnham, Denis

2008-03-01

The McGurk effect paradigm was used to examine the developmental onset of inter-language differences between Japanese and English in auditory-visual speech perception. Participants were asked to identify syllables in audiovisual (with congruent or discrepant auditory and visual components), audio-only, and video-only presentations at various signal-to-noise levels. In Experiment 1 with two groups of adults, native speakers of Japanese and native speakers of English, the results on both percent visually influenced responses and reaction time supported previous reports of a weaker visual influence for Japanese participants. In Experiment 2, an additional three age groups (6, 8, and 11 years) in each language group were tested. The results showed that the degree of visual influence was low and equivalent for Japanese and English language 6-year-olds, and increased over age for English language participants, especially between 6 and 8 years, but remained the same for Japanese participants. This may be related to the fact that English language adults and older children processed visual speech information relatively faster than auditory information whereas no such inter-modal differences were found in the Japanese participants' reaction times.
Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study-

PubMed Central

Yahata, Izumi; Kanno, Akitake; Sakamoto, Shuichi; Takanashi, Yoshitaka; Takata, Shiho; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

2016-01-01

The effects of asynchrony between audio and visual (A/V) stimuli on the N100m responses of magnetoencephalography in the left hemisphere were compared with those on the psychophysical responses in 11 participants. The latency and amplitude of N100m were significantly shortened and reduced in the left hemisphere by the presentation of visual speech as long as the temporal asynchrony between A/V stimuli was within 100 ms, but were not significantly affected with audio lags of -500 and +500 ms. However, some small effects were still preserved on average with audio lags of 500 ms, suggesting similar asymmetry of the temporal window to that observed in psychophysical measurements, which tended to be more robust (wider) for audio lags; i.e., the pattern of visual-speech effects as a function of A/V lag observed in the N100m in the left hemisphere grossly resembled that in psychophysical measurements on average, although the individual responses were somewhat varied. The present results suggest that the basic configuration of the temporal window of visual effects on auditory-speech perception could be observed from the early auditory processing stage. PMID:28030631
Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

NASA Astrophysics Data System (ADS)

Ryumin, D.; Karpov, A. A.

2017-05-01

In this article, we propose a new method for parametric representation of human's lips region. The functional diagram of the method is described and implementation details with the explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. A speed of the method work using several computers with different performances is reported. This universal method allows applying parametrical representation of the speaker's lipsfor the tasks of biometrics, computer vision, machine learning, and automatic recognition of face, elements of sign languages, and audio-visual speech, including lip-reading.
Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children

PubMed Central

Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha

2012-01-01

Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726
The Foundations of Social Cognition: Studies on Face/Voice Integration in Newborn Infants

ERIC Educational Resources Information Center

Streri, Arlette; Coulon, Marion; Guellai, Bahia

2013-01-01

A series of studies on newborns' abilities for recognizing speaking faces has been performed in order to identify the fundamental cues of social cognition. We used audiovisual dynamic faces rather than photographs or patterns of faces. Direct eye gaze and speech addressed to newborns, in interactive situations, appear to be two good candidates for…
Can Children with Autism Spectrum Disorders "Hear" a Speaking Face?

ERIC Educational Resources Information Center

Irwin, Julia R.; Tornatore, Lauren A.; Brancazio, Lawrence; Whalen, D. H.

2011-01-01

This study used eye-tracking methodology to assess audiovisual speech perception in 26 children ranging in age from 5 to 15 years, half with autism spectrum disorders (ASD) and half with typical development. Given the characteristic reduction in gaze to the faces of others in children with ASD, it was hypothesized that they would show reduced…
Children with SLI Can Exhibit Reduced Attention to a Talker's Mouth

ERIC Educational Resources Information Center

Pons, Ferran; Sanz-Torrent, Monica; Ferinu, Laura; Birulés, Joan; Andreu, Llorenç

2018-01-01

It has been demonstrated that children with specific language impairment (SLI) show difficulties not only with auditory but also with audiovisual speech perception. The goal of this study was to assess whether children with SLI might show reduced attention to the talker's mouth compared to their typically developing (TD) peers. An additional aim…
Audio-visual speech cue combination.

PubMed

Arnold, Derek H; Tear, Morgan; Schindel, Ryan; Roseboom, Warrick

2010-04-16

Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
Talker and lexical effects on audiovisual word recognition by adults with cochlear implants.

PubMed

Kaiser, Adam R; Kirk, Karen Iler; Lachs, Lorin; Pisoni, David B

2003-04-01

The present study examined how postlingually deafened adults with cochlear implants combine visual information from lipreading with auditory cues in an open-set word recognition task. Adults with normal hearing served as a comparison group. Word recognition performance was assessed using lexically controlled word lists presented under auditory-only, visual-only, and combined audiovisual presentation formats. Effects of talker variability were studied by manipulating the number of talkers producing the stimulus tokens. Lexical competition was investigated using sets of lexically easy and lexically hard test words. To assess the degree of audiovisual integration, a measure of visual enhancement, R(a), was used to assess the gain in performance provided in the audiovisual presentation format relative to the maximum possible performance obtainable in the auditory-only format. Results showed that word recognition performance was highest for audiovisual presentation followed by auditory-only and then visual-only stimulus presentation. Performance was better for single-talker lists than for multiple-talker lists, particularly under the audiovisual presentation format. Word recognition performance was better for the lexically easy than for the lexically hard words regardless of presentation format. Visual enhancement scores were higher for single-talker conditions compared to multiple-talker conditions and tended to be somewhat better for lexically easy words than for lexically hard words. The pattern of results suggests that information from the auditory and visual modalities is used to access common, multimodal lexical representations in memory. The findings are discussed in terms of the complementary nature of auditory and visual sources of information that specify the same underlying gestures and articulatory events in speech.
Talker and Lexical Effects on Audiovisual Word Recognition by Adults With Cochlear Implants

PubMed Central

Kaiser, Adam R.; Kirk, Karen Iler; Lachs, Lorin; Pisoni, David B.

2012-01-01

The present study examined how postlingually deafened adults with cochlear implants combine visual information from lipreading with auditory cues in an open-set word recognition task. Adults with normal hearing served as a comparison group. Word recognition performance was assessed using lexically controlled word lists presented under auditory-only, visual-only, and combined audiovisual presentation formats. Effects of talker variability were studied by manipulating the number of talkers producing the stimulus tokens. Lexical competition was investigated using sets of lexically easy and lexically hard test words. To assess the degree of audiovisual integration, a measure of visual enhancement, Ra, was used to assess the gain in performance provided in the audiovisual presentation format relative to the maximum possible performance obtainable in the auditory-only format. Results showed that word recognition performance was highest for audiovisual presentation followed by auditory-only and then visual-only stimulus presentation. Performance was better for single-talker lists than for multiple-talker lists, particularly under the audiovisual presentation format. Word recognition performance was better for the lexically easy than for the lexically hard words regardless of presentation format. Visual enhancement scores were higher for single-talker conditions compared to multiple-talker conditions and tended to be somewhat better for lexically easy words than for lexically hard words. The pattern of results suggests that information from the auditory and visual modalities is used to access common, multimodal lexical representations in memory. The findings are discussed in terms of the complementary nature of auditory and visual sources of information that specify the same underlying gestures and articulatory events in speech. PMID:14700380
Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance.

PubMed

Yahata, Izumi; Kawase, Tetsuaki; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

2017-01-01

The effects of visual speech (the moving image of the speaker's face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker's face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker's face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between right and left hemisphere. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information.
Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

PubMed

Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

2017-07-01

Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Fifty years of progress in speech and speaker recognition

NASA Astrophysics Data System (ADS)

Furui, Sadaoki

2004-10-01

Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-base statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to Cepstral features (Cepstrum + DCepstrum + DDCepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from gdistanceh-based to likelihood-based methods, (5) from maximum likelihood to discriminative approach, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.
Excitability of the motor system: A transcranial magnetic stimulation study on singing and speaking.

PubMed

Royal, Isabelle; Lidji, Pascale; Théoret, Hugo; Russo, Frank A; Peretz, Isabelle

2015-08-01

The perception of movements is associated with increased activity in the human motor cortex, which in turn may underlie our ability to understand actions, as it may be implicated in the recognition, understanding and imitation of actions. Here, we investigated the involvement and lateralization of the primary motor cortex (M1) in the perception of singing and speech. Transcranial magnetic stimulation (TMS) was applied independently for both hemispheres over the mouth representation of the motor cortex in healthy participants while they watched 4-s audiovisual excerpts of singers producing a 2-note ascending interval (singing condition) or 4-s audiovisual excerpts of a person explaining a proverb (speech condition). Subjects were instructed to determine whether a sung interval/written proverb, matched a written interval/proverb. During both tasks, motor evoked potentials (MEPs) were recorded from the contralateral mouth muscle (orbicularis oris) of the stimulated motor cortex compared to a control task. Moreover, to investigate the time course of motor activation, TMS pulses were randomly delivered at 7 different time points (ranging from 500 to 3500 ms after stimulus onset). Results show that stimulation of the right hemisphere had a similar effect on the MEPs for both the singing and speech perception tasks, whereas stimulation of the left hemisphere significantly differed in the speech perception task compared to the singing perception task. Furthermore, analysis of the MEPs in the singing task revealed that they decreased for small musical intervals, but increased for large musical intervals, regardless of which hemisphere was stimulated. Overall, these results suggest a dissociation between the lateralization of M1 activity for speech perception and for singing perception, and that in the latter case its activity can be modulated by musical parameters such as the size of a musical interval. Copyright © 2015 Elsevier Ltd. All rights reserved.
The Role of Intersensory Redundancy in the Emergence of Social Referencing in 5 1/2-Month-Old Infants

ERIC Educational Resources Information Center

Vaillant-Molina, Mariana; Bahrick, Lorraine E.

2012-01-01

Early evidence of social referencing was examined in 5 1/2-month-old infants. Infants were habituated to 2 films of moving toys, one toy eliciting a woman's positive emotional expression and the other eliciting a negative expression under conditions of bimodal (audiovisual) or unimodal visual (silent) speech. It was predicted that intersensory…
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

PubMed

Giannakopoulos, Theodoros

2015-01-01

Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

PubMed Central

Giannakopoulos, Theodoros

2015-01-01

Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library. PMID:26656189
Read My Lips: Brain Dynamics Associated with Audiovisual Integration and Deviance Detection.

PubMed

Tse, Chun-Yu; Gratton, Gabriele; Garnsey, Susan M; Novak, Michael A; Fabiani, Monica

2015-09-01

Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.

Hearing faces: how the infant brain matches the face it sees with the speech it hears.

PubMed

Bristow, Davina; Dehaene-Lambertz, Ghislaine; Mattout, Jeremie; Soares, Catherine; Gliga, Teodora; Baillet, Sylvain; Mangin, Jean-François

2009-05-01

Speech is not a purely auditory signal. From around 2 months of age, infants are able to correctly match the vowel they hear with the appropriate articulating face. However, there is no behavioral evidence of integrated audiovisual perception until 4 months of age, at the earliest, when an illusory percept can be created by the fusion of the auditory stimulus and of the facial cues (McGurk effect). To understand how infants initially match the articulatory movements they see with the sounds they hear, we recorded high-density ERPs in response to auditory vowels that followed a congruent or incongruent silently articulating face in 10-week-old infants. In a first experiment, we determined that auditory-visual integration occurs during the early stages of perception as in adults. The mismatch response was similar in timing and in topography whether the preceding vowels were presented visually or aurally. In the second experiment, we studied audiovisual integration in the linguistic (vowel perception) and nonlinguistic (gender perception) domain. We observed a mismatch response for both types of change at similar latencies. Their topographies were significantly different demonstrating that cross-modal integration of these features is computed in parallel by two different networks. Indeed, brain source modeling revealed that phoneme and gender computations were lateralized toward the left and toward the right hemisphere, respectively, suggesting that each hemisphere possesses an early processing bias. We also observed repetition suppression in temporal regions and repetition enhancement in frontal regions. These results underscore how complex and structured is the human cortical organization which sustains communication from the first weeks of life on.
Summarizing Audiovisual Contents of a Video Program

NASA Astrophysics Data System (ADS)

Gong, Yihong

2003-12-01

In this paper, we focus on video programs that are intended to disseminate information and knowledge such as news, documentaries, seminars, etc, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately, and then integrating the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A Bipartite Graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.
Effects of stimulus response compatibility on covert imitation of vowels.

PubMed

Adank, Patti; Nuttall, Helen; Bekkering, Harold; Maegherman, Gwijde

2018-03-13

When we observe someone else speaking, we tend to automatically activate the corresponding speech motor patterns. When listening, we therefore covertly imitate the observed speech. Simulation theories of speech perception propose that covert imitation of speech motor patterns supports speech perception. Covert imitation of speech has been studied with interference paradigms, including the stimulus-response compatibility paradigm (SRC). The SRC paradigm measures covert imitation by comparing articulation of a prompt following exposure to a distracter. Responses tend to be faster for congruent than for incongruent distracters; thus, showing evidence of covert imitation. Simulation accounts propose a key role for covert imitation in speech perception. However, covert imitation has thus far only been demonstrated for a select class of speech sounds, namely consonants, and it is unclear whether covert imitation extends to vowels. We aimed to demonstrate that covert imitation effects as measured with the SRC paradigm extend to vowels, in two experiments. We examined whether covert imitation occurs for vowels in a consonant-vowel-consonant context in visual, audio, and audiovisual modalities. We presented the prompt at four time points to examine how covert imitation varied over the distracter's duration. The results of both experiments clearly demonstrated covert imitation effects for vowels, thus supporting simulation theories of speech perception. Covert imitation was not affected by stimulus modality and was maximal for later time points.
Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance

PubMed Central

Yahata, Izumi; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

2017-01-01

The effects of visual speech (the moving image of the speaker’s face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker’s face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker’s face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between right and left hemisphere. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information. PMID:28141836
Interactions between auditory and visual semantic stimulus classes: evidence for common processing networks for speech and body actions.

PubMed

Meyer, Georg F; Greenlee, Mark; Wuerger, Sophie

2011-09-01

Incongruencies between auditory and visual signals negatively affect human performance and cause selective activation in neuroimaging studies; therefore, they are increasingly used to probe audiovisual integration mechanisms. An open question is whether the increased BOLD response reflects computational demands in integrating mismatching low-level signals or reflects simultaneous unimodal conceptual representations of the competing signals. To address this question, we explore the effect of semantic congruency within and across three signal categories (speech, body actions, and unfamiliar patterns) for signals with matched low-level statistics. In a localizer experiment, unimodal (auditory and visual) and bimodal stimuli were used to identify ROIs. All three semantic categories cause overlapping activation patterns. We find no evidence for areas that show greater BOLD response to bimodal stimuli than predicted by the sum of the two unimodal responses. Conjunction analysis of the unimodal responses in each category identifies a network including posterior temporal, inferior frontal, and premotor areas. Semantic congruency effects are measured in the main experiment. We find that incongruent combinations of two meaningful stimuli (speech and body actions) but not combinations of meaningful with meaningless stimuli lead to increased BOLD response in the posterior STS (pSTS) bilaterally, the left SMA, the inferior frontal gyrus, the inferior parietal lobule, and the anterior insula. These interactions are not seen in premotor areas. Our findings are consistent with the hypothesis that pSTS and frontal areas form a recognition network that combines sensory categorical representations (in pSTS) with action hypothesis generation in inferior frontal gyrus/premotor areas. We argue that the same neural networks process speech and body actions.
Electrocortical Dynamics in Children with a Language-Learning Impairment Before and After Audiovisual Training.

PubMed

Heim, Sabine; Choudhury, Naseem; Benasich, April A

2016-05-01

Detecting and discriminating subtle and rapid sound changes in the speech environment is a fundamental prerequisite of language processing, and deficits in this ability have frequently been observed in individuals with language-learning impairments (LLI). One approach to studying associations between dysfunctional auditory dynamics and LLI, is to implement a training protocol tapping into this potential while quantifying pre- and post-intervention status. Event-related potentials (ERPs) are highly sensitive to the brain correlates of these dynamic changes and are therefore ideally suited for examining hypotheses regarding dysfunctional auditory processes. In this study, ERP measurements to rapid tone sequences (standard and deviant tone pairs) along with behavioral language testing were performed in 6- to 9-year-old LLI children (n = 21) before and after audiovisual training. A non-treatment group of children with typical language development (n = 12) was also assessed twice at a comparable time interval. The results indicated that the LLI group exhibited considerable gains on standardized measures of language. In terms of ERPs, we found evidence of changes in the LLI group specifically at the level of the P2 component, later than 250 ms after the onset of the second stimulus in the deviant tone pair. These changes suggested enhanced discrimination of deviant from standard tone sequences in widespread cortices, in LLI children after training.
Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception

PubMed Central

Bejjanki, Vikranth Rao; Clayards, Meghan; Knill, David C.; Aslin, Richard N.

2011-01-01

Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues, during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one and participants' performance is consistent with an optimal model in which environmental, within category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue-combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks. PMID:21637344
On the role of crossmodal prediction in audiovisual emotion perception.

PubMed

Jessen, Sarah; Kotz, Sonja A

2013-01-01

Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of cross-modal prediction. In emotion perception, as in most other settings, visual information precedes the auditory information. Thereby, leading in visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, so far it has not been addressed in audiovisual emotion perception. Based on the current state of the art in (a) cross-modal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow more reliable predicting of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 EEG response and the duration of visual emotional, but not non-emotional information. If the assumption that emotional content allows more reliable predicting can be corroborated in future studies, cross-modal prediction is a crucial factor in our understanding of multisensory emotion perception.
A General Audiovisual Temporal Processing Deficit in Adult Readers with Dyslexia

ERIC Educational Resources Information Center

Francisco, Ana A.; Jesse, Alexandra; Groen, Margriet A.; McQueen, James M.

2017-01-01

Purpose: Because reading is an audiovisual process, reading impairment may reflect an audiovisual processing deficit. The aim of the present study was to test the existence and scope of such a deficit in adult readers with dyslexia. Method: We tested 39 typical readers and 51 adult readers with dyslexia on their sensitivity to the simultaneity of…
Audiovisual alignment of co-speech gestures to speech supports word learning in 2-year-olds.

PubMed

Jesse, Alexandra; Johnson, Elizabeth K

2016-05-01

Analyses of caregiver-child communication suggest that an adult tends to highlight objects in a child's visual scene by moving them in a manner that is temporally aligned with the adult's speech productions. Here, we used the looking-while-listening paradigm to examine whether 25-month-olds use audiovisual temporal alignment to disambiguate and learn novel word-referent mappings in a difficult word-learning task. Videos of two equally interesting and animated novel objects were simultaneously presented to children, but the movement of only one of the objects was aligned with an accompanying object-labeling audio track. No social cues (e.g., pointing, eye gaze, touch) were available to the children because the speaker was edited out of the videos. Immediately afterward, toddlers were presented with still images of the two objects and asked to look at one or the other. Toddlers looked reliably longer to the labeled object, demonstrating their acquisition of the novel word-referent mapping. A control condition showed that children's performance was not solely due to the single unambiguous labeling that had occurred at experiment onset. We conclude that the temporal link between a speaker's utterances and the motion they imposed on the referent object helps toddlers to deduce a speaker's intended reference in a difficult word-learning scenario. In combination with our previous work, these findings suggest that intersensory redundancy is a source of information used by language users of all ages. That is, intersensory redundancy is not just a word-learning tool used by young infants. Copyright © 2015 Elsevier Inc. All rights reserved.
Audiovisual Processing in Children with and without Autism Spectrum Disorders

ERIC Educational Resources Information Center

Mongillo, Elizabeth A.; Irwin, Julia R.; Whalen, D. H.; Klaiman, Cheryl; Carter, Alice S.; Schultz, Robert T.

2008-01-01

Fifteen children with autism spectrum disorders (ASD) and twenty-one children without ASD completed six perceptual tasks designed to characterize the nature of the audiovisual processing difficulties experienced by children with ASD. Children with ASD scored significantly lower than children without ASD on audiovisual tasks involving human faces…
Temporal processing of audiovisual stimuli is enhanced in musicians: evidence from magnetoencephalography (MEG).

PubMed

Lu, Yao; Paraskevopoulos, Evangelos; Herholz, Sibylle C; Kuchenbuch, Anja; Pantev, Christo

2014-01-01

Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical with neurophysiological measurements investigating the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (direct after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events.
Real-time speech-driven animation of expressive talking faces

NASA Astrophysics Data System (ADS)

Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

2011-05-01

In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the under-layer classification at sub-phonemic level has been modelled on the relationship between acoustic features of frames and audio labels in phonemes. Using certain constraint, the predicted emotion labels of speech are adjusted to gain the facial expression labels which are combined with sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classifications, and the synthesized facial sequences reach a comparative convincing quality.
Le Niveau-Seuil Peut-Il Renouveler la Conception des Cours Audiovisuels pour Debutants? (Can a Threshold Level Renew the Concept of Audiovisual Courses for Beginners?)

ERIC Educational Resources Information Center

Courtillon-Leclercq, Janine; Papo, Eliane

1977-01-01

An attempt to show that a threshold level would furnish conditions for a renewed methodology and greater student creativity. The acquisition of communicative competence would be constructed around two types of activities: analysis of the conditions of speech production and systematization of two levels of grammar. (Text is in French.) (AMH)
Age-related audiovisual interactions in the superior colliculus of the rat.

PubMed

Costa, M; Piché, M; Lepore, F; Guillemot, J-P

2016-04-21

It is well established that multisensory integration is a functional characteristic of the superior colliculus that disambiguates external stimuli and therefore reduces the reaction times toward simple audiovisual targets in space. However, in a condition where a complex audiovisual stimulus is used, such as the optical flow in the presence of modulated audio signals, little is known about the processing of the multisensory integration in the superior colliculus. Furthermore, since visual and auditory deficits constitute hallmark signs during aging, we sought to gain some insight on whether audiovisual processes in the superior colliculus are altered with age. Extracellular single-unit recordings were conducted in the superior colliculus of anesthetized Sprague-Dawley adult (10-12 months) and aged (21-22 months) rats. Looming circular concentric sinusoidal (CCS) gratings were presented alone and in the presence of sinusoidally amplitude modulated white noise. In both groups of rats, two different audiovisual response interactions were encountered in the spatial domain: superadditive, and suppressive. In contrast, additive audiovisual interactions were found only in adult rats. Hence, superior colliculus audiovisual interactions were more numerous in adult rats (38%) than in aged rats (8%). These results suggest that intersensory interactions in the superior colliculus play an essential role in space processing toward audiovisual moving objects during self-motion. Moreover, aging has a deleterious effect on complex audiovisual interactions. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
Audiovisual spoken word training can promote or impede auditory-only perceptual learning: prelingually deafened adults with late-acquired cochlear implants versus normal hearing adults

PubMed Central

Bernstein, Lynne E.; Eberhardt, Silvio P.; Auer, Edward T.

2014-01-01

Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training. PMID:25206344
Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts

PubMed Central

Berthier, Marcelo L.; De-Torres, Irene; Paredes-Pacheco, José; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María J.; Alfaro, Francisco; Moreno-Torres, Ignacio; López-Barroso, Diana; Dávila, Guadalupe

2017-01-01

Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these tracts. In conclusion, cholinergic potentiation alone and combined with a model-based aphasia therapy improved language deficits by promoting structural plastic changes in right white matter tracts. PMID:28659776
Audiovisual spoken word training can promote or impede auditory-only perceptual learning: prelingually deafened adults with late-acquired cochlear implants versus normal hearing adults.

PubMed

Bernstein, Lynne E; Eberhardt, Silvio P; Auer, Edward T

2014-01-01

Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training.
The effect of audiovisual and binaural listening on the acceptable noise level (ANL): establishing an ANL conceptual model.

PubMed

Wu, Yu-Hsiang; Stangl, Elizabeth; Pang, Carol; Zhang, Xuyang

2014-02-01

Little is known regarding the acoustic features of a stimulus used by listeners to determine the acceptable noise level (ANL). Features suggested by previous research include speech intelligibility (noise is unacceptable when it degrades speech intelligibility to a certain degree; the intelligibility hypothesis) and loudness (noise is unacceptable when the speech-to-noise loudness ratio is poorer than a certain level; the loudness hypothesis). The purpose of the study was to investigate if speech intelligibility or loudness is the criterion feature that determines ANL. To achieve this, test conditions were chosen so that the intelligibility and loudness hypotheses would predict different results. In Experiment 1, the effect of audiovisual (AV) and binaural listening on ANL was investigated; in Experiment 2, the effect of interaural correlation (ρ) on ANL was examined. A single-blinded, repeated-measures design was used. Thirty-two and twenty-five younger adults with normal hearing participated in Experiments 1 and 2, respectively. In Experiment 1, both ANL and speech recognition performance were measured using the AV version of the Connected Speech Test (CST) in three conditions: AV-binaural, auditory only (AO)-binaural, and AO-monaural. Lipreading skill was assessed using the Utley lipreading test. In Experiment 2, ANL and speech recognition performance were measured using the Hearing in Noise Test (HINT) in three binaural conditions, wherein the interaural correlation of noise was varied: ρ = 1 (N(o)S(o) [a listening condition wherein both speech and noise signals are identical across two ears]), -1 (NπS(o) [a listening condition wherein speech signals are identical across two ears whereas the noise signals of two ears are 180 degrees out of phase]), and 0 (N(u)S(o) [a listening condition wherein speech signals are identical across two ears whereas noise signals are uncorrelated across ears]). The results were compared to the predictions made based on the intelligibility and loudness hypotheses. The results of the AV and AO conditions appeared to support the intelligibility hypothesis due to the significant correlation between visual benefit in ANL (AV re: AO ANL) and (1) visual benefit in CST performance (AV re: AO CST) and (2) lipreading skill. The results of the N(o)S(o), NπS(o), and N(u)S(o) conditions negated the intelligibility hypothesis because binaural processing benefit (NπS(o) re: N(o)S(o), and N(u)S(o) re: N(o)S(o)) in ANL was not correlated to that in HINT performance. Instead, the results somewhat supported the loudness hypothesis because the pattern of ANL results across the three conditions (N(o)S(o) ≈ NπS(o) ≈ N(u)S(o) ANL) was more consistent with what was predicted by the loudness hypothesis (N(o)S(o) ≈ NπS(o) < N(u)S(o) ANL) than by the intelligibility hypothesis (NπS(o) < N(u)S(o) < N(o)S(o) ANL). The results of the binaural and monaural conditions supported neither hypothesis because (1) binaural benefit (binaural re: monaural) in ANL was not correlated to that in speech recognition performance, and (2) the pattern of ANL results across conditions (binaural < monaural ANL) was not consistent with the prediction made based on previous binaural loudness summation research (binaural ≥ monaural ANL). The study suggests that listeners may use multiple acoustic features to make ANL judgments. The binaural/monaural results showing that neither hypothesis was supported further indicate that factors other than speech intelligibility and loudness, such as psychological factors, may affect ANL. The weightings of different acoustic features in ANL judgments may vary widely across individuals and listening conditions. American Academy of Audiology.
Attention distributed across sensory modalities enhances perceptual performance

PubMed Central

Mishra, Jyoti; Gazzaley, Adam

2012-01-01

This study investigated the interaction between top-down attentional control and multisensory processing in humans. Using semantically congruent and incongruent audiovisual stimulus streams, we found target detection to be consistently improved in the setting of distributed audiovisual attention versus focused visual attention. This performance benefit was manifested as faster reaction times for congruent audiovisual stimuli, and as accuracy improvements for incongruent stimuli, resulting in a resolution of stimulus interference. Electrophysiological recordings revealed that these behavioral enhancements were associated with reduced neural processing of both auditory and visual components of the audiovisual stimuli under distributed vs. focused visual attention. These neural changes were observed at early processing latencies, within 100–300 ms post-stimulus onset, and localized to auditory, visual, and polysensory temporal cortices. These results highlight a novel neural mechanism for top-down driven performance benefits via enhanced efficacy of sensory neural processing during distributed audiovisual attention relative to focused visual attention. PMID:22933811

Neural Correlates of Audiovisual Integration of Semantic Category Information

ERIC Educational Resources Information Center

Hu, Zhonghua; Zhang, Ruiling; Zhang, Qinglin; Liu, Qiang; Li, Hong

2012-01-01

Previous studies have found a late frontal-central audiovisual interaction during the time period about 150-220 ms post-stimulus. However, it is unclear to which process is this audiovisual interaction related: to processing of acoustical features or to classification of stimuli? To investigate this question, event-related potentials were recorded…
Future-saving audiovisual content for Data Science: Preservation of geoinformatics video heritage with the TIB|AV-Portal

NASA Astrophysics Data System (ADS)

Löwe, Peter; Plank, Margret; Ziedorn, Frauke

2015-04-01

In data driven research, the access to citation and preservation of the full triad consisting of journal article, research data and -software has started to become good scientific practice. To foster the adoption of this practice the significance of software tools has to be acknowledged, which enable scientists to harness auxiliary audiovisual content in their research work. The advent of ubiquitous computer-based audiovisual recording and corresponding Web 2.0 hosting platforms like Youtube, Slideshare and GitHub has created new ecosystems for contextual information related to scientific software and data, which continues to grow both in size and variety of content. The current Web 2.0 platforms lack capabilities for long term archiving and scientific citation, such as persistent identifiers allowing to reference specific intervals of the overall content. The audiovisual content currently shared by scientists ranges from commented howto-demonstrations on software handling, installation and data-processing, to aggregated visual analytics of the evolution of software projects over time. Such content are crucial additions to the scientific message, as they ensure that software-based data-processing workflows can be assessed, understood and reused in the future. In the context of data driven research, such content needs to be accessible by effective search capabilities, enabling the content to be retrieved and ensuring that the content producers receive credit for their efforts within the scientific community. Improved multimedia archiving and retrieval services for scientific audiovisual content which meet these requirements are currently implemented by the scientific library community. This paper exemplifies the existing challenges, requirements, benefits and the potential of the preservation, accessibility and citability of such audiovisual content for the Open Source communities based on the new audiovisual web service TIB|AV Portal of the German National Library of Science and Technology. The web-based portal allows for extended search capabilities based on enhanced metadata derived by automated video analysis. By combining state-of-the-art multimedia retrieval techniques such as speech-, text-, and image recognition with semantic analysis, content-based access to videos at the segment level is provided. Further, by using the open standard Media Fragment Identifier (MFID), a citable Digital Object Identifier is displayed for each video segment. In addition to the continuously growing footprint of contemporary content, the importance of vintage audiovisual information needs to be considered: This paper showcases the successful application of the TIB|AV-Portal in the preservation and provision of a newly discovered version of a GRASS GIS promotional video produced by US Army -Corps of Enginers Laboratory (US-CERL) in 1987. The video is provides insight into the constraints of the very early days of the GRASS GIS project, which is the oldest active Free and Open Source Software (FOSS) GIS project which has been active for over thirty years. GRASS itself has turned into a collaborative scientific platform and a repository of scientific peer-reviewed code and algorithm/knowledge hub for future generation of scientists [1]. This is a reference case for future preservation activities regarding semantic-enhanced Web 2.0 content from geospatial software projects within Academia and beyond. References: [1] Chemin, Y., Petras V., Petrasova, A., Landa, M., Gebbert, S., Zambelli, P., Neteler, M., Löwe, P.: GRASS GIS: a peer-reviewed scientific platform and future research Repository, Geophysical Research Abstracts, Vol. 17, EGU2015-8314-1, 2015 (submitted)
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding.

PubMed

Shahin, Antoine J; Backer, Kristina C; Rosenblum, Lawrence D; Kerlin, Jess R

2018-02-14

Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ ( illusion-fa ), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ ( illusion-ba ), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba , and a reduced N1 when they perceived illusion-fa , mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. SIGNIFICANCE STATEMENT The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator). Copyright © 2018 the authors 0270-6474/18/381835-15$15.00/0.
Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

PubMed

Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D; Senn, Pascal

2013-01-01

To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
Temporal Processing of Audiovisual Stimuli Is Enhanced in Musicians: Evidence from Magnetoencephalography (MEG)

PubMed Central

Lu, Yao; Paraskevopoulos, Evangelos; Herholz, Sibylle C.; Kuchenbuch, Anja; Pantev, Christo

2014-01-01

Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical with neurophysiological measurements investigating the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (direct after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events. PMID:24595014
Naturalistic FMRI mapping reveals superior temporal sulcus as the hub for the distributed brain network for social perception.

PubMed

Lahnakoski, Juha M; Glerean, Enrico; Salmi, Juha; Jääskeläinen, Iiro P; Sams, Mikko; Hari, Riitta; Nummenmaa, Lauri

2012-01-01

Despite the abundant data on brain networks processing static social signals, such as pictures of faces, the neural systems supporting social perception in naturalistic conditions are still poorly understood. Here we delineated brain networks subserving social perception under naturalistic conditions in 19 healthy humans who watched, during 3-T functional magnetic resonance imaging (fMRI), a set of 137 short (approximately 16 s each, total 27 min) audiovisual movie clips depicting pre-selected social signals. Two independent raters estimated how well each clip represented eight social features (faces, human bodies, biological motion, goal-oriented actions, emotion, social interaction, pain, and speech) and six filler features (places, objects, rigid motion, people not in social interaction, non-goal-oriented action, and non-human sounds) lacking social content. These ratings were used as predictors in the fMRI analysis. The posterior superior temporal sulcus (STS) responded to all social features but not to any non-social features, and the anterior STS responded to all social features except bodies and biological motion. We also found four partially segregated, extended networks for processing of specific social signals: (1) a fronto-temporal network responding to multiple social categories, (2) a fronto-parietal network preferentially activated to bodies, motion, and pain, (3) a temporo-amygdalar network responding to faces, social interaction, and speech, and (4) a fronto-insular network responding to pain, emotions, social interactions, and speech. Our results highlight the role of the pSTS in processing multiple aspects of social information, as well as the feasibility and efficiency of fMRI mapping under conditions that resemble the complexity of real life.
The influence of visual speech information on the intelligibility of English consonants produced by non-native speakers.

PubMed

Kawase, Saya; Hannah, Beverly; Wang, Yue

2014-09-01

This study examines how visual speech information affects native judgments of the intelligibility of speech sounds produced by non-native (L2) speakers. Native Canadian English perceivers as judges perceived three English phonemic contrasts (/b-v, θ-s, l-ɹ/) produced by native Japanese speakers as well as native Canadian English speakers as controls. These stimuli were presented under audio-visual (AV, with speaker voice and face), audio-only (AO), and visual-only (VO) conditions. The results showed that, across conditions, the overall intelligibility of Japanese productions of the native (Japanese)-like phonemes (/b, s, l/) was significantly higher than the non-Japanese phonemes (/v, θ, ɹ/). In terms of visual effects, the more visually salient non-Japanese phonemes /v, θ/ were perceived as significantly more intelligible when presented in the AV compared to the AO condition, indicating enhanced intelligibility when visual speech information is available. However, the non-Japanese phoneme /ɹ/ was perceived as less intelligible in the AV compared to the AO condition. Further analysis revealed that, unlike the native English productions, the Japanese speakers produced /ɹ/ without visible lip-rounding, indicating that non-native speakers' incorrect articulatory configurations may decrease the degree of intelligibility. These results suggest that visual speech information may either positively or negatively affect L2 speech intelligibility.
A link between individual differences in multisensory speech perception and eye movements

PubMed Central

Gurler, Demet; Doyle, Nathan; Walker, Edgar; Magnotti, John; Beauchamp, Michael

2015-01-01

The McGurk effect is an illusion in which visual speech information dramatically alters the perception of auditory speech. However, there is a high degree of individual variability in how frequently the illusion is perceived: some individuals almost always perceive the McGurk effect, while others rarely do. Another axis of individual variability is the pattern of eye movements make while viewing a talking face: some individuals often fixate the mouth of the talker, while others rarely do. Since the talker's mouth carries the visual speech necessary information to induce the McGurk effect, we hypothesized that individuals who frequently perceive the McGurk effect should spend more time fixating the talker's mouth. We used infrared eye tracking to study eye movements as 40 participants viewed audiovisual speech. Frequent perceivers of the McGurk effect were more likely to fixate the mouth of the talker, and there was a significant correlation between McGurk frequency and mouth looking time. The noisy encoding of disparity model of McGurk perception showed that individuals who frequently fixated the mouth had lower sensory noise and higher disparity thresholds than those who rarely fixated the mouth. Differences in eye movements when viewing the talker's face may be an important contributor to interindividual differences in multisensory speech perception. PMID:25810157
An investigation of the effects of a speech-restructuring treatment for stuttering on the distribution of intervals of phonation.

PubMed

Brown, Lisa; Wilson, Linda; Packman, Ann; Halaki, Mark; Onslow, Mark; Menzies, Ross

2016-12-01

The purpose of this study was to investigate whether stuttering reductions following the instatement phase of a speech-restructuring treatment for adults were accompanied by reductions in the frequency of short intervals of phonation (PIs). The study was prompted by the possibility that reductions in the frequency of short PIs is the mechanism underlying such reductions in stuttering. The distribution of PIs was determined for seven adults who stutter, before and immediately after the intensive phase of a speech-restructuring treatment program. Audiovisual recordings of conversational speech were made on both assessment occasions, with PIs recorded with an accelerometer. All seven participants had much lower levels of stuttering after treatment but these were associated with reductions in the frequency of short PIs for only four of them. For the other three participants, two showed no change in frequency of short PIs, while for the other participant the frequency of short PIs actually increased. Stuttering reduction with speech-restructuring treatment can co-occur with reduction in the frequency of short PIs. However, the latter does not appear necessary for this reduction in stuttering to occur. Thus, speech-restructuring treatment must have other, or additional, treatment agents for stuttering to reduce. Copyright © 2016 Elsevier Inc. All rights reserved.
Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

PubMed Central

Jiang, Jintao; Bernstein, Lynne E.

2011-01-01

When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses, auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information, the presented acoustic signal versus the acoustic signal recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ΔA/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions between 17% and 52%. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships. PMID:21574741
Audio-video feature correlation: faces and speech

NASA Astrophysics Data System (ADS)

Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

1999-08-01

This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extend they should be combined. A generic audio signal partitioning algorithm as first used to detect Silence/Noise/Music/Speech segments in a full length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped on the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
Long-term music training modulates the recalibration of audiovisual simultaneity.

PubMed

Jicol, Crescent; Proulx, Michael J; Pollick, Frank E; Petrini, Karin

2018-07-01

To overcome differences in physical transmission time and neural processing, the brain adaptively recalibrates the point of simultaneity between auditory and visual signals by adapting to audiovisual asynchronies. Here, we examine whether the prolonged recalibration process of passively sensed visual and auditory signals is affected by naturally occurring multisensory training known to enhance audiovisual perceptual accuracy. Hence, we asked a group of drummers, of non-drummer musicians and of non-musicians to judge the audiovisual simultaneity of musical and non-musical audiovisual events, before and after adaptation with two fixed audiovisual asynchronies. We found that the recalibration for the musicians and drummers was in the opposite direction (sound leading vision) to that of non-musicians (vision leading sound), and change together with both increased music training and increased perceptual accuracy (i.e. ability to detect asynchrony). Our findings demonstrate that long-term musical training reshapes the way humans adaptively recalibrate simultaneity between auditory and visual signals.
Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

PubMed Central

Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

2013-01-01

Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119
The development of the Athens Emotional States Inventory (AESI): collection, validation and automatic processing of emotionally loaded sentences.

PubMed

Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros

2015-01-01

The development of ecologically valid procedures for collecting reliable and unbiased emotional data towards computer interfaces with social and affective intelligence targeting patients with mental disorders. Following its development, presented with, the Athens Emotional States Inventory (AESI) proposes the design, recording and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness and neutral. The items of the AESI consist of sentences each having content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following the Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 recorded utterances in Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy up to 75.15% for automatically recognizing the emotions in AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes and with reduced environmental variability.
Neurofunctional Underpinnings of Audiovisual Emotion Processing in Teens with Autism Spectrum Disorders

PubMed Central

Doyle-Thomas, Krissy A.R.; Goldberg, Jeremy; Szatmari, Peter; Hall, Geoffrey B.C.

2013-01-01

Despite successful performance on some audiovisual emotion tasks, hypoactivity has been observed in frontal and temporal integration cortices in individuals with autism spectrum disorders (ASD). Little is understood about the neurofunctional network underlying this ability in individuals with ASD. Research suggests that there may be processing biases in individuals with ASD, based on their ability to obtain meaningful information from the face and/or the voice. This functional magnetic resonance imaging study examined brain activity in teens with ASD (n = 18) and typically developing controls (n = 16) during audiovisual and unimodal emotion processing. Teens with ASD had a significantly lower accuracy when matching an emotional face to an emotion label. However, no differences in accuracy were observed between groups when matching an emotional voice or face-voice pair to an emotion label. In both groups brain activity during audiovisual emotion matching differed significantly from activity during unimodal emotion matching. Between-group analyses of audiovisual processing revealed significantly greater activation in teens with ASD in a parietofrontal network believed to be implicated in attention, goal-directed behaviors, and semantic processing. In contrast, controls showed greater activity in frontal and temporal association cortices during this task. These results suggest that in the absence of engaging integrative emotional networks during audiovisual emotion matching, teens with ASD may have recruited the parietofrontal network as an alternate compensatory system. PMID:23750139
Musical expertise is related to altered functional connectivity during audiovisual integration

PubMed Central

Paraskevopoulos, Evangelos; Kraneburg, Anja; Herholz, Sibylle Cornelia; Bamidis, Panagiotis D.; Pantev, Christo

2015-01-01

The present study investigated the cortical large-scale functional network underpinning audiovisual integration via magnetoencephalographic recordings. The reorganization of this network related to long-term musical training was investigated by comparing musicians to nonmusicians. Connectivity was calculated on the basis of the estimated mutual information of the sources’ activity, and the corresponding networks were statistically compared. Nonmusicians’ results indicated that the cortical network associated with audiovisual integration supports visuospatial processing and attentional shifting, whereas a sparser network, related to spatial awareness supports the identification of audiovisual incongruences. In contrast, musicians’ results showed enhanced connectivity in regions related to the identification of auditory pattern violations. Hence, nonmusicians rely on the processing of visual clues for the integration of audiovisual information, whereas musicians rely mostly on the corresponding auditory information. The large-scale cortical network underpinning multisensory integration is reorganized due to expertise in a cognitive domain that largely involves audiovisual integration, indicating long-term training-related neuroplasticity. PMID:26371305
How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding

PubMed Central

Desantis, Andrea; Haggard, Patrick

2016-01-01

To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events. PMID:27982063
How actions shape perception: learning action-outcome relations and predicting sensory outcomes promote audio-visual temporal binding.

PubMed

Desantis, Andrea; Haggard, Patrick

2016-12-16

To maintain a temporally-unified representation of audio and visual features of objects in our environment, the brain recalibrates audio-visual simultaneity. This process allows adjustment for both differences in time of transmission and time for processing of audio and visual signals. In four experiments, we show that the cognitive processes for controlling instrumental actions also have strong influence on audio-visual recalibration. Participants learned that right and left hand button-presses each produced a specific audio-visual stimulus. Following one action the audio preceded the visual stimulus, while for the other action audio lagged vision. In a subsequent test phase, left and right button-press generated either the same audio-visual stimulus as learned initially, or the pair associated with the other action. We observed recalibration of simultaneity only for previously-learned audio-visual outcomes. Thus, learning an action-outcome relation promotes temporal grouping of the audio and visual events within the outcome pair, contributing to the creation of a temporally unified multisensory object. This suggests that learning action-outcome relations and the prediction of perceptual outcomes can provide an integrative temporal structure for our experiences of external events.
Semantic-based crossmodal processing during visual suppression.

PubMed

Cox, Dustin; Hong, Sang Wook

2015-01-01

To reveal the mechanisms underpinning the influence of auditory input on visual awareness, we examine, (1) whether purely semantic-based multisensory integration facilitates the access to visual awareness for familiar visual events, and (2) whether crossmodal semantic priming is the mechanism responsible for the semantic auditory influence on visual awareness. Using continuous flash suppression, we rendered dynamic and familiar visual events (e.g., a video clip of an approaching train) inaccessible to visual awareness. We manipulated the semantic auditory context of the videos by concurrently pairing them with a semantically matching soundtrack (congruent audiovisual condition), a semantically non-matching soundtrack (incongruent audiovisual condition), or with no soundtrack (neutral video-only condition). We found that participants identified the suppressed visual events significantly faster (an earlier breakup of suppression) in the congruent audiovisual condition compared to the incongruent audiovisual condition and video-only condition. However, this facilitatory influence of semantic auditory input was only observed when audiovisual stimulation co-occurred. Our results suggest that the enhanced visual processing with a semantically congruent auditory input occurs due to audiovisual crossmodal processing rather than semantic priming, which may occur even when visual information is not available to visual awareness.
Skill dependent audiovisual integration in the fusiform induces repetition suppression.

PubMed

McNorgan, Chris; Booth, James R

2015-02-01

Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. Copyright © 2014 Elsevier Inc. All rights reserved.

Skill Dependent Audiovisual Integration in the Fusiform Induces Repetition Suppression

PubMed Central

McNorgan, Chris; Booth, James R.

2015-01-01

Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. PMID:25585276
Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

PubMed

Abel, Mary Kathryn; Li, H Charles; Russo, Frank A; Schlaug, Gottfried; Loui, Psyche

2016-01-01

Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.
Audiovisual Interval Size Estimation Is Associated with Early Musical Training

PubMed Central

Abel, Mary Kathryn; Li, H. Charles; Russo, Frank A.; Schlaug, Gottfried; Loui, Psyche

2016-01-01

Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants’ ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants’ ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception. PMID:27760134
How sensory-motor systems impact the neural organization for language: direct contrasts between spoken and signed language

PubMed Central

Emmorey, Karen; McCullough, Stephen; Mehta, Sonya; Grabowski, Thomas J.

2014-01-01

To investigate the impact of sensory-motor systems on the neural organization for language, we conducted an H215O-PET study of sign and spoken word production (picture-naming) and an fMRI study of sign and audio-visual spoken language comprehension (detection of a semantically anomalous sentence) with hearing bilinguals who are native users of American Sign Language (ASL) and English. Directly contrasting speech and sign production revealed greater activation in bilateral parietal cortex for signing, while speaking resulted in greater activation in bilateral superior temporal cortex (STC) and right frontal cortex, likely reflecting auditory feedback control. Surprisingly, the language production contrast revealed a relative increase in activation in bilateral occipital cortex for speaking. We speculate that greater activation in visual cortex for speaking may actually reflect cortical attenuation when signing, which functions to distinguish self-produced from externally generated visual input. Directly contrasting speech and sign comprehension revealed greater activation in bilateral STC for speech and greater activation in bilateral occipital-temporal cortex for sign. Sign comprehension, like sign production, engaged bilateral parietal cortex to a greater extent than spoken language. We hypothesize that posterior parietal activation in part reflects processing related to spatial classifier constructions in ASL and that anterior parietal activation may reflect covert imitation that functions as a predictive model during sign comprehension. The conjunction analysis for comprehension revealed that both speech and sign bilaterally engaged the inferior frontal gyrus (with more extensive activation on the left) and the superior temporal sulcus, suggesting an invariant bilateral perisylvian language system. We conclude that surface level differences between sign and spoken languages should not be dismissed and are critical for understanding the neurobiology of language. PMID:24904497
Effects of audio-visual presentation of target words in word translation training

NASA Astrophysics Data System (ADS)

Akahane-Yamada, Reiko; Komaki, Ryo; Kubo, Rieko

2004-05-01

Komaki and Akahane-Yamada (Proc. ICA2004) used 2AFC translation task in vocabulary training, in which the target word is presented visually in orthographic form of one language, and the appropriate meaning in another language has to be chosen between two choices. Present paper examined the effect of audio-visual presentation of target word when native speakers of Japanese learn to translate English words into Japanese. Pairs of English words contrasted in several phonemic distinctions (e.g., /r/-/l/, /b/-/v/, etc.) were used as word materials, and presented in three conditions; visual-only (V), audio-only (A), and audio-visual (AV) presentations. Identification accuracy of those words produced by two talkers was also assessed. During pretest, the accuracy for A stimuli was lowest, implying that insufficient translation ability and listening ability interact with each other when aurally presented word has to be translated. However, there was no difference in accuracy between V and AV stimuli, suggesting that participants translate the words depending on visual information only. The effect of translation training using AV stimuli did not transfer to identification ability, showing that additional audio information during translation does not help improve speech perception. Further examination is necessary to determine the effective L2 training method. [Work supported by TAO, Japan.
Beta-Band Functional Connectivity Influences Audiovisual Integration in Older Age: An EEG Study

PubMed Central

Wang, Luyao; Wang, Wenhui; Yan, Tianyi; Song, Jiayong; Yang, Weiping; Wang, Bin; Go, Ritsu; Huang, Qiang; Wu, Jinglong

2017-01-01

Audiovisual integration occurs frequently and has been shown to exhibit age-related differences via behavior experiments or time-frequency analyses. In the present study, we examined whether functional connectivity influences audiovisual integration during normal aging. Visual, auditory, and audiovisual stimuli were randomly presented peripherally; during this time, participants were asked to respond immediately to the target stimulus. Electroencephalography recordings captured visual, auditory, and audiovisual processing in 12 old (60–78 years) and 12 young (22–28 years) male adults. For non-target stimuli, we focused on alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–50 Hz) bands. We applied the Phase Lag Index to study the dynamics of functional connectivity. Then, the network topology parameters, which included the clustering coefficient, path length, small-worldness global efficiency, local efficiency and degree, were calculated for each condition. For the target stimulus, a race model was used to analyze the response time. Then, a Pearson correlation was used to test the relationship between each network topology parameters and response time. The results showed that old adults activated stronger connections during audiovisual processing in the beta band. The relationship between network topology parameters and the performance of audiovisual integration was detected only in old adults. Thus, we concluded that old adults who have a higher load during audiovisual integration need more cognitive resources. Furthermore, increased beta band functional connectivity influences the performance of audiovisual integration during normal aging. PMID:28824411
Beta-Band Functional Connectivity Influences Audiovisual Integration in Older Age: An EEG Study.

PubMed

Wang, Luyao; Wang, Wenhui; Yan, Tianyi; Song, Jiayong; Yang, Weiping; Wang, Bin; Go, Ritsu; Huang, Qiang; Wu, Jinglong

2017-01-01

Audiovisual integration occurs frequently and has been shown to exhibit age-related differences via behavior experiments or time-frequency analyses. In the present study, we examined whether functional connectivity influences audiovisual integration during normal aging. Visual, auditory, and audiovisual stimuli were randomly presented peripherally; during this time, participants were asked to respond immediately to the target stimulus. Electroencephalography recordings captured visual, auditory, and audiovisual processing in 12 old (60-78 years) and 12 young (22-28 years) male adults. For non-target stimuli, we focused on alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-50 Hz) bands. We applied the Phase Lag Index to study the dynamics of functional connectivity. Then, the network topology parameters, which included the clustering coefficient, path length, small-worldness global efficiency, local efficiency and degree, were calculated for each condition. For the target stimulus, a race model was used to analyze the response time. Then, a Pearson correlation was used to test the relationship between each network topology parameters and response time. The results showed that old adults activated stronger connections during audiovisual processing in the beta band. The relationship between network topology parameters and the performance of audiovisual integration was detected only in old adults. Thus, we concluded that old adults who have a higher load during audiovisual integration need more cognitive resources. Furthermore, increased beta band functional connectivity influences the performance of audiovisual integration during normal aging.
Knowledge Generated by Audiovisual Narrative Action Research Loops

ERIC Educational Resources Information Center

Bautista Garcia-Vera, Antonio

2012-01-01

We present data collected from the research project funded by the Ministry of Education and Science of Spain entitled "Audiovisual Narratives and Intercultural Relations in Education." One of the aims of the research was to determine the nature of thought processes occurring during audiovisual narratives. We studied the possibility of…
Use of Audiovisual Texts in University Education Process

ERIC Educational Resources Information Center

Aleksandrov, Evgeniy P.

2014-01-01

Audio-visual learning technologies offer great opportunities in the development of students' analytical and projective abilities. These technologies can be used in classroom activities and for homework. This article discusses the features of audiovisual media texts use in a series of social sciences and humanities in the University curriculum.
The Audio-Visual Marketing Handbook for Independent Schools.

ERIC Educational Resources Information Center

Griffith, Tom

This how-to booklet offers specific advice on producing video or slide/tape programs for marketing independent schools. Five chapters present guidelines for various stages in the process: (1) Audio-Visual Marketing in Context (aesthetics and economics of audiovisual marketing); (2) A Question of Identity (identifying the audience and deciding on…
Infants in cocktail parties

NASA Astrophysics Data System (ADS)

Newman, Rochelle S.

2003-04-01

Most work on listeners' ability to separate streams of speech has focused on adults. Yet infants also find themselves in noisy environments. In order to learn from their caregivers' speech in these settings, they must first separate it from background noise such as that from television shows and siblings. Previous work has found that 7.5-month-old infants can separate streams of speech when the target voice is more intense than the distractor voice (Newman and Jusczyk, 1996), when the target voice is known to the infant (Barker and Newman, 2000) or when infants are presented with an audiovisual (rather than auditory-only) signal (Hollich, Jusczyk, and Newman, 2001). Unfortunately, the paradigm in these studies can only be used on infants at least 7.5 months of age, limiting the ability to investigate how stream segregation develops over time. The present work uses a new paradigm to explore younger infants' ability to separate streams of speech. Infants aged 4.5 months heard a female talker repeat either their own name or another infants' name, while several other voices spoke fluently in the background. We present data on infants' ability to recognize their own name in this cocktail party situation. [Work supported by NSF and NICHD.
The role of emotion in dynamic audiovisual integration of faces and voices

PubMed Central

Kotz, Sonja A.; Tavano, Alessandro; Schröger, Erich

2015-01-01

We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration. PMID:25147273
Audiovisual integration facilitates monkeys' short-term memory.

PubMed

Bigelow, James; Poremba, Amy

2016-07-01

Many human behaviors are known to benefit from audiovisual integration, including language and communication, recognizing individuals, social decision making, and memory. Exceptionally little is known about the contributions of audiovisual integration to behavior in other primates. The current experiment investigated whether short-term memory in nonhuman primates is facilitated by the audiovisual presentation format. Three macaque monkeys that had previously learned an auditory delayed matching-to-sample (DMS) task were trained to perform a similar visual task, after which they were tested with a concurrent audiovisual DMS task with equal proportions of auditory, visual, and audiovisual trials. Parallel to outcomes in human studies, accuracy was higher and response times were faster on audiovisual trials than either unisensory trial type. Unexpectedly, two subjects exhibited superior unimodal performance on auditory trials, a finding that contrasts with previous studies, but likely reflects their training history. Our results provide the first demonstration of a bimodal memory advantage in nonhuman primates, lending further validation to their use as a model for understanding audiovisual integration and memory processing in humans.
Catalog of Audiovisual Materials Related to Rehabilitation.

ERIC Educational Resources Information Center

Mann, Joe, Ed.; Henderson, Jim, Ed.

An annotated listing of a variety of audiovisual formats on content related to the social-rehabilitation process is provided. The materials in the listing were selected from a collection of over 200 audiovisual catalogs. The major portion of the materials has not been screened. The materials are classified alphabetically by the following subject…
The spatial reliability of task-irrelevant sounds modulates bimodal audiovisual integration: An event-related potential study.

PubMed

Li, Qi; Yu, Hongtao; Wu, Yan; Gao, Ning

2016-08-26

The integration of multiple sensory inputs is essential for perception of the external world. The spatial factor is a fundamental property of multisensory audiovisual integration. Previous studies of the spatial constraints on bimodal audiovisual integration have mainly focused on the spatial congruity of audiovisual information. However, the effect of spatial reliability within audiovisual information on bimodal audiovisual integration remains unclear. In this study, we used event-related potentials (ERPs) to examine the effect of spatial reliability of task-irrelevant sounds on audiovisual integration. Three relevant ERP components emerged: the first at 140-200ms over a wide central area, the second at 280-320ms over the fronto-central area, and a third at 380-440ms over the parieto-occipital area. Our results demonstrate that ERP amplitudes elicited by audiovisual stimuli with reliable spatial relationships are larger than those elicited by stimuli with inconsistent spatial relationships. In addition, we hypothesized that spatial reliability within an audiovisual stimulus enhances feedback projections to the primary visual cortex from multisensory integration regions. Overall, our findings suggest that the spatial linking of visual and auditory information depends on spatial reliability within an audiovisual stimulus and occurs at a relatively late stage of processing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The role of emotion in dynamic audiovisual integration of faces and voices.

PubMed

Kokinous, Jenny; Kotz, Sonja A; Tavano, Alessandro; Schröger, Erich

2015-05-01

We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Reducing Information's Speed Improves Verbal Cognition and Behavior in Autism: A 2-Cases Report.

PubMed

Tardif, Carole; Latzko, Laura; Arciszewski, Thomas; Gepner, Bruno

2017-06-01

According to the temporal theory of autism spectrum disorders (ASDs), audiovisual changes in environment, particularly those linked to facial and verbal language, are often too fast to be faced, perceived, and/or interpreted online by many children with ASD, which could help explain their facial, verbal, and/or socioemotional interaction impairments. Our goal here was to test for the first time the impact of slowed-down audiovisual information on verbal cognition and behavior in 2 boys with ASD and verbal delay. Using 15 experimental sessions during 4 months, both boys were presented with various stimuli (eg, pictures, words, sentences, cartoons) and were then asked questions or given instructions regarding stimuli. The audiovisual stimuli and instructions/questions were presented on a computer's screen and were always displayed twice: at real-time speed (RTS) and at slowed-down speed (SDS) using the software Logiral. We scored the boys' verbal cognition performance (ie, ability to understand questions/instructions and answer them verbally/nonverbally) and their behavioral reactions (ie, attention, verbal/nonverbal communication, social reciprocity), and analyzed the effects of speed and order of the stimuli presentation on these factors. According to the results, both participants exhibited significant improvements in verbal cognition performance with SDS presentation compared with RTS presentation, and they scored better with RTS presentation when having SDS presentation before rather than after RTS presentation. Behavioral reactions were also improved in SDS conditions compared with RTS conditions. This initial evidence of a positive impact of slowed-down audiovisual information on verbal cognition should be tested in a large cohort of children with ASD and associated speech/language impairments. Copyright © 2017 by the American Academy of Pediatrics.
Cortical oscillations modulated by congruent and incongruent audiovisual stimuli.

PubMed

Herdman, A T; Fujioka, T; Chau, W; Ross, B; Pantev, C; Picton, T W

2004-11-30

Congruent or incongruent grapheme-phoneme stimuli are easily perceived as one or two linguistic objects. The main objective of this study was to investigate the changes in cortical oscillations that reflect the processing of congruent and incongruent audiovisual stimuli. Graphemes were Japanese Hiragana characters for four different vowels (/a/, /o/, /u/, and /i/). They were presented simultaneously with their corresponding phonemes (congruent) or non-corresponding phonemes (incongruent) to native-speaking Japanese participants. Participants' reaction times to the congruent audiovisual stimuli were significantly faster by 57 ms as compared to reaction times to incongruent stimuli. We recorded the brain responses for each condition using a whole-head magnetoencephalograph (MEG). A novel approach to analysing MEG data, called synthetic aperture magnetometry (SAM), was used to identify event-related changes in cortical oscillations involved in audiovisual processing. The SAM contrast between congruent and incongruent responses revealed greater event-related desynchonization (8-16 Hz) bilaterally in the occipital lobes and greater event-related synchronization (4-8 Hz) in the left transverse temporal gyrus. Results from this study further support the concept of interactions between the auditory and visual sensory cortices in multi-sensory processing of audiovisual objects.
Effect of perceptual load on semantic access by speech in children

PubMed Central

Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Hervè

2013-01-01

Purpose To examine whether semantic access by speech requires attention in children. Method Children (N=200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multi-modal (distractors: auditory-static face and audiovisual-dynamic face) picture word task. The cross-modal had a low load, and the multi-modal had a high load [i.e., respectively naming pictures displayed 1) on a blank screen vs 2) below the talker’s face on his T-shirt]. Semantic content of distractors was manipulated to be related vs unrelated to picture (e.g., picture dog with distractors bear vs cheese). Lavie's (2005) perceptual load model proposes that semantic access is independent of capacity limited attentional resources if irrelevant semantic-content manipulation influences naming times on both tasks despite variations in loads but dependent on attentional resources exhausted by higher load task if irrelevant content influences naming only on cross-modal (low load). Results Irrelevant semantic content affected performance for both tasks in 6- to 9-year-olds, but only on cross-modal in 4–5-year-olds. The addition of visual speech did not influence results on the multi-modal task. Conclusion Younger and older children differ in dependence on attentional resources for semantic access by speech. PMID:22896045
Effect of perceptual load on semantic access by speech in children.

PubMed

Jerger, Susan; Damian, Markus F; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Hervé

2013-04-01

To examine whether semantic access by speech requires attention in children. Children (N = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual-dynamic face) picture word task. The cross-modal task had a low load, and the multimodal task had a high load (i.e., respectively naming pictures displayed on a blank screen vs. below the talker's face on his T-shirt). Semantic content of distractors was manipulated to be related vs. unrelated to the picture (e.g., picture "dog" with distractors "bear" vs. "cheese"). If irrelevant semantic content manipulation influences naming times on both tasks despite variations in loads, Lavie's (2005) perceptual load model proposes that semantic access is independent of capacity-limited attentional resources; if, however, irrelevant content influences naming only on the cross-modal task (low load), the perceptual load model proposes that semantic access is dependent on attentional resources exhausted by the higher load task. Irrelevant semantic content affected performance for both tasks in 6- to 9-year-olds but only on the cross-modal task in 4- to 5-year-olds. The addition of visual speech did not influence results on the multimodal task. Younger and older children differ in dependence on attentional resources for semantic access by speech.

Hippocampal temporal-parietal junction interaction in the production of psychotic symptoms: a framework for understanding the schizophrenic syndrome

PubMed Central

Wible, Cynthia G.

2012-01-01

A framework is described for understanding the schizophrenic syndrome at the brain systems level. It is hypothesized that over-activation of dynamic gesture and social perceptual processes in the temporal-parietal occipital junction (TPJ), posterior superior temporal sulcus (PSTS) and surrounding regions produce the syndrome (including positive and negative symptoms, their prevalence, prodromal signs, and cognitive deficits). Hippocampal system hyper-activity and atrophy have been consistently found in schizophrenia. Hippocampal activity is highly correlated with activity in the TPJ and may be a source of over-excitation of the TPJ and surrounding regions. Strong evidence for this comes from in-vivo recordings in humans during psychotic episodes. Many positive symptoms of schizophrenia can be reframed as the erroneous sense of a presence or other who is observing, acting, speaking, or controlling; these qualia are similar to those evoked during abnormal activation of the TPJ. The TPJ and PSTS play a key role in the perception (and production) of dynamic social, emotional, and attentional gestures for the self and others (e.g., body/face/eye gestures, audiovisual speech and prosody, and social attentional gestures such as eye gaze). The single cell representation of dynamic gestures is multimodal (auditory, visual, tactile), matching the predominant hallucinatory categories in schizophrenia. Inherent in the single cell perceptual signal of dynamic gesture representations is a computation of intention, agency, and anticipation or expectancy (for the self and others). Stimulation of the TPJ resulting in activation of the self representation has been shown to result a feeling of a presence or multiple presences (due to heautoscopy) and also bizarre tactile experiences. Neurons in the TPJ are also tuned, or biased to detect threat related emotions. Abnormal over-activation in this system could produce the conscious hallucination of a voice (audiovisual speech), a person or a touch. Over-activation could interfere with attentional/emotional gesture perception and production (negative symptoms). It could produce the unconscious feeling of being watched, followed, or of a social situation unfolding along with accompanying abnormal perception of intent and agency (delusions). Abnormal activity in the TPJ would also be predicted to create several cognitive disturbances that are characteristic of schizophrenia, including abnormalities in attention, predictive social processing, working memory, and a bias to erroneously perceive threat. PMID:22737114
Motivation and appraisal in perception of poorly specified speech.

PubMed

Lidestam, Björn; Beskow, Jonas

2006-04-01

Normal-hearing students (n = 72) performed sentence, consonant, and word identification in either A (auditory), V (visual), or AV (audiovisual) modality. The auditory signal had difficult speech-to-noise relations. Talker (human vs. synthetic), topic (no cue vs. cue-words), and emotion (no cue vs. facially displayed vs. cue-words) were varied within groups. After the first block, effects of modality, face, topic, and emotion on initial appraisal and motivation were assessed. After the entire session, effects of modality on longer-term appraisal and motivation were assessed. The results from both assessments showed that V identification was more positively appraised than A identification. Correlations were tentatively interpreted such that evaluation of self-rated performance possibly depends on subjective standard and is reflected on motivation (if below subjective standard, AV group), or on appraisal (if above subjective standard, A group). Suggestions for further research are presented.
The process of developing audiovisual patient information: challenges and opportunities.

PubMed

Hutchison, Catherine; McCreaddie, May

2007-11-01

The aim of this project was to produce audiovisual patient information, which was user friendly and fit for purpose. The purpose of the audiovisual patient information is to inform patients about randomized controlled trials, as a supplement to their trial-specific written information sheet. Audiovisual patient information is known to be an effective way of informing patients about treatment. User involvement is also recognized as being important in the development of service provision. The aim of this paper is (i) to describe and discuss the process of developing the audiovisual patient information and (ii) to highlight the challenges and opportunities, thereby identifying implications for practice. A future study will test the effectiveness of the audiovisual patient information in the cancer clinical trial setting. An advisory group was set up to oversee the project and provide guidance in relation to information content, level and delivery. An expert panel of two patients provided additional guidance and a dedicated operational team dealt with the logistics of the project including: ethics; finance; scriptwriting; filming; editing and intellectual property rights. Challenges included the limitations of filming in a busy clinical environment, restricted technical and financial resources, ethical needs and issues around copyright. There were, however, substantial opportunities that included utilizing creative skills, meaningfully involving patients, teamworking and mutual appreciation of clinical, multidisciplinary and technical expertise. Developing audiovisual patient information is an important area for nurses to be involved with. However, this must be performed within the context of the multiprofessional team. Teamworking, including patient involvement, is crucial as a wide variety of expertise is required. Many aspects of the process are transferable and will provide information and guidance for nurses, regardless of specialty, considering developing this format of patient information.
Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training.

PubMed

Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E

2014-01-01

In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Visual speech discrimination and identification of natural and synthetic consonant stimuli

PubMed Central

Files, Benjamin T.; Tjan, Bosco S.; Jiang, Jintao; Bernstein, Lynne E.

2015-01-01

From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d’) and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d’) increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d’ with inverted stimuli but a persistent pattern of larger d’ for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual lipreading/speechreading speech synthesis. PMID:26217249
Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training

PubMed Central

Eberhardt, Silvio P.; Auer Jr., Edward T.; Bernstein, Lynne E.

2014-01-01

In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee’s primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee’s lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT). PMID:25400566
Audio-visual synchrony and spatial attention enhance processing of dynamic visual stimulation independently and in parallel: A frequency-tagging study.

PubMed

Covic, Amra; Keitel, Christian; Porcu, Emanuele; Schröger, Erich; Müller, Matthias M

2017-11-01

The neural processing of a visual stimulus can be facilitated by attending to its position or by a co-occurring auditory tone. Using frequency-tagging, we investigated whether facilitation by spatial attention and audio-visual synchrony rely on similar neural processes. Participants attended to one of two flickering Gabor patches (14.17 and 17 Hz) located in opposite lower visual fields. Gabor patches further "pulsed" (i.e. showed smooth spatial frequency variations) at distinct rates (3.14 and 3.63 Hz). Frequency-modulating an auditory stimulus at the pulse-rate of one of the visual stimuli established audio-visual synchrony. Flicker and pulsed stimulation elicited stimulus-locked rhythmic electrophysiological brain responses that allowed tracking the neural processing of simultaneously presented Gabor patches. These steady-state responses (SSRs) were quantified in the spectral domain to examine visual stimulus processing under conditions of synchronous vs. asynchronous tone presentation and when respective stimulus positions were attended vs. unattended. Strikingly, unique patterns of effects on pulse- and flicker driven SSRs indicated that spatial attention and audiovisual synchrony facilitated early visual processing in parallel and via different cortical processes. We found attention effects to resemble the classical top-down gain effect facilitating both, flicker and pulse-driven SSRs. Audio-visual synchrony, in turn, only amplified synchrony-producing stimulus aspects (i.e. pulse-driven SSRs) possibly highlighting the role of temporally co-occurring sights and sounds in bottom-up multisensory integration. Copyright © 2017 Elsevier Inc. All rights reserved.
Preserved Discrimination Performance and Neural Processing during Crossmodal Attention in Aging

PubMed Central

Mishra, Jyoti; Gazzaley, Adam

2013-01-01

In a recent study in younger adults (19-29 year olds) we showed evidence that distributed audiovisual attention resulted in improved discrimination performance for audiovisual stimuli compared to focused visual attention. Here, we extend our findings to healthy older adults (60-90 year olds), showing that performance benefits of distributed audiovisual attention in this population match those of younger adults. Specifically, improved performance was revealed in faster response times for semantically congruent audiovisual stimuli during distributed relative to focused visual attention, without any differences in accuracy. For semantically incongruent stimuli, discrimination accuracy was significantly improved during distributed relative to focused attention. Furthermore, event-related neural processing showed intact crossmodal integration in higher performing older adults similar to younger adults. Thus, there was insufficient evidence to support an age-related deficit in crossmodal attention. PMID:24278464
Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

ERIC Educational Resources Information Center

Olube, Friday K.

2015-01-01

The purpose of this study is to examine primary school children's response on the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…
Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

PubMed

Keitel, Christian; Müller, Matthias M

2016-05-01

Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
Young children's recall and reconstruction of audio and audiovisual narratives.

PubMed

Gibbons, J; Anderson, D R; Smith, R; Field, D E; Fischer, C

1986-08-01

It has been claimed that the visual component of audiovisual media dominates young children's cognitive processing. This experiment examines the effects of input modality while controlling the complexity of the visual and auditory content and while varying the comprehension task (recall vs. reconstruction). 4- and 7-year-olds were presented brief stories through either audio or audiovisual media. The audio version consisted of narrated character actions and character utterances. The narrated actions were matched to the utterances on the basis of length and propositional complexity. The audiovisual version depicted the actions visually by means of stop animation instead of by auditory narrative statements. The character utterances were the same in both versions. Audiovisual input produced superior performance on explicit information in the 4-year-olds and produced more inferences at both ages. Because performance on utterances was superior in the audiovisual condition as compared to the audio condition, there was no evidence that visual input inhibits processing of auditory information. Actions were more likely to be produced by the younger children than utterances, regardless of input medium, indicating that prior findings of visual dominance may have been due to the salience of narrative action. Reconstruction, as compared to recall, produced superior depiction of actions at both ages as well as more constrained relevant inferences and narrative conventions.
The production of audiovisual teaching tools in minimally invasive surgery.

PubMed

Tolerton, Sarah K; Hugh, Thomas J; Cosman, Peter H

2012-01-01

Audiovisual learning resources have become valuable adjuncts to formal teaching in surgical training. This report discusses the process and challenges of preparing an audiovisual teaching tool for laparoscopic cholecystectomy. The relative value in surgical education and training, for both the creator and viewer are addressed. This audiovisual teaching resource was prepared as part of the Master of Surgery program at the University of Sydney, Australia. The different methods of video production used to create operative teaching tools are discussed. Collating and editing material for an audiovisual teaching resource can be a time-consuming and technically challenging process. However, quality learning resources can now be produced even with limited prior video editing experience. With minimal cost and suitable guidance to ensure clinically relevant content, most surgeons should be able to produce short, high-quality education videos of both open and minimally invasive surgery. Despite the challenges faced during production of audiovisual teaching tools, these resources are now relatively easy to produce using readily available software. These resources are particularly attractive to surgical trainees when real time operative footage is used. They serve as valuable adjuncts to formal teaching, particularly in the setting of minimally invasive surgery. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Multiple concurrent temporal recalibrations driven by audiovisual stimuli with apparent physical differences.

PubMed

Yuan, Xiangyong; Bi, Cuihua; Huang, Xiting

2015-05-01

Out-of-synchrony experiences can easily recalibrate one's subjective simultaneity point in the direction of the experienced asynchrony. Although temporal adjustment of multiple audiovisual stimuli has been recently demonstrated to be spatially specific, perceptual grouping processes that organize separate audiovisual stimuli into distinctive "objects" may play a more important role in forming the basis for subsequent multiple temporal recalibrations. We investigated whether apparent physical differences between audiovisual pairs that make them distinct from each other can independently drive multiple concurrent temporal recalibrations regardless of spatial overlap. Experiment 1 verified that reducing the physical difference between two audiovisual pairs diminishes the multiple temporal recalibrations by exposing observers to two utterances with opposing temporal relationships spoken by one single speaker rather than two distinct speakers at the same location. Experiment 2 found that increasing the physical difference between two stimuli pairs can promote multiple temporal recalibrations by complicating their non-temporal dimensions (e.g., disks composed of two rather than one attribute and tones generated by multiplying two frequencies); however, these recalibration aftereffects were subtle. Experiment 3 further revealed that making the two audiovisual pairs differ in temporal structures (one transient and one gradual) was sufficient to drive concurrent temporal recalibration. These results confirm that the more audiovisual pairs physically differ, especially in temporal profile, the more likely multiple temporal perception adjustments will be content-constrained regardless of spatial overlap. These results indicate that multiple temporal recalibrations are based secondarily on the outcome of perceptual grouping processes.
Effects of Sound Frequency on Audiovisual Integration: An Event-Related Potential Study.

PubMed

Yang, Weiping; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Ren, Yanna; Takahashi, Satoshi; Wu, Jinglong

2015-01-01

A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, which is one of basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli, comprising of tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potential (ERP), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190-210 ms, for 1 kHz stimuli from 170-200 ms, for 2.5 kHz stimuli from 140-200 ms, 5 kHz stimuli from 100-200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be early processed or integrated despite the auditory stimuli being task-irrelevant information. Furthermore, audiovisual integration in late latency (300-340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a multisensory visual signal and auditory stimuli of different frequencies.
Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

PubMed

Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

2017-11-01

An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful especially in subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility < 80% with the 1st CI). We found evidence of the McGurk effect in each of the three CI-listening modes. AV integration responses were observed in a subset of incongruent AV stimuli, and the patterns observed with the 1st CI and with Bi-CIs were similar. In the proficient group, the responses with the 2nd CI were not significantly different from those with the 1st CI whereas in the non-proficient group the responses with the 2nd CI were driven by visual stimuli more than those with the 1st CI. Our results suggested that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural listening as well as in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CIs listeners with poorer speech recognition rely on visual information more compared to the proficient subjects to compensate for poorer auditory input. Nevertheless, poorer quality auditory input with the 2nd CI did not interfere with AV integration with binaural listening (with Bi-CIs). Overall, the findings of this study might be used to inform future research to identify the best strategies for speech training using AV integration effectively in prelingually deafened children. Copyright © 2017 Elsevier B.V. All rights reserved.
Attenuated audiovisual integration in middle-aged adults in a discrimination task.

PubMed

Yang, Weiping; Ren, Yanna

2018-02-01

Numerous studies have focused on the diversity of audiovisual integration between younger and older adults. However, consecutive trends in audiovisual integration throughout life are still unclear. In the present study, to clarify audiovisual integration characteristics in middle-aged adults, we instructed younger and middle-aged adults to conduct an auditory/visual stimuli discrimination experiment. Randomized streams of unimodal auditory (A), unimodal visual (V) or audiovisual stimuli were presented on the left or right hemispace of the central fixation point, and subjects were instructed to respond to the target stimuli rapidly and accurately. Our results demonstrated that the responses of middle-aged adults to all unimodal and bimodal stimuli were significantly slower than those of younger adults (p < 0.05). Audiovisual integration was markedly delayed (onset time 360 ms) and weaker (peak 3.97%) in middle-aged adults than in younger adults (onset time 260 ms, peak 11.86%). The results suggested that audiovisual integration was attenuated in middle-aged adults and further confirmed age-related decline in information processing.
A Network Model of Observation and Imitation of Speech

PubMed Central

Mashal, Nira; Solodkin, Ana; Dick, Anthony Steven; Chen, E. Elinor; Small, Steven L.

2012-01-01

Much evidence has now accumulated demonstrating and quantifying the extent of shared regional brain activation for observation and execution of speech. However, the nature of the actual networks that implement these functions, i.e., both the brain regions and the connections among them, and the similarities and differences across these networks has not been elucidated. The current study aims to characterize formally a network for observation and imitation of syllables in the healthy adult brain and to compare their structure and effective connectivity. Eleven healthy participants observed or imitated audiovisual syllables spoken by a human actor. We constructed four structural equation models to characterize the networks for observation and imitation in each of the two hemispheres. Our results show that the network models for observation and imitation comprise the same essential structure but differ in important ways from each other (in both hemispheres) based on connectivity. In particular, our results show that the connections from posterior superior temporal gyrus and sulcus to ventral premotor, ventral premotor to dorsal premotor, and dorsal premotor to primary motor cortex in the left hemisphere are stronger during imitation than during observation. The first two connections are implicated in a putative dorsal stream of speech perception, thought to involve translating auditory speech signals into motor representations. Thus, the current results suggest that flow of information during imitation, starting at the posterior superior temporal cortex and ending in the motor cortex, enhances input to the motor cortex in the service of speech execution. PMID:22470360
The System-Wide Effect of Real-Time Audiovisual Feedback and Postevent Debriefing for In-Hospital Cardiac Arrest: The Cardiopulmonary Resuscitation Quality Improvement Initiative.

PubMed

Couper, Keith; Kimani, Peter K; Abella, Benjamin S; Chilwan, Mehboob; Cooke, Matthew W; Davies, Robin P; Field, Richard A; Gao, Fang; Quinton, Sarah; Stallard, Nigel; Woolley, Sarah; Perkins, Gavin D

2015-11-01

To evaluate the effect of implementing real-time audiovisual feedback with and without postevent debriefing on survival and quality of cardiopulmonary resuscitation quality at in-hospital cardiac arrest. A two-phase, multicentre prospective cohort study. Three UK hospitals, all part of one National Health Service Acute Trust. One thousand three hundred and ninety-five adult patients who sustained an in-hospital cardiac arrest at the study hospitals and were treated by hospital emergency teams between November 2009 and May 2013. During phase 1, quality of cardiopulmonary resuscitation and patient outcomes were measured with no intervention implemented. During phase 2, staff at hospital 1 received real-time audiovisual feedback, whereas staff at hospital 2 received real-time audiovisual feedback supplemented by postevent debriefing. No intervention was implemented at hospital 3 during phase 2. The primary outcome was return of spontaneous circulation. Secondary endpoints included other patient-focused outcomes, such as survival to hospital discharge, and process-focused outcomes, such as chest compression depth. Random-effect logistic and linear regression models, adjusted for baseline patient characteristics, were used to analyze the effect of the interventions on study outcomes. In comparison with no intervention, neither real-time audiovisual feedback (adjusted odds ratio, 0.62; 95% CI, 0.31-1.22; p=0.17) nor real-time audiovisual feedback supplemented by postevent debriefing (adjusted odds ratio, 0.65; 95% CI, 0.35-1.21; p=0.17) was associated with a statistically significant improvement in return of spontaneous circulation or any process-focused outcome. Despite this, there was evidence of a system-wide improvement in phase 2, leading to improvements in return of spontaneous circulation (adjusted odds ratio, 1.87; 95% CI, 1.06-3.30; p=0.03) and process-focused outcomes. Implementation of real-time audiovisual feedback with or without postevent debriefing did not lead to a measured improvement in patient or process-focused outcomes at individual hospital sites. However, there was an unexplained system-wide improvement in return of spontaneous circulation and process-focused outcomes during the second phase of the study.
Action-outcome learning and prediction shape the window of simultaneity of audiovisual outcomes.

PubMed

Desantis, Andrea; Haggard, Patrick

2016-08-01

To form a coherent representation of the objects around us, the brain must group the different sensory features composing these objects. Here, we investigated whether actions contribute in this grouping process. In particular, we assessed whether action-outcome learning and prediction contribute to audiovisual temporal binding. Participants were presented with two audiovisual pairs: one pair was triggered by a left action, and the other by a right action. In a later test phase, the audio and visual components of these pairs were presented at different onset times. Participants judged whether they were simultaneous or not. To assess the role of action-outcome prediction on audiovisual simultaneity, each action triggered either the same audiovisual pair as in the learning phase ('predicted' pair), or the pair that had previously been associated with the other action ('unpredicted' pair). We found the time window within which auditory and visual events appeared simultaneous increased for predicted compared to unpredicted pairs. However, no change in audiovisual simultaneity was observed when audiovisual pairs followed visual cues, rather than voluntary actions. This suggests that only action-outcome learning promotes temporal grouping of audio and visual effects. In a second experiment we observed that changes in audiovisual simultaneity do not only depend on our ability to predict what outcomes our actions generate, but also on learning the delay between the action and the multisensory outcome. When participants learned that the delay between action and audiovisual pair was variable, the window of audiovisual simultaneity for predicted pairs increased, relative to a fixed action-outcome pair delay. This suggests that participants learn action-based predictions of audiovisual outcome, and adapt their temporal perception of outcome events based on such predictions. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Attention Strongly Modulates Reliability of Neural Responses to Naturalistic Narrative Stimuli.

PubMed

Ki, Jason J; Kelly, Simon P; Parra, Lucas C

2016-03-09

Attentional engagement is a major determinant of how effectively we gather information through our senses. Alongside the sheer growth in the amount and variety of information content that we are presented with through modern media, there is increased variability in the degree to which we "absorb" that information. Traditional research on attention has illuminated the basic principles of sensory selection to isolated features or locations, but it provides little insight into the neural underpinnings of our attentional engagement with modern naturalistic content. Here, we show in human subjects that the reliability of an individual's neural responses with respect to a larger group provides a highly robust index of the level of attentional engagement with a naturalistic narrative stimulus. Specifically, fast electroencephalographic evoked responses were more strongly correlated across subjects when naturally attending to auditory or audiovisual narratives than when attention was directed inward to a mental arithmetic task during stimulus presentation. This effect was strongest for audiovisual stimuli with a cohesive narrative and greatly reduced for speech stimuli lacking meaning. For compelling audiovisual narratives, the effect is remarkably strong, allowing perfect discrimination between attentional state across individuals. Control experiments rule out possible confounds related to altered eye movement trajectories or order of presentation. We conclude that reliability of evoked activity reproduced across subjects viewing the same movie is highly sensitive to the attentional state of the viewer and listener, which is aided by a cohesive narrative. Copyright © 2016 Ki et al.

Enhanced Sensitivity to Subphonemic Segments in Dyslexia: A New Instance of Allophonic Perception

PubMed Central

Serniclaes, Willy; Seck, M’ballo

2018-01-01

Although dyslexia can be individuated in many different ways, it has only three discernable sources: a visual deficit that affects the perception of letters, a phonological deficit that affects the perception of speech sounds, and an audio-visual deficit that disturbs the association of letters with speech sounds. However, the very nature of each of these core deficits remains debatable. The phonological deficit in dyslexia, which is generally attributed to a deficit of phonological awareness, might result from a specific mode of speech perception characterized by the use of allophonic (i.e., subphonemic) units. Here we will summarize the available evidence and present new data in support of the “allophonic theory” of dyslexia. Previous studies have shown that the dyslexia deficit in the categorical perception of phonemic features (e.g., the voicing contrast between /t/ and /d/) is due to the enhanced sensitivity to allophonic features (e.g., the difference between two variants of /d/). Another consequence of allophonic perception is that it should also give rise to an enhanced sensitivity to allophonic segments, such as those that take place within a consonant cluster. This latter prediction is validated by the data presented in this paper. PMID:29587419
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

PubMed Central

Russo, Frank A.

2018-01-01

The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
Rapid recalibration of speech perception after experiencing the McGurk illusion.

PubMed

Lüttke, Claudia S; Pérez-Bellido, Alexis; de Lange, Floris P

2018-03-01

The human brain can quickly adapt to changes in the environment. One example is phonetic recalibration: a speech sound is interpreted differently depending on the visual speech and this interpretation persists in the absence of visual information. Here, we examined the mechanisms of phonetic recalibration. Participants categorized the auditory syllables /aba/ and /ada/, which were sometimes preceded by the so-called McGurk stimuli (in which an /aba/ sound, due to visual /aga/ input, is often perceived as 'ada'). We found that only one trial of exposure to the McGurk illusion was sufficient to induce a recalibration effect, i.e. an auditory /aba/ stimulus was subsequently more often perceived as 'ada'. Furthermore, phonetic recalibration took place only when auditory and visual inputs were integrated to 'ada' (McGurk illusion). Moreover, this recalibration depended on the sensory similarity between the preceding and current auditory stimulus. Finally, signal detection theoretical analysis showed that McGurk-induced phonetic recalibration resulted in both a criterion shift towards /ada/ and a reduced sensitivity to distinguish between /aba/ and /ada/ sounds. The current study shows that phonetic recalibration is dependent on the perceptual integration of audiovisual information and leads to a perceptual shift in phoneme categorization.
Effects of Sound Frequency on Audiovisual Integration: An Event-Related Potential Study

PubMed Central

Yang, Weiping; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Ren, Yanna; Takahashi, Satoshi; Wu, Jinglong

2015-01-01

A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, which is one of basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli, comprising of tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potential (ERP), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190–210 ms, for 1 kHz stimuli from 170–200 ms, for 2.5 kHz stimuli from 140–200 ms, 5 kHz stimuli from 100–200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be early processed or integrated despite the auditory stimuli being task-irrelevant information. Furthermore, audiovisual integration in late latency (300–340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a multisensory visual signal and auditory stimuli of different frequencies. PMID:26384256
Audiologist-driven versus patient-driven fine tuning of hearing instruments.

PubMed

Boymans, Monique; Dreschler, Wouter A

2012-03-01

Two methods of fine tuning the initial settings of hearing aids were compared: An audiologist-driven approach--using real ear measurements and a patient-driven fine-tuning approach--using feedback from real-life situations. The patient-driven fine tuning was conducted by employing the Amplifit(®) II system using audiovideo clips. The audiologist-driven fine tuning was based on the NAL-NL1 prescription rule. Both settings were compared using the same hearing aids in two 6-week trial periods following a randomized blinded cross-over design. After each trial period, the settings were evaluated by insertion-gain measurements. Performance was evaluated by speech tests in quiet, in noise, and in time-reversed speech, presented at 0° and with spatially separated sound sources. Subjective results were evaluated using extensive questionnaires and audiovisual video clips. A total of 73 participants were included. On average, higher gain values were found for the audiologist-driven settings than for the patient-driven settings, especially at 1000 and 2000 Hz. Better objective performance was obtained for the audiologist-driven settings for speech perception in quiet and in time-reversed speech. This was supported by better scores on a number of subjective judgments and in the subjective ratings of video clips. The perception of loud sounds scored higher than when patient-driven, but the overall preference was in favor of the audiologist-driven settings for 67% of the participants.
Utilizing New Audiovisual Resources

ERIC Educational Resources Information Center

Miller, Glen

1975-01-01

The University of Arizona's Agriculture Department has found that video cassette systems and 8 mm films are excellent audiovisual aids to classroom instruction at the high school level in small gasoline engines. Each system is capable of improving the instructional process for motor skill development. (MW)
AUDIOVISUAL HANDBOOK.

ERIC Educational Resources Information Center

JOHNSON, HARRY A.

UNDERGRADUATE AND GRADUATE ACADEMIC OFFERINGS IN THE DEPARTMENT OF AUDIOVISUAL EDUCATION ARE LISTED, AND THE INSERVICE FACULTY TRAINING PROGRAM AND THE EXTENSION AND CONSULTANT SERVICES ARE DESCRIBED. GENERAL SERVICES OFFERED BY THE CENTER ARE A COLLEGE FILM SHOWING SERVICE, A CHILDREN'S THEATRE, A PRODUCTION WORKSHOP, AN EMBOSOGRAF PROCESS,…
Robust inter-subject audiovisual decoding in functional magnetic resonance imaging using high-dimensional regression.

PubMed

Raz, Gal; Svanera, Michele; Singer, Neomi; Gilam, Gadi; Cohen, Maya Bleich; Lin, Tamar; Admon, Roee; Gonen, Tal; Thaler, Avner; Granot, Roni Y; Goebel, Rainer; Benini, Sergio; Valente, Giancarlo

2017-12-01

Major methodological advancements have been recently made in the field of neural decoding, which is concerned with the reconstruction of mental content from neuroimaging measures. However, in the absence of a large-scale examination of the validity of the decoding models across subjects and content, the extent to which these models can be generalized is not clear. This study addresses the challenge of producing generalizable decoding models, which allow the reconstruction of perceived audiovisual features from human magnetic resonance imaging (fMRI) data without prior training of the algorithm on the decoded content. We applied an adapted version of kernel ridge regression combined with temporal optimization on data acquired during film viewing (234 runs) to generate standardized brain models for sound loudness, speech presence, perceived motion, face-to-frame ratio, lightness, and color brightness. The prediction accuracies were tested on data collected from different subjects watching other movies mainly in another scanner. Substantial and significant (Q FDR <0.05) correlations between the reconstructed and the original descriptors were found for the first three features (loudness, speech, and motion) in all of the 9 test movies (R¯=0.62, R¯ = 0.60, R¯ = 0.60, respectively) with high reproducibility of the predictors across subjects. The face ratio model produced significant correlations in 7 out of 8 movies (R¯=0.56). The lightness and brightness models did not show robustness (R¯=0.23, R¯ = 0). Further analysis of additional data (95 runs) indicated that loudness reconstruction veridicality can consistently reveal relevant group differences in musical experience. The findings point to the validity and generalizability of our loudness, speech, motion, and face ratio models for complex cinematic stimuli (as well as for music in the case of loudness). While future research should further validate these models using controlled stimuli and explore the feasibility of extracting more complex models via this method, the reliability of our results indicates the potential usefulness of the approach and the resulting models in basic scientific and diagnostic contexts. Copyright © 2017 Elsevier Inc. All rights reserved.
Neural correlates of audiovisual integration in music reading.

PubMed

Nichols, Emily S; Grahn, Jessica A

2016-10-01

Integration of auditory and visual information is important to both language and music. In the linguistic domain, audiovisual integration alters event-related potentials (ERPs) at early stages of processing (the mismatch negativity (MMN)) as well as later stages (P300(Andres et al., 2011)). However, the role of experience in audiovisual integration is unclear, as reading experience is generally confounded with developmental stage. Here we tested whether audiovisual integration of music appears similar to reading, and how musical experience altered integration. We compared brain responses in musicians and non-musicians on an auditory pitch-interval oddball task that evoked the MMN and P300, while manipulating whether visual pitch-interval information was congruent or incongruent with the auditory information. We predicted that the MMN and P300 would be largest when both auditory and visual stimuli deviated, because audiovisual integration would increase the neural response when the deviants were congruent. The results indicated that scalp topography differed between musicians and non-musicians for both the MMN and P300 response to deviants. Interestingly, musicians' musical training modulated integration of congruent deviants at both early and late stages of processing. We propose that early in the processing stream, visual information may guide interpretation of auditory information, leading to a larger MMN when auditory and visual information mismatch. At later attentional stages, integration of the auditory and visual stimuli leads to a larger P300 amplitude. Thus, experience with musical visual notation shapes the way the brain integrates abstract sound-symbol pairings, suggesting that musicians can indeed inform us about the role of experience in audiovisual integration. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
The System-Wide Effect of Real-Time Audiovisual Feedback and Postevent Debriefing for In-Hospital Cardiac Arrest: The Cardiopulmonary Resuscitation Quality Improvement Initiative*

PubMed Central

Couper, Keith; Kimani, Peter K.; Abella, Benjamin S.; Chilwan, Mehboob; Cooke, Matthew W.; Davies, Robin P.; Field, Richard A.; Gao, Fang; Quinton, Sarah; Stallard, Nigel; Woolley, Sarah

2015-01-01

Objective: To evaluate the effect of implementing real-time audiovisual feedback with and without postevent debriefing on survival and quality of cardiopulmonary resuscitation quality at in-hospital cardiac arrest. Design: A two-phase, multicentre prospective cohort study. Setting: Three UK hospitals, all part of one National Health Service Acute Trust. Patients: One thousand three hundred and ninety-five adult patients who sustained an in-hospital cardiac arrest at the study hospitals and were treated by hospital emergency teams between November 2009 and May 2013. Interventions: During phase 1, quality of cardiopulmonary resuscitation and patient outcomes were measured with no intervention implemented. During phase 2, staff at hospital 1 received real-time audiovisual feedback, whereas staff at hospital 2 received real-time audiovisual feedback supplemented by postevent debriefing. No intervention was implemented at hospital 3 during phase 2. Measurements and Main Results: The primary outcome was return of spontaneous circulation. Secondary endpoints included other patient-focused outcomes, such as survival to hospital discharge, and process-focused outcomes, such as chest compression depth. Random-effect logistic and linear regression models, adjusted for baseline patient characteristics, were used to analyze the effect of the interventions on study outcomes. In comparison with no intervention, neither real-time audiovisual feedback (adjusted odds ratio, 0.62; 95% CI, 0.31–1.22; p = 0.17) nor real-time audiovisual feedback supplemented by postevent debriefing (adjusted odds ratio, 0.65; 95% CI, 0.35–1.21; p = 0.17) was associated with a statistically significant improvement in return of spontaneous circulation or any process-focused outcome. Despite this, there was evidence of a system-wide improvement in phase 2, leading to improvements in return of spontaneous circulation (adjusted odds ratio, 1.87; 95% CI, 1.06–3.30; p = 0.03) and process-focused outcomes. Conclusions: Implementation of real-time audiovisual feedback with or without postevent debriefing did not lead to a measured improvement in patient or process-focused outcomes at individual hospital sites. However, there was an unexplained system-wide improvement in return of spontaneous circulation and process-focused outcomes during the second phase of the study. PMID:26186567
Age-related differences in audiovisual interactions of semantically different stimuli.

PubMed

Viggiano, Maria Pia; Giovannelli, Fabio; Giganti, Fiorenza; Rossi, Arianna; Metitieri, Tiziana; Rebai, Mohamed; Guerrini, Renzo; Cincotta, Massimo

2017-01-01

Converging results have shown that adults benefit from congruent multisensory stimulation in the identification of complex stimuli, whereas the developmental trajectory of the ability to integrate multisensory inputs in children is less well understood. In this study we explored the effects of audiovisual semantic congruency on identification of visually presented stimuli belonging to different categories, using a cross-modal approach. Four groups of children ranging in age from 6 to 13 years and adults were administered an object identification task of visually presented pictures belonging to living and nonliving entities. Stimuli were presented in visual, congruent audiovisual, incongruent audiovisual, and noise conditions. Results showed that children under 12 years of age did not benefit from multisensory presentation in speeding up the identification. In children the incoherent audiovisual condition had an interfering effect, especially for the identification of living things. These data suggest that the facilitating effect of the audiovisual interaction into semantic factors undergoes developmental changes and the consolidation of adult-like processing of multisensory stimuli begins in late childhood. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Boosting pitch encoding with audiovisual interactions in congenital amusia.

PubMed

Albouy, Philippe; Lévêque, Yohana; Hyde, Krista L; Bouchet, Patrick; Tillmann, Barbara; Caclin, Anne

2015-01-01

The combination of information across senses can enhance perception, as revealed for example by decreased reaction times or improved stimulus detection. Interestingly, these facilitatory effects have been shown to be maximal when responses to unisensory modalities are weak. The present study investigated whether audiovisual facilitation can be observed in congenital amusia, a music-specific disorder primarily ascribed to impairments of pitch processing. Amusic individuals and their matched controls performed two tasks. In Task 1, they were required to detect auditory, visual, or audiovisual stimuli as rapidly as possible. In Task 2, they were required to detect as accurately and as rapidly as possible a pitch change within an otherwise monotonic 5-tone sequence that was presented either only auditorily (A condition), or simultaneously with a temporally congruent, but otherwise uninformative visual stimulus (AV condition). Results of Task 1 showed that amusics exhibit typical auditory and visual detection, and typical audiovisual integration capacities: both amusics and controls exhibited shorter response times for audiovisual stimuli than for either auditory stimuli or visual stimuli. Results of Task 2 revealed that both groups benefited from simultaneous uninformative visual stimuli to detect pitch changes: accuracy was higher and response times shorter in the AV condition than in the A condition. The audiovisual improvements of response times were observed for different pitch interval sizes depending on the group. These results suggest that both typical listeners and amusic individuals can benefit from multisensory integration to improve their pitch processing abilities and that this benefit varies as a function of task difficulty. These findings constitute the first step towards the perspective to exploit multisensory paradigms to reduce pitch-related deficits in congenital amusia, notably by suggesting that audiovisual paradigms are effective in an appropriate range of unimodal performance. Copyright © 2014 Elsevier Ltd. All rights reserved.
Audiovisual associations alter the perception of low-level visual motion

PubMed Central

Kafaligonul, Hulusi; Oluk, Can

2015-01-01

Motion perception is a pervasive nature of vision and is affected by both immediate pattern of sensory inputs and prior experiences acquired through associations. Recently, several studies reported that an association can be established quickly between directions of visual motion and static sounds of distinct frequencies. After the association is formed, sounds are able to change the perceived direction of visual motion. To determine whether such rapidly acquired audiovisual associations and their subsequent influences on visual motion perception are dependent on the involvement of higher-order attentive tracking mechanisms, we designed psychophysical experiments using regular and reverse-phi random dot motions isolating low-level pre-attentive motion processing. Our results show that an association between the directions of low-level visual motion and static sounds can be formed and this audiovisual association alters the subsequent perception of low-level visual motion. These findings support the view that audiovisual associations are not restricted to high-level attention based motion system and early-level visual motion processing has some potential role. PMID:25873869
Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization.

PubMed

Werner, Sebastian; Noppeney, Uta

2010-02-17

Multisensory interactions have been demonstrated in a distributed neural system encompassing primary sensory and higher-order association areas. However, their distinct functional roles in multisensory integration remain unclear. This functional magnetic resonance imaging study dissociated the functional contributions of three cortical levels to multisensory integration in object categorization. Subjects actively categorized or passively perceived noisy auditory and visual signals emanating from everyday actions with objects. The experiment included two 2 x 2 factorial designs that manipulated either (1) the presence/absence or (2) the informativeness of the sensory inputs. These experimental manipulations revealed three patterns of audiovisual interactions. (1) In primary auditory cortices (PACs), a concurrent visual input increased the stimulus salience by amplifying the auditory response regardless of task-context. Effective connectivity analyses demonstrated that this automatic response amplification is mediated via both direct and indirect [via superior temporal sulcus (STS)] connectivity to visual cortices. (2) In STS and intraparietal sulcus (IPS), audiovisual interactions sustained the integration of higher-order object features and predicted subjects' audiovisual benefits in object categorization. (3) In the left ventrolateral prefrontal cortex (vlPFC), explicit semantic categorization resulted in suppressive audiovisual interactions as an index for multisensory facilitation of semantic retrieval and response selection. In conclusion, multisensory integration emerges at multiple processing stages within the cortical hierarchy. The distinct profiles of audiovisual interactions dissociate audiovisual salience effects in PACs, formation of object representations in STS/IPS and audiovisual facilitation of semantic categorization in vlPFC. Furthermore, in STS/IPS, the profiles of audiovisual interactions were behaviorally relevant and predicted subjects' multisensory benefits in performance accuracy.
Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization.

PubMed

Werner, Sebastian; Noppeney, Uta

2010-08-01

Merging information from multiple senses provides a more reliable percept of our environment. Yet, little is known about where and how various sensory features are combined within the cortical hierarchy. Combining functional magnetic resonance imaging and psychophysics, we investigated the neural mechanisms underlying integration of audiovisual object features. Subjects categorized or passively perceived audiovisual object stimuli with the informativeness (i.e., degradation) of the auditory and visual modalities being manipulated factorially. Controlling for low-level integration processes, we show higher level audiovisual integration selectively in the superior temporal sulci (STS) bilaterally. The multisensory interactions were primarily subadditive and even suppressive for intact stimuli but turned into additive effects for degraded stimuli. Consistent with the inverse effectiveness principle, auditory and visual informativeness determine the profile of audiovisual integration in STS similarly to the influence of physical stimulus intensity in the superior colliculus. Importantly, when holding stimulus degradation constant, subjects' audiovisual behavioral benefit predicts their multisensory integration profile in STS: only subjects that benefit from multisensory integration exhibit superadditive interactions, while those that do not benefit show suppressive interactions. In conclusion, superadditive and subadditive integration profiles in STS are functionally relevant and related to behavioral indices of multisensory integration with superadditive interactions mediating successful audiovisual object categorization.
Children with a history of SLI show reduced sensitivity to audiovisual temporal asynchrony: an ERP study.

PubMed

Kaganovich, Natalya; Schumaker, Jennifer; Leonard, Laurence B; Gustafson, Dana; Macias, Danielle

2014-08-01

The authors examined whether school-age children with a history of specific language impairment (H-SLI), their peers with typical development (TD), and adults differ in sensitivity to audiovisual temporal asynchrony and whether such difference stems from the sensory encoding of audiovisual information. Fifteen H-SLI children, 15 TD children, and 15 adults judged whether a flashed explosion-shaped figure and a 2-kHz pure tone occurred simultaneously. The stimuli were presented at 0-, 100-, 200-, 300-, 400-, and 500-ms temporal offsets. This task was combined with EEG recordings. H-SLI children were profoundly less sensitive to temporal separations between auditory and visual modalities compared with their TD peers. Those H-SLI children who performed better at simultaneity judgment also had higher language aptitude. TD children were less accurate than adults, revealing a remarkably prolonged developmental course of the audiovisual temporal discrimination. Analysis of early event-related potential components suggested that poor sensory encoding was not a key factor in H-SLI children's reduced sensitivity to audiovisual asynchrony. Audiovisual temporal discrimination is impaired in H-SLI children and is still immature during mid-childhood in TD children. The present findings highlight the need for further evaluation of the role of atypical audiovisual processing in the development of SLI.
Children with a history of SLI show reduced sensitivity to audiovisual temporal asynchrony: An ERP Study

PubMed Central

Kaganovich, Natalya; Schumaker, Jennifer; Leonard, Laurence B.; Gustafson, Dana; Macias, Danielle

2014-01-01

Purpose We examined whether school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults differ in sensitivity to audiovisual temporal asynchrony and whether such difference stems from the sensory encoding of audiovisual information. Method 15 H-SLI children, 15 TD children, and 15 adults judged whether a flashed explosion-shaped figure and a 2 kHz pure tone occurred simultaneously. The stimuli were presented at 0, 100, 200, 300, 400, and 500 ms temporal offsets. This task was combined with EEG recordings. Results H-SLI children were profoundly less sensitive to temporal separations between auditory and visual modalities compared to their TD peers. Those H-SLI children who performed better at simultaneity judgment also had higher language aptitude. TD children were less accurate than adults, revealing a remarkably prolonged developmental course of the audiovisual temporal discrimination. Analysis of early ERP components suggested that poor sensory encoding was not a key factor in H-SLI children’s reduced sensitivity to audiovisual asynchrony. Conclusions Audiovisual temporal discrimination is impaired in H-SLI children and is still immature during mid-childhood in TD children. The present findings highlight the need for further evaluation of the role of atypical audiovisual processing in the development of SLI. PMID:24686922
The Effect of Conventional and Transparent Surgical Masks on Speech Understanding in Individuals with and without Hearing Loss.

PubMed

Atcherson, Samuel R; Mendel, Lisa Lucks; Baltimore, Wesley J; Patro, Chhayakanta; Lee, Sungmin; Pousson, Monique; Spann, M Joshua

2017-01-01

It is generally well known that speech perception is often improved with integrated audiovisual input whether in quiet or in noise. In many health-care environments, however, conventional surgical masks block visual access to the mouth and obscure other potential facial cues. In addition, these environments can be noisy. Although these masks may not alter the acoustic properties, the presence of noise in addition to the lack of visual input can have a deleterious effect on speech understanding. A transparent ("see-through") surgical mask may help to overcome this issue. To compare the effect of noise and various visual input conditions on speech understanding for listeners with normal hearing (NH) and hearing impairment using different surgical masks. Participants were assigned to one of three groups based on hearing sensitivity in this quasi-experimental, cross-sectional study. A total of 31 adults participated in this study: one talker, ten listeners with NH, ten listeners with moderate sensorineural hearing loss, and ten listeners with severe-to-profound hearing loss. Selected lists from the Connected Speech Test were digitally recorded with and without surgical masks and then presented to the listeners at 65 dB HL in five conditions against a background of four-talker babble (+10 dB SNR): without a mask (auditory only), without a mask (auditory and visual), with a transparent mask (auditory only), with a transparent mask (auditory and visual), and with a paper mask (auditory only). A significant difference was found in the spectral analyses of the speech stimuli with and without the masks; however, no more than ∼2 dB root mean square. Listeners with NH performed consistently well across all conditions. Both groups of listeners with hearing impairment benefitted from visual input from the transparent mask. The magnitude of improvement in speech perception in noise was greatest for the severe-to-profound group. Findings confirm improved speech perception performance in noise for listeners with hearing impairment when visual input is provided using a transparent surgical mask. Most importantly, the use of the transparent mask did not negatively affect speech perception performance in noise. American Academy of Audiology
Reduced orienting to audiovisual synchrony in infancy predicts autism diagnosis at 3 years of age.

PubMed

Falck-Ytter, Terje; Nyström, Pär; Gredebäck, Gustaf; Gliga, Teodora; Bölte, Sven

2018-01-23

Effective multisensory processing develops in infancy and is thought to be important for the perception of unified and multimodal objects and events. Previous research suggests impaired multisensory processing in autism, but its role in the early development of the disorder is yet uncertain. Here, using a prospective longitudinal design, we tested whether reduced visual attention to audiovisual synchrony is an infant marker of later-emerging autism diagnosis. We studied 10-month-old siblings of children with autism using an eye tracking task previously used in studies of preschoolers. The task assessed the effect of manipulations of audiovisual synchrony on viewing patterns while the infants were observing point light displays of biological motion. We analyzed the gaze data recorded in infancy according to diagnostic status at 3 years of age (DSM-5). Ten-month-old infants who later received an autism diagnosis did not orient to audiovisual synchrony expressed within biological motion. In contrast, both infants at low-risk and high-risk siblings without autism at follow-up had a strong preference for this type of information. No group differences were observed in terms of orienting to upright biological motion. This study suggests that reduced orienting to audiovisual synchrony within biological motion is an early sign of autism. The findings support the view that poor multisensory processing could be an important antecedent marker of this neurodevelopmental condition. © 2018 Association for Child and Adolescent Mental Health.
Audio-Visual Situational Awareness for General Aviation Pilots

NASA Technical Reports Server (NTRS)

Spirkovska, Lilly; Lodha, Suresh K.; Clancy, Daniel (Technical Monitor)

2001-01-01

Weather is one of the major causes of general aviation accidents. Researchers are addressing this problem from various perspectives including improving meteorological forecasting techniques, collecting additional weather data automatically via on-board sensors and "flight" modems, and improving weather data dissemination and presentation. We approach the problem from the improved presentation perspective and propose weather visualization and interaction methods tailored for general aviation pilots. Our system, Aviation Weather Data Visualization Environment (AWE), utilizes information visualization techniques, a direct manipulation graphical interface, and a speech-based interface to improve a pilot's situational awareness of relevant weather data. The system design is based on a user study and feedback from pilots.

Cataloging, Processing, Administering AV Materials. A Model for Wisconsin Schools.

ERIC Educational Resources Information Center

Little, Robert D., Ed.

The objective of this cataloging manual is to recommend specific methods for cataloging audiovisual materials for use in individual school media centers. The following types of audiovisual aids are included: educational games, filmstrips, flat graphics, kits, models, motion pictures, realia, records, slides, sound filmstrips, tapes,…
Audio-Visual Aids for Cooperative Education and Training.

ERIC Educational Resources Information Center

Botham, C. N.

Within the context of cooperative education, audiovisual aids may be used for spreading the idea of cooperatives and helping to consolidate study groups; for the continuous process of education, both formal and informal, within the cooperative movement; for constant follow up purposes; and for promoting loyalty to the movement. Detailed…
Rapid recalibration of speech perception after experiencing the McGurk illusion

PubMed Central

Pérez-Bellido, Alexis; de Lange, Floris P.

2018-01-01

The human brain can quickly adapt to changes in the environment. One example is phonetic recalibration: a speech sound is interpreted differently depending on the visual speech and this interpretation persists in the absence of visual information. Here, we examined the mechanisms of phonetic recalibration. Participants categorized the auditory syllables /aba/ and /ada/, which were sometimes preceded by the so-called McGurk stimuli (in which an /aba/ sound, due to visual /aga/ input, is often perceived as ‘ada’). We found that only one trial of exposure to the McGurk illusion was sufficient to induce a recalibration effect, i.e. an auditory /aba/ stimulus was subsequently more often perceived as ‘ada’. Furthermore, phonetic recalibration took place only when auditory and visual inputs were integrated to ‘ada’ (McGurk illusion). Moreover, this recalibration depended on the sensory similarity between the preceding and current auditory stimulus. Finally, signal detection theoretical analysis showed that McGurk-induced phonetic recalibration resulted in both a criterion shift towards /ada/ and a reduced sensitivity to distinguish between /aba/ and /ada/ sounds. The current study shows that phonetic recalibration is dependent on the perceptual integration of audiovisual information and leads to a perceptual shift in phoneme categorization. PMID:29657743
Video self-modeling as a post-treatment fluency recovery strategy for adults.

PubMed

Harasym, Jessica; Langevin, Marilyn; Kully, Deborah

2015-06-01

This multiple-baseline across subjects study investigated the effectiveness of video self-modeling (VSM) in reducing stuttering and bringing about improvements in associated self-report measures. Participants' viewing practices and perceptions of the utility of VSM also were explored. Three adult males who had previously completed speech restructuring treatment viewed VSM recordings twice per week for 6 weeks. Weekly speech data, treatment viewing logs, and pre- and post-treatment self-report measures were obtained. An exit interview also was conducted. Two participants showed a decreasing trend in stuttering frequency. All participants appeared to engage in fewer avoidance behaviors and had less expectations to stutter. All participants perceived that, in different ways, the VSM treatment had benefited them and all participants had unique viewing practices. Given the increasing availability and ease in using portable audio-visual technology, VSM appears to offer an economical and clinically useful tool for clients who are motivated to use the technology to recover fluency. Readers will be able to describe: (a) the tenets of video-self modeling; (b) the main components of video-self modeling as a fluency recovery treatment as used in this study; and (c) speech and self-report outcomes. Copyright © 2015 Elsevier Inc. All rights reserved.
Effects of frequency shifts and visual gender information on vowel category judgments

NASA Astrophysics Data System (ADS)

Glidden, Catherine; Assmann, Peter F.

2003-10-01

Visual morphing techniques were used together with a high-quality vocoder to study the audiovisual contribution of talker gender to the identification of frequency-shifted vowels. A nine-step continuum ranging from ``bit'' to ``bet'' was constructed from natural recorded syllables spoken by an adult female talker. Upward and downward frequency shifts in spectral envelope (scale factors of 0.85 and 1.0) were applied in combination with shifts in fundamental frequency, F0 (scale factors of 0.5 and 1.0). Downward frequency shifts generally resulted in malelike voices whereas upward shifts were perceived as femalelike. Two separate nine-step visual continua from ``bit'' to ``bet'' were also constructed, one from a male face and the other a female face, each producing the end-point words. Each step along the two visual continua was paired with the corresponding step on the acoustic continuum, creating natural audiovisual utterances. Category boundary shifts were found for both acoustic cues (F0 and formant frequency shifts) and visual cues (visual gender). The visual gender effect was larger when acoustic and visual information were matched appropriately. These results suggest that visual information provided by the speech signal plays an important supplemental role in talker normalization.
Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss

PubMed Central

Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.

2018-01-01

Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415
Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss.

PubMed

Brooks, Cassandra J; Chan, Yu Man; Anderson, Andrew J; McKendrick, Allison M

2018-01-01

Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.
Designing between Pedagogies and Cultures: Audio-Visual Chinese Language Resources for Australian Schools

ERIC Educational Resources Information Center

Yuan, Yifeng; Shen, Huizhong

2016-01-01

This design-based study examines the creation and development of audio-visual Chinese language teaching and learning materials for Australian schools by incorporating users' feedback and content writers' input that emerged in the designing process. Data were collected from workshop feedback of two groups of Chinese-language teachers from primary…
An Annotated Guide to Audio-Visual Materials for Teaching Shakespeare.

ERIC Educational Resources Information Center

Albert, Richard N.

Audio-visual materials, found in a variety of periodicals, catalogs, and reference works, are listed in this guide to expedite the process of finding appropriate classroom materials for a study of William Shakespeare in the classroom. Separate listings of films, filmstrips, and recordings are provided, with subdivisions for "The Plays"…
Dissociating Verbal and Nonverbal Audiovisual Object Processing

ERIC Educational Resources Information Center

Hocking, Julia; Price, Cathy J.

2009-01-01

This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same…
Inactivation of Primate Prefrontal Cortex Impairs Auditory and Audiovisual Working Memory.

PubMed

Plakke, Bethany; Hwang, Jaewon; Romanski, Lizabeth M

2015-07-01

The prefrontal cortex is associated with cognitive functions that include planning, reasoning, decision-making, working memory, and communication. Neurophysiology and neuropsychology studies have established that dorsolateral prefrontal cortex is essential in spatial working memory while the ventral frontal lobe processes language and communication signals. Single-unit recordings in nonhuman primates has shown that ventral prefrontal (VLPFC) neurons integrate face and vocal information and are active during audiovisual working memory. However, whether VLPFC is essential in remembering face and voice information is unknown. We therefore trained nonhuman primates in an audiovisual working memory paradigm using naturalistic face-vocalization movies as memoranda. We inactivated VLPFC, with reversible cortical cooling, and examined performance when faces, vocalizations or both faces and vocalization had to be remembered. We found that VLPFC inactivation impaired subjects' performance in audiovisual and auditory-alone versions of the task. In contrast, VLPFC inactivation did not disrupt visual working memory. Our studies demonstrate the importance of VLPFC in auditory and audiovisual working memory for social stimuli but suggest a different role for VLPFC in unimodal visual processing. The ventral frontal lobe, or inferior frontal gyrus, plays an important role in audiovisual communication in the human brain. Studies with nonhuman primates have found that neurons within ventral prefrontal cortex (VLPFC) encode both faces and vocalizations and that VLPFC is active when animals need to remember these social stimuli. In the present study, we temporarily inactivated VLPFC by cooling the cortex while nonhuman primates performed a working memory task. This impaired the ability of subjects to remember a face and vocalization pair or just the vocalization alone. Our work highlights the importance of the primate VLPFC in the processing of faces and vocalizations in a manner that is similar to the inferior frontal gyrus in the human brain. Copyright © 2015 the authors 0270-6474/15/359666-10$15.00/0.
Statistical learning of multisensory regularities is enhanced in musicians: An MEG study.

PubMed

Paraskevopoulos, Evangelos; Chalas, Nikolas; Kartsidis, Panagiotis; Wollbrink, Andreas; Bamidis, Panagiotis

2018-07-15

The present study used magnetoencephalography (MEG) to identify the neural correlates of audiovisual statistical learning, while disentangling the differential contributions of uni- and multi-modal statistical mismatch responses in humans. The applied paradigm was based on a combination of a statistical learning paradigm and a multisensory oddball one, combining an audiovisual, an auditory and a visual stimulation stream, along with the corresponding deviances. Plasticity effects due to musical expertise were investigated by comparing the behavioral and MEG responses of musicians to non-musicians. The behavioral results indicated that the learning was successful for both musicians and non-musicians. The unimodal MEG responses are consistent with previous studies, revealing the contribution of Heschl's gyrus for the identification of auditory statistical mismatches and the contribution of medial temporal and visual association areas for the visual modality. The cortical network underlying audiovisual statistical learning was found to be partly common and partly distinct from the corresponding unimodal networks, comprising right temporal and left inferior frontal sources. Musicians showed enhanced activation in superior temporal and superior frontal gyrus. Connectivity and information processing flow amongst the sources comprising the cortical network of audiovisual statistical learning, as estimated by transfer entropy, was reorganized in musicians, indicating enhanced top-down processing. This neuroplastic effect showed a cross-modal stability between the auditory and audiovisual modalities. Copyright © 2018 Elsevier Inc. All rights reserved.
Temporal Ventriloquism Reveals Intact Audiovisual Temporal Integration in Amblyopia.

PubMed

Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

2018-02-01

We have shown previously that amblyopia involves impaired detection of asynchrony between auditory and visual events. To distinguish whether this impairment represents a defect in temporal integration or nonintegrative multisensory processing (e.g., cross-modal matching), we used the temporal ventriloquism effect in which visual temporal order judgment (TOJ) is normally enhanced by a lagging auditory click. Participants with amblyopia (n = 9) and normally sighted controls (n = 9) performed a visual TOJ task. Pairs of clicks accompanied the two lights such that the first click preceded the first light, or second click lagged the second light by 100, 200, or 450 ms. Baseline audiovisual synchrony and visual-only conditions also were tested. Within both groups, just noticeable differences for the visual TOJ task were significantly reduced compared with baseline in the 100- and 200-ms click lag conditions. Within the amblyopia group, poorer stereo acuity and poorer visual acuity in the amblyopic eye were significantly associated with greater enhancement in visual TOJ performance in the 200-ms click lag condition. Audiovisual temporal integration is intact in amblyopia, as indicated by perceptual enhancement in the temporal ventriloquism effect. Furthermore, poorer stereo acuity and poorer visual acuity in the amblyopic eye are associated with a widened temporal binding window for the effect. These findings suggest that previously reported abnormalities in audiovisual multisensory processing may result from impaired cross-modal matching rather than a diminished capacity for temporal audiovisual integration.
Sensitivity to Audiovisual Temporal Asynchrony in Children With a History of Specific Language Impairment and Their Peers With Typical Development: A Replication and Follow-Up Study

PubMed Central

2017-01-01

Purpose Earlier, my colleagues and I showed that children with a history of specific language impairment (H-SLI) are significantly less able to detect audiovisual asynchrony compared with children with typical development (TD; Kaganovich & Schumaker, 2014). Here, I first replicate this finding in a new group of children with H-SLI and TD and then examine a relationship among audiovisual function, attention skills, and language in a combined pool of children. Method The stimuli were a pure tone and an explosion-shaped figure. Stimulus onset asynchrony (SOA) varied from 0–500 ms. Children pressed 1 button for perceived synchrony and another for asynchrony. I measured the number of synchronous perceptions at each SOA and calculated children's temporal binding windows. I, then, conducted multiple regressions to determine if audiovisual processing and attention can predict language skills. Results As in the earlier study, children with H-SLI perceived asynchrony significantly less frequently than children with TD at SOAs of 400–500 ms. Their temporal binding windows were also larger. Temporal precision and attention predicted 23%–37% of children's language ability. Conclusions Audiovisual temporal processing is impaired in children with H-SLI. The degree of this impairment is a predictor of language skills. Once understood, the mechanisms underlying this deficit may become a new focus for language remediation. PMID:28715546
Emotional Cues during Simultaneous Face and Voice Processing: Electrophysiological Insights

PubMed Central

Liu, Taosheng; Pinheiro, Ana; Zhao, Zhongxin; Nestor, Paul G.; McCarley, Robert W.; Niznikiewicz, Margaret A.

2012-01-01

Both facial expression and tone of voice represent key signals of emotional communication but their brain processing correlates remain unclear. Accordingly, we constructed a novel implicit emotion recognition task consisting of simultaneously presented human faces and voices with neutral, happy, and angry valence, within the context of recognizing monkey faces and voices task. To investigate the temporal unfolding of the processing of affective information from human face-voice pairings, we recorded event-related potentials (ERPs) to these audiovisual test stimuli in 18 normal healthy subjects; N100, P200, N250, P300 components were observed at electrodes in the frontal-central region, while P100, N170, P270 were observed at electrodes in the parietal-occipital region. Results indicated a significant audiovisual stimulus effect on the amplitudes and latencies of components in frontal-central (P200, P300, and N250) but not the parietal occipital region (P100, N170 and P270). Specifically, P200 and P300 amplitudes were more positive for emotional relative to neutral audiovisual stimuli, irrespective of valence, whereas N250 amplitude was more negative for neutral relative to emotional stimuli. No differentiation was observed between angry and happy conditions. The results suggest that the general effect of emotion on audiovisual processing can emerge as early as 200 msec (P200 peak latency) post stimulus onset, in spite of implicit affective processing task demands, and that such effect is mainly distributed in the frontal-central region. PMID:22383987
Audio-visual presentation of information for informed consent for participation in clinical trials.

PubMed

Synnot, Anneliese; Ryan, Rebecca; Prictor, Megan; Fetherstonhaugh, Deirdre; Parker, Barbara

2014-05-09

Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented, for example, on the Internet or on DVD) are one such method. We updated a 2008 review of the effects of these interventions for informed consent for trial participation. To assess the effects of audio-visual information interventions regarding informed consent compared with standard information or placebo audio-visual interventions regarding informed consent for potential clinical trial participants, in terms of their understanding, satisfaction, willingness to participate, and anxiety or other psychological distress. We searched: the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 6, 2012; MEDLINE (OvidSP) (1946 to 13 June 2012); EMBASE (OvidSP) (1947 to 12 June 2012); PsycINFO (OvidSP) (1806 to June week 1 2012); CINAHL (EbscoHOST) (1981 to 27 June 2012); Current Contents (OvidSP) (1993 Week 27 to 2012 Week 26); and ERIC (Proquest) (searched 27 June 2012). We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. We included randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or verbal information), with standard forms of information provision or placebo audio-visual information, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to consider participating in a real or hypothetical clinical study. (In the earlier version of this review we only included studies evaluating informed consent interventions for real studies). Two authors independently assessed studies for inclusion and extracted data. We synthesised the findings using meta-analysis, where possible, and narrative synthesis of results. We assessed the risk of bias of individual studies and considered the impact of the quality of the overall evidence on the strength of the results. We included 16 studies involving data from 1884 participants. Nine studies included participants considering real clinical trials, and eight included participants considering hypothetical clinical trials, with one including both. All studies were conducted in high-income countries.There is still much uncertainty about the effect of audio-visual informed consent interventions on a range of patient outcomes. However, when considered across comparisons, we found low to very low quality evidence that such interventions may slightly improve knowledge or understanding of the parent trial, but may make little or no difference to rate of participation or willingness to participate. Audio-visual presentation of informed consent may improve participant satisfaction with the consent information provided. However its effect on satisfaction with other aspects of the process is not clear. There is insufficient evidence to draw conclusions about anxiety arising from audio-visual informed consent. We found conflicting, very low quality evidence about whether audio-visual interventions took more or less time to administer. No study measured researcher satisfaction with the informed consent process, nor ease of use.The evidence from real clinical trials was rated as low quality for most outcomes, and for hypothetical studies, very low. We note, however, that this was in large part due to poor study reporting, the hypothetical nature of some studies and low participant numbers, rather than inconsistent results between studies or confirmed poor trial quality. We do not believe that any studies were funded by organisations with a vested interest in the results. The value of audio-visual interventions as a tool for helping to enhance the informed consent process for people considering participating in clinical trials remains largely unclear, although trends are emerging with regard to improvements in knowledge and satisfaction. Many relevant outcomes have not been evaluated in randomised trials. Triallists should continue to explore innovative methods of providing information to potential trial participants during the informed consent process, mindful of the range of outcomes that the intervention should be designed to achieve, and balancing the resource implications of intervention development and delivery against the purported benefits of any intervention.More trials, adhering to CONSORT standards, and conducted in settings and populations underserved in this review, i.e. low- and middle-income countries and people with low literacy, would strengthen the results of this review and broaden its applicability. Assessing process measures, such as time taken to administer the intervention and researcher satisfaction, would inform the implementation of audio-visual consent materials.
Theological Media Literacy Education and Hermeneutic Analysis of Soviet Audiovisual Anti-Religious Media Texts in Students' Classroom

ERIC Educational Resources Information Center

Fedorov, Alexander

2015-01-01

This article realized the Russian way of theological media education literacy and hermeneutic analysis of specific examples of Soviet anti-religious audiovisual media texts: a study of the process of interpretation of these media texts, cultural and historical factors influencing the views of the media agency/authors. The hermeneutic analysis…
I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance

PubMed Central

Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn

2016-01-01

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i. e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i. e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient. PMID:27176486
Audio-visual integration through the parallel visual pathways.

PubMed

Kaposvári, Péter; Csete, Gergő; Bognár, Anna; Csibri, Péter; Tóth, Eszter; Szabó, Nikoletta; Vécsei, László; Sáry, Gyula; Tamás Kincses, Zsigmond

2015-10-22

Audio-visual integration has been shown to be present in a wide range of different conditions, some of which are processed through the dorsal, and others through the ventral visual pathway. Whereas neuroimaging studies have revealed integration-related activity in the brain, there has been no imaging study of the possible role of segregated visual streams in audio-visual integration. We set out to determine how the different visual pathways participate in this communication. We investigated how audio-visual integration can be supported through the dorsal and ventral visual pathways during the double flash illusion. Low-contrast and chromatic isoluminant stimuli were used to drive preferably the dorsal and ventral pathways, respectively. In order to identify the anatomical substrates of the audio-visual interaction in the two conditions, the psychophysical results were correlated with the white matter integrity as measured by diffusion tensor imaging.The psychophysiological data revealed a robust double flash illusion in both conditions. A correlation between the psychophysical results and local fractional anisotropy was found in the occipito-parietal white matter in the low-contrast condition, while a similar correlation was found in the infero-temporal white matter in the chromatic isoluminant condition. Our results indicate that both of the parallel visual pathways may play a role in the audio-visual interaction. Copyright © 2015. Published by Elsevier B.V.
Optimal Audiovisual Integration in the Ventriloquism Effect But Pervasive Deficits in Unisensory Spatial Localization in Amblyopia.

PubMed

Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

2018-01-01

Classically understood as a deficit in spatial vision, amblyopia is increasingly recognized to also impair audiovisual multisensory processing. Studies to date, however, have not determined whether the audiovisual abnormalities reflect a failure of multisensory integration, or an optimal strategy in the face of unisensory impairment. We use the ventriloquism effect and the maximum-likelihood estimation (MLE) model of optimal integration to investigate integration of audiovisual spatial information in amblyopia. Participants with unilateral amblyopia (n = 14; mean age 28.8 years; 7 anisometropic, 3 strabismic, 4 mixed mechanism) and visually normal controls (n = 16, mean age 29.2 years) localized brief unimodal auditory, unimodal visual, and bimodal (audiovisual) stimuli during binocular viewing using a location discrimination task. A subset of bimodal trials involved the ventriloquism effect, an illusion in which auditory and visual stimuli originating from different locations are perceived as originating from a single location. Localization precision and bias were determined by psychometric curve fitting, and the observed parameters were compared with predictions from the MLE model. Spatial localization precision was significantly reduced in the amblyopia group compared with the control group for unimodal visual, unimodal auditory, and bimodal stimuli. Analyses of localization precision and bias for bimodal stimuli showed no significant deviations from the MLE model in either the amblyopia group or the control group. Despite pervasive deficits in localization precision for visual, auditory, and audiovisual stimuli, audiovisual integration remains intact and optimal in unilateral amblyopia.

Multistability, cross-modal binding and the additivity of conjoined grouping principles

PubMed Central

Kubovy, Michael; Yu, Minhong

2012-01-01

We present a sceptical view of multimodal multistability—drawing most of our examples from the relation between audition and vision. We begin by summarizing some of the principal ways in which audio-visual binding takes place. We review the evidence that unambiguous stimulation in one modality may affect the perception of a multistable stimulus in another modality. Cross-modal influences of one multistable stimulus on the multistability of another are different: they have occurred only in speech perception. We then argue that the strongest relation between perceptual organization in vision and perceptual organization in audition is likely to be by way of analogous Gestalt laws. We conclude with some general observations about multimodality. PMID:22371617
Audio-visual temporal perception in children with restored hearing.

PubMed

Gori, Monica; Chilosi, Anna; Forli, Francesca; Burr, David

2017-05-01

It is not clear how audio-visual temporal perception develops in children with restored hearing. In this study we measured temporal discrimination thresholds with an audio-visual temporal bisection task in 9 deaf children with restored audition, and 22 typically hearing children. In typically hearing children, audition was more precise than vision, with no gain in multisensory conditions (as previously reported in Gori et al. (2012b)). However, deaf children with restored audition showed similar thresholds for audio and visual thresholds and some evidence of gain in audio-visual temporal multisensory conditions. Interestingly, we found a strong correlation between auditory weighting of multisensory signals and quality of language: patients who gave more weight to audition had better language skills. Similarly, auditory thresholds for the temporal bisection task were also a good predictor of language skills. This result supports the idea that the temporal auditory processing is associated with language development. Copyright © 2017. Published by Elsevier Ltd.
The Development of Attentional Biases for Faces in Infancy: A Developmental Systems Perspective

PubMed Central

Reynolds, Greg D.; Roth, Kelly C.

2018-01-01

We present an integrative review of research and theory on major factors involved in the early development of attentional biases to faces. Research utilizing behavioral, eye-tracking, and neuroscience measures with infant participants as well as comparative research with animal subjects are reviewed. We begin with coverage of research demonstrating the presence of an attentional bias for faces shortly after birth, such as newborn infants’ visual preference for face-like over non-face stimuli. The role of experience and the process of perceptual narrowing in face processing are examined as infants begin to demonstrate enhanced behavioral and neural responsiveness to mother over stranger, female over male, own- over other-race, and native over non-native faces. Next, we cover research on developmental change in infants’ neural responsiveness to faces in multimodal contexts, such as audiovisual speech. We also explore the potential influence of arousal and attention on early perceptual preferences for faces. Lastly, the potential influence of the development of attention systems in the brain on social-cognitive processing is discussed. In conclusion, we interpret the findings under the framework of Developmental Systems Theory, emphasizing the combined and distributed influence of several factors, both internal (e.g., arousal, neural development) and external (e.g., early social experience) to the developing child, in the emergence of attentional biases that lead to enhanced responsiveness and processing of faces commonly encountered in the native environment. PMID:29541043
The redeployment of attention to the mouth of a talking face during the second year of life.

PubMed

Hillairet de Boisferon, Anne; Tift, Amy H; Minar, Nicholas J; Lewkowicz, David J

2018-08-01

Previous studies have found that when monolingual infants are exposed to a talking face speaking in a native language, 8- and 10-month-olds attend more to the talker's mouth, whereas 12-month-olds no longer do so. It has been hypothesized that the attentional focus on the talker's mouth at 8 and 10 months of age reflects reliance on the highly salient audiovisual (AV) speech cues for the acquisition of basic speech forms and that the subsequent decline of attention to the mouth by 12 months of age reflects the emergence of basic native speech expertise. Here, we investigated whether infants may redeploy their attention to the mouth once they fully enter the word-learning phase. To test this possibility, we recorded eye gaze in monolingual English-learning 14- and 18-month-olds while they saw and heard a talker producing an English or Spanish utterance in either an infant-directed (ID) or adult-directed (AD) manner. Results indicated that the 14-month-olds attended more to the talker's mouth than to the eyes when exposed to the ID utterance and that the 18-month-olds attended more to the talker's mouth when exposed to the ID and the AD utterance. These results show that infants redeploy their attention to a talker's mouth when they enter the word acquisition phase and suggest that infants rely on the greater perceptual salience of redundant AV speech cues to acquire their lexicon. Copyright © 2018 Elsevier Inc. All rights reserved.
Role of Audio and Audio-Visual Materials in Enhancing the Learning Process of Health Science Personnel.

ERIC Educational Resources Information Center

Cooper, William

The material presented here is the result of a review of the Technical Development Plan of the National Library of Medicine, made with the object of describing the role of audiovisual materials in medical education, research and service, and particularly in the continuing education of physicians and allied health personnel. A historical background…
Dissociating verbal and nonverbal audiovisual object processing.

PubMed

Hocking, Julia; Price, Cathy J

2009-02-01

This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same object or not. Verbal stimuli were simultaneously presented spoken and written object names, and nonverbal stimuli were photographs of objects simultaneously presented with naturally occurring object sounds. Stimulus differences were controlled by including two further conditions that paired photographs of objects with spoken words and object sounds with written words. Verbal matching, relative to all other conditions, increased activation in a region of the left superior temporal sulcus that has previously been associated with phonological processing. Nonverbal matching, relative to all other conditions, increased activation in a right fusiform region that has previously been associated with structural and conceptual object processing. Thus, we demonstrate how brain activation for audiovisual integration depends on the verbal content of the stimuli, even when stimulus and task processing differences are controlled.
Secure access to patient's health records using SpeechXRays a mutli-channel biometrics platform for user authentication.

PubMed

Spanakis, Emmanouil G; Spanakis, Marios; Karantanas, Apostolos; Marias, Kostas

2016-08-01

The most commonly used method for user authentication in ICT services or systems is the application of identification tools such as passwords or personal identification numbers (PINs). The rapid development in ICT technology regarding smart devices (laptops, tablets and smartphones) has allowed also the advance of hardware components that capture several biometric traits such as fingerprints and voice. These components are aiming among others to overcome weaknesses and flaws of password usage under the prism of improved user authentication with higher level of security, privacy and usability. To this respect, the potential application of biometrics for secure user authentication regarding access in systems with sensitive data (i.e. patient's data from electronic health records) shows great potentials. SpeechXRays aims to provide a user recognition platform based on biometrics of voice acoustics analysis and audio-visual identity verification. Among others, the platform aims to be applied as an authentication tool for medical personnel in order to gain specific access to patient's electronic health records. In this work a short description of SpeechXrays implementation tool regarding eHealth is provided and analyzed. This study explores security and privacy issues, and offers a comprehensive overview of biometrics technology applications in addressing the e-Health security challenges. We present and describe the necessary requirement for an eHealth platform concerning biometric security.
Delayed audiovisual integration of patients with mild cognitive impairment and Alzheimer's disease compared with normal aged controls.

PubMed

Wu, Jinglong; Yang, Jiajia; Yu, Yinghua; Li, Qi; Nakamura, Naoya; Shen, Yong; Ohta, Yasuyuki; Yu, Shengyuan; Abe, Koji

2012-01-01

The human brain can anatomically combine task-relevant information from different sensory pathways to form a unified perception; this process is called multisensory integration. The aim of the present study was to test whether the multisensory integration abilities of patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) differed from those of normal aged controls (NC). A total of 64 subjects were divided into three groups: NC individuals (n = 24), MCI patients (n = 19), and probable AD patients (n = 21). All of the subjects were asked to perform three separate audiovisual integration tasks and were instructed to press the response key associated with the auditory, visual, or audiovisual stimuli in the three tasks. The accuracy and response time (RT) of each task were measured, and the RTs were analyzed using cumulative distribution functions to observe the audiovisual integration. Our results suggest that the mean RT of patients with AD was significantly longer than those of patients with MCI and NC individuals. Interestingly, we found that patients with both MCI and AD exhibited adequate audiovisual integration, and a greater peak (time bin with the highest percentage of benefit) and broader temporal window (time duration of benefit) of multisensory enhancement were observed. However, the onset time and peak benefit of audiovisual integration in MCI and AD patients occurred significantly later than did those of the NC. This finding indicates that the cognitive functional deficits of patients with MCI and AD contribute to the differences in performance enhancements of audiovisual integration compared with NC.
Delayed Audiovisual Integration of Patients with Mild Cognitive Impairment and Alzheimer’s Disease Compared with Normal Aged Controls

PubMed Central

Wu, Jinglong; Yang, Jiajia; Yu, Yinghua; Li, Qi; Nakamura, Naoya; Shen, Yong; Ohta, Yasuyuki; Yu, Shengyuan; Abe, Koji

2013-01-01

The human brain can anatomically combine task-relevant information from different sensory pathways to form a unified perception; this process is called multisensory integration. The aim of the present study was to test whether the multisensory integration abilities of patients with mild cognitive impairment (MCI) and Alzheimer’s disease (AD) differed from those of normal aged controls (NC). A total of 64 subjects were divided into three groups: NC individuals (n = 24), MCI patients (n = 19), and probable AD patients (n = 21). All of the subjects were asked to perform three separate audiovisual integration tasks and were instructed to press the response key associated with the auditory, visual, or audiovisual stimuli in the three tasks. The accuracy and response time (RT) of each task were measured, and the RTs were analyzed using cumulative distribution functions to observe the audiovisual integration. Our results suggest that the mean RT of patients with AD was significantly longer than those of patients with MCI and NC individuals. Interestingly, we found that patients with both MCI and AD exhibited adequate audiovisual integration, and a greater peak (time bin with the highest percentage of benefit) and broader temporal window (time duration of benefit) of multisensory enhancement were observed. However, the onset time and peak benefit of audiovisual integration in MCI and AD patients occurred significantly later than did those of the NC. This finding indicates that the cognitive functional deficits of patients with MCI and AD contribute to the differences in performance enhancements of audiovisual integration compared with NC. PMID:22810093
Audiovisual focus of attention and its application to Ultra High Definition video compression

NASA Astrophysics Data System (ADS)

Rerabek, Martin; Nemoto, Hiromi; Lee, Jong-Seok; Ebrahimi, Touradj

2014-02-01

Using Focus of Attention (FoA) as a perceptual process in image and video compression belongs to well-known approaches to increase coding efficiency. It has been shown that foveated coding, when compression quality varies across the image according to region of interest, is more efficient than the alternative coding, when all region are compressed in a similar way. However, widespread use of such foveated compression has been prevented due to two main conflicting causes, namely, the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have an associated audio, and moreover, in many cases there is a correlation between the audio and the visual content, audiovisual FoA can improve efficiency of the detection algorithm while remaining of low complexity. This paper discusses a simple yet efficient audiovisual FoA algorithm based on correlation of dynamics between audio and video signal components. Results of audiovisual FoA detection algorithm are subsequently taken into account for foveated coding and compression. This approach is implemented into H.265/HEVC encoder producing a bitstream which is fully compliant to any H.265/HEVC decoder. The influence of audiovisual FoA in the perceived quality of high and ultra-high definition audiovisual sequences is explored and the amount of gain in compression efficiency is analyzed.
Detecting Dementia Through Interactive Computer Avatars

PubMed Central

Adachi, Hiroyoshi; Ukita, Norimichi; Ikeda, Manabu; Kazui, Hiroaki; Kudo, Takashi; Nakamura, Satoshi

2017-01-01

This paper proposes a new approach to automatically detect dementia. Even though some works have detected dementia from speech and language attributes, most have applied detection using picture descriptions, narratives, and cognitive tasks. In this paper, we propose a new computer avatar with spoken dialog functionalities that produces spoken queries based on the mini-mental state examination, the Wechsler memory scale-revised, and other related neuropsychological questions. We recorded the interactive data of spoken dialogues from 29 participants (14 dementia and 15 healthy controls) and extracted various audiovisual features. We tried to predict dementia using audiovisual features and two machine learning algorithms (support vector machines and logistic regression). Here, we show that the support vector machines outperformed logistic regression, and by using the extracted features they classified the participants into two groups with 0.93 detection performance, as measured by the areas under the receiver operating characteristic curve. We also newly identified some contributing features, e.g., gap before speaking, the variations of fundamental frequency, voice quality, and the ratio of smiling. We concluded that our system has the potential to detect dementia through spoken dialog systems and that the system can assist health care workers. In addition, these findings could help medical personnel detect signs of dementia. PMID:29018636
Audio-visual presentation of information for informed consent for participation in clinical trials.

PubMed

Ryan, R E; Prictor, M J; McLaughlin, K J; Hill, S J

2008-01-23

Informed consent is a critical component of clinical research. Different methods of presenting information to potential participants of clinical trials may improve the informed consent process. Audio-visual interventions (presented for example on the Internet, DVD, or video cassette) are one such method. To assess the effects of providing audio-visual information alone, or in conjunction with standard forms of information provision, to potential clinical trial participants in the informed consent process, in terms of their satisfaction, understanding and recall of information about the study, level of anxiety and their decision whether or not to participate. We searched: the Cochrane Consumers and Communication Review Group Specialised Register (searched 20 June 2006); the Cochrane Central Register of Controlled Trials (CENTRAL), The Cochrane Library, issue 2, 2006; MEDLINE (Ovid) (1966 to June week 1 2006); EMBASE (Ovid) (1988 to 2006 week 24); and other databases. We also searched reference lists of included studies and relevant review articles, and contacted study authors and experts. There were no language restrictions. Randomised and quasi-randomised controlled trials comparing audio-visual information alone, or in conjunction with standard forms of information provision (such as written or oral information as usually employed in the particular service setting), with standard forms of information provision alone, in the informed consent process for clinical trials. Trials involved individuals or their guardians asked to participate in a real (not hypothetical) clinical study. Two authors independently assessed studies for inclusion and extracted data. Due to heterogeneity no meta-analysis was possible; we present the findings in a narrative review. We included 4 trials involving data from 511 people. Studies were set in the USA and Canada. Three were randomised controlled trials (RCTs) and the fourth a quasi-randomised trial. Their quality was mixed and results should be interpreted with caution. Considerable uncertainty remains about the effects of audio-visual interventions, compared with standard forms of information provision (such as written or oral information normally used in the particular setting), for use in the process of obtaining informed consent for clinical trials. Audio-visual interventions did not consistently increase participants' levels of knowledge/understanding (assessed in four studies), although one study showed better retention of knowledge amongst intervention recipients. An audio-visual intervention may transiently increase people's willingness to participate in trials (one study), but this was not sustained at two to four weeks post-intervention. Perceived worth of the trial did not appear to be influenced by an audio-visual intervention (one study), but another study suggested that the quality of information disclosed may be enhanced by an audio-visual intervention. Many relevant outcomes including harms were not measured. The heterogeneity in results may reflect the differences in intervention design, content and delivery, the populations studied and the diverse methods of outcome assessment in included studies. The value of audio-visual interventions for people considering participating in clinical trials remains unclear. Evidence is mixed as to whether audio-visual interventions enhance people's knowledge of the trial they are considering entering, and/or the health condition the trial is designed to address; one study showed improved retention of knowledge amongst intervention recipients. The intervention may also have small positive effects on the quality of information disclosed, and may increase willingness to participate in the short-term; however the evidence is weak. There were no data for several primary outcomes, including harms. In the absence of clear results, triallists should continue to explore innovative methods of providing information to potential trial participants. Further research should take the form of high-quality randomised controlled trials, with clear reporting of methods. Studies should conduct content assessment of audio-visual and other innovative interventions for people of differing levels of understanding and education; also for different age and cultural groups. Researchers should assess systematically the effects of different intervention components and delivery characteristics, and should involve consumers in intervention development. Studies should assess additional outcomes relevant to individuals' decisional capacity, using validated tools, including satisfaction; anxiety; and adherence to the subsequent trial protocol.
Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events.

PubMed

Stekelenburg, Jeroen J; Vroomen, Jean

2012-01-01

In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual parts. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical sub-additive amplitude reductions (AV - V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that this N1 suppression was greater for the spatially congruent stimuli. A very early audiovisual interaction was also found at 40-60 ms (P50) in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.
Neural time course of visually enhanced echo suppression.

PubMed

Bishop, Christopher W; London, Sam; Miller, Lee M

2012-10-01

Auditory spatial perception plays a critical role in day-to-day communication. For instance, listeners utilize acoustic spatial information to segregate individual talkers into distinct auditory "streams" to improve speech intelligibility. However, spatial localization is an exceedingly difficult task in everyday listening environments with numerous distracting echoes from nearby surfaces, such as walls. Listeners' brains overcome this unique challenge by relying on acoustic timing and, quite surprisingly, visual spatial information to suppress short-latency (1-10 ms) echoes through a process known as "the precedence effect" or "echo suppression." In the present study, we employed electroencephalography (EEG) to investigate the neural time course of echo suppression both with and without the aid of coincident visual stimulation in human listeners. We find that echo suppression is a multistage process initialized during the auditory N1 (70-100 ms) and followed by space-specific suppression mechanisms from 150 to 250 ms. Additionally, we find a robust correlate of listeners' spatial perception (i.e., suppressing or not suppressing the echo) over central electrode sites from 300 to 500 ms. Contrary to our hypothesis, vision's powerful contribution to echo suppression occurs late in processing (250-400 ms), suggesting that vision contributes primarily during late sensory or decision making processes. Together, our findings support growing evidence that echo suppression is a slow, progressive mechanism modifiable by visual influences during late sensory and decision making stages. Furthermore, our findings suggest that audiovisual interactions are not limited to early, sensory-level modulations but extend well into late stages of cortical processing.
Seminario latinoamericano de didactica de los medios audiovisuales (Latin American Seminar on Teaching with Audiovisual Aids).

ERIC Educational Resources Information Center

Eduplan Informa, 1971

1971-01-01

This seminar on the use of audiovisual aids reached several conclusions on the need for and the use of such aids in Latin America. The need for educational innovation in the face of a new society, a new type of communication, and a new vision of man is stressed. A new definition of teaching and learning as a fundamental process of communication is…
Audiovisual integration increases the intentional step synchronization of side-by-side walkers.

PubMed

Noy, Dominic; Mouta, Sandra; Lamas, Joao; Basso, Daniel; Silva, Carlos; Santos, Jorge A

2017-12-01

When people walk side-by-side, they often synchronize their steps. To achieve this, individuals might cross-modally match audiovisual signals from the movements of the partner and kinesthetic, cutaneous, visual and auditory signals from their own movements. Because signals from different sensory systems are processed with noise and asynchronously, the challenge of the CNS is to derive the best estimate based on this conflicting information. This is currently thought to be done by a mechanism operating as a Maximum Likelihood Estimator (MLE). The present work investigated whether audiovisual signals from the partner are integrated according to MLE in order to synchronize steps during walking. Three experiments were conducted in which the sensory cues from a walking partner were virtually simulated. In Experiment 1 seven participants were instructed to synchronize with human-sized Point Light Walkers and/or footstep sounds. Results revealed highest synchronization performance with auditory and audiovisual cues. This was quantified by the time to achieve synchronization and by synchronization variability. However, this auditory dominance effect might have been due to artifacts of the setup. Therefore, in Experiment 2 human-sized virtual mannequins were implemented. Also, audiovisual stimuli were rendered in real-time and thus were synchronous and co-localized. All four participants synchronized best with audiovisual cues. For three of the four participants results point toward their optimal integration consistent with the MLE model. Experiment 3 yielded performance decrements for all three participants when the cues were incongruent. Overall, these findings suggest that individuals might optimally integrate audiovisual cues to synchronize steps during side-by-side walking. Copyright © 2017 Elsevier B.V. All rights reserved.
Facial Speech Gestures: The Relation between Visual Speech Processing, Phonological Awareness, and Developmental Dyslexia in 10-Year-Olds

ERIC Educational Resources Information Center

Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.

2016-01-01

Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Multisensory integration in complete unawareness: evidence from audiovisual congruency priming.

PubMed

Faivre, Nathan; Mudrik, Liad; Schwartz, Naama; Koch, Christof

2014-11-01

Multisensory integration is thought to require conscious perception. Although previous studies have shown that an invisible stimulus could be integrated with an audible one, none have demonstrated integration of two subliminal stimuli of different modalities. Here, pairs of identical or different audiovisual target letters (the sound /b/ with the written letter "b" or "m," respectively) were preceded by pairs of masked identical or different audiovisual prime digits (the sound /6/ with the written digit "6" or "8," respectively). In three experiments, awareness of the audiovisual digit primes was manipulated, such that participants were either unaware of the visual digit, the auditory digit, or both. Priming of the semantic relations between the auditory and visual digits was found in all experiments. Moreover, a further experiment showed that unconscious multisensory integration was not obtained when participants did not undergo prior conscious training of the task. This suggests that following conscious learning, unconscious processing suffices for multisensory integration. © The Author(s) 2014.
Speech processing using maximum likelihood continuity mapping

DOEpatents

Hogden, John E.

2000-01-01

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Speech processing using maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.E.

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.