Should visual speech cues (speechreading) be considered when fitting hearing aids?
NASA Astrophysics Data System (ADS)
Grant, Ken
2002-05-01
When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
Speech identification in noise: Contribution of temporal, spectral, and visual speech cues.
Kim, Jeesun; Davis, Chris; Groot, Christopher
2009-12-01
This study investigated the degree to which two types of reduced auditory signals (cochlear implant simulations) and visual speech cues combined for speech identification. The auditory speech stimuli were filtered to have only amplitude envelope cues or both amplitude envelope and spectral cues and were presented with and without visual speech. In Experiment 1, IEEE sentences were presented in quiet and noise. For presentation in quiet, speech identification was enhanced by the addition of both spectral and visual speech cues. Due to a ceiling effect, the degree to which these effects combined could not be determined. In noise, these facilitation effects were more marked and were additive. Experiment 2 examined consonant and vowel identification in the context of CVC or VCV syllables presented in noise. For consonants, both spectral and visual speech cues facilitated identification, and these effects were additive. For vowels, the effect of combined cues was underadditive, with the effect of spectral cues reduced when presented with visual speech cues. Analysis indicated that without visual speech, spectral cues facilitated the transmission of place information and vowel height, whereas with visual speech, they facilitated the transmission of lip rounding, with little impact on the transmission of place information.
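The feature-transmission analysis referred to above is commonly carried out in the style of Miller and Nicely (1955): a phoneme confusion matrix is collapsed onto a feature such as place, height, or rounding, and the proportion of transmitted information is computed. The Python sketch below illustrates that computation; the confusion matrix and feature labels are hypothetical stand-ins, not data from this study.

```python
import numpy as np

def transmitted_information(confusions, feature_of):
    """Relative transmitted information for one phonetic feature
    (Miller & Nicely, 1955). confusions[i, j] counts stimulus i heard
    as response j; feature_of[i] maps phoneme i to a feature class
    (e.g., place of articulation)."""
    classes = sorted(set(feature_of))
    k = len(classes)
    idx = {c: n for n, c in enumerate(classes)}
    # Collapse the phoneme confusion matrix into a feature confusion matrix.
    collapsed = np.zeros((k, k))
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            collapsed[idx[feature_of[i]], idx[feature_of[j]]] += count
    p = collapsed / collapsed.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    # Mutual information between stimulus and response feature classes,
    # normalised by the stimulus feature entropy (0 = none, 1 = perfect).
    mi = sum(p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
             for i in range(k) for j in range(k) if p[i, j] > 0)
    hx = -sum(px[i] * np.log2(px[i]) for i in range(k) if px[i] > 0)
    return mi / hx

# Hypothetical example: three consonants with a labial/alveolar place feature.
conf = np.array([[20, 4, 1],
                 [5, 18, 2],
                 [2, 3, 20]])
print(transmitted_information(conf, ["labial", "labial", "alveolar"]))
```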
Preschoolers Benefit From Visually Salient Speech Cues
Lalonde, Kaylah; Holt, Rachael Frush
2015-01-01
Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. The authors also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. Results: Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. Conclusions: Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit. PMID:25322336
Visual speech influences speech perception immediately but not automatically.
Mitterer, Holger; Reinisch, Eva
2017-02-01
Two experiments examined the time course of the use of auditory and visual speech cues to spoken word recognition using an eye-tracking paradigm. Results of the first experiment showed that the use of visual speech cues from lipreading is reduced if concurrently presented pictures require a division of attentional resources. This reduction was evident even when listeners' eye gaze was on the speaker rather than the (static) pictures. Experiment 2 used a deictic hand gesture to foster attention to the speaker. At the same time, the visual processing load was reduced by keeping the visual display constant over a fixed number of successive trials. Under these conditions, the visual speech cues from lipreading were used. Moreover, the eye-tracking data indicated that visual information was used immediately and even earlier than auditory information. In combination, these data indicate that visual speech cues are not used automatically, but if they are used, they are used immediately.
Visual speech segmentation: using facial cues to locate word boundaries in continuous speech
Mitchel, Aaron D.; Weiss, Daniel J.
2014-01-01
Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition. PMID:25018577
NASA Astrophysics Data System (ADS)
Ramirez, Joshua; Mann, Virginia
2005-08-01
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Preschoolers Benefit from Visually Salient Speech Cues
ERIC Educational Resources Information Center
Lalonde, Kaylah; Holt, Rachael Frush
2015-01-01
Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
ERIC Educational Resources Information Center
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
ten Oever, Sanne; Sack, Alexander T.; Wheat, Katherine L.; Bien, Nina; van Atteveldt, Nienke
2013-01-01
Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception. PMID:23805110
Stacey, Paula C.; Kitterick, Pádraig T.; Morris, Saffron D.; Sumner, Christian J.
2017-01-01
Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797
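The audio-visual integration models referred to in the abstract above are not specified in detail here; one standard formalisation of "optimal combination" of sensory cues is maximum-likelihood, inverse-variance-weighted averaging. The Python sketch below illustrates that general principle with hypothetical cue reliabilities; it is not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mle_combine(est_a, var_a, est_v, var_v):
    """Optimal (maximum-likelihood) combination of two independent noisy
    cues: each cue is weighted by its inverse variance (reliability)."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    w_v = 1 - w_a
    combined = w_a * est_a + w_v * est_v
    combined_var = 1 / (1 / var_a + 1 / var_v)  # never worse than the best single cue
    return combined, combined_var

# Hypothetical reliabilities: a degraded auditory cue (high variance)
# paired with a more reliable visual cue.
true_value = 1.0
audio = true_value + rng.normal(0, np.sqrt(2.0), 10000)   # noisy auditory estimates
visual = true_value + rng.normal(0, np.sqrt(0.5), 10000)  # more reliable visual estimates
av, av_var = mle_combine(audio, 2.0, visual, 0.5)
print(np.var(audio), np.var(visual), np.var(av), av_var)  # combined variance ~0.4
```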
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults
Smayda, Kirsten E.; Van Engen, Kristin J.; Maddox, W. Todd; Chandrasekaran, Bharath
2016-01-01
Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18–35) and thirty-three older adults (ages 60–90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both semantic and visual cues are available to the listener. PMID:27031343
Audio-visual speech perception: a developmental ERP investigation
Knowland, Victoria CP; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael SC
2014-01-01
Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development. PMID:24176002
Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy
Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun
2014-01-01
The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants’ audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life. PMID:24586651
Audio-Visual Speech Perception: A Developmental ERP Investigation
ERIC Educational Resources Information Center
Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.
2014-01-01
Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
Reduced efficiency of audiovisual integration for nonnative speech.
Yi, Han-Gyol; Phelps, Jasmine E B; Smiljanic, Rajka; Chandrasekaran, Bharath
2013-11-01
The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.
Modeling the Development of Audiovisual Cue Integration in Speech Perception
Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.
2017-01-01
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
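As an illustration of the modelling approach described above, the following Python sketch fits a Gaussian mixture model to a hypothetical joint distribution of one auditory and one visual cue. The cue dimensions, category means, and variances are invented for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical joint auditory-visual cue distributions for two categories,
# e.g. /b/ vs /p/: dimension 0 is an auditory cue (e.g., VOT), dimension 1
# a visual cue (e.g., lip-opening velocity). Units are arbitrary.
b_tokens = rng.normal([10.0, 0.8], [5.0, 0.2], size=(500, 2))
p_tokens = rng.normal([50.0, 0.3], [10.0, 0.2], size=(500, 2))
tokens = np.vstack([b_tokens, p_tokens])

# Unsupervised statistical learning: the model discovers two clusters from
# the distributional statistics of the combined audiovisual input alone.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(tokens)

# Categorising a token whose auditory cue is ambiguous: the visual
# dimension shifts the posterior, mimicking audiovisual cue integration.
ambiguous = np.array([[30.0, 0.8]])
print(gmm.predict_proba(ambiguous))
```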
The contribution of dynamic visual cues to audiovisual speech perception.
Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador
2015-08-01
Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues: two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
2016-02-01
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing whether infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds on their ability to detect mismatches between concurrently presented auditory and visual vowels and related their performance to their productive abilities and later vocabulary size. Results show that infants' ability to detect mismatches between auditory and visually presented vowels differs depending on the vowels involved. Furthermore, infants' sensitivity to mismatches is modulated by their current articulatory knowledge and correlates with their vocabulary size at 12 months of age. This suggests that, aside from infants' ability to match nonnative audiovisual cues (Pons et al., 2009), their ability to match native auditory and visual cues continues to develop during the first year of life. Our findings point to a potential role of salient vowel cues and productive abilities in the development of audiovisual speech perception, and further indicate a relation between infants' early sensitivity to audiovisual speech cues and their later language development.
Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo
2015-05-01
The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is presumed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception accompanied by irrelevant noise in the other modality. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework based on hierarchical clustering (single linkage distance) to analyze the connected components of the functional network during speech perception, using functional magnetic resonance imaging. Bimodal (audio-visual) speech cues, or unimodal speech cues paired with irrelevant noise in the other modality (auditory white noise or visual gum-chewing), were delivered to 15 subjects. For positive correlations, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter coupling was observed in a left anterior temporal gyrus-anterior insula component and in right premotor-visual components than in the auditory-only or visual-only speech cue conditions, respectively. Interestingly, perception of visual speech under auditory white noise was characterized by tight negative coupling between the left inferior frontal region and the right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly connected, positively or negatively, and can reflect efficient or effortful processing during natural audiovisual integration or lip-reading, respectively, in speech perception.
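For readers unfamiliar with the analysis, the zero-dimensional persistent-homology filtration described above is equivalent to tracking connected components under single-linkage hierarchical clustering of a distance matrix. The Python sketch below illustrates the idea on random stand-in time series; the number of regions, the thresholds, and the data are hypothetical and do not reproduce the study's pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)

# Hypothetical ROI time series (12 regions x 150 volumes) standing in for
# speech-related regions; real input would be preprocessed fMRI signals.
ts = rng.normal(size=(12, 150))
corr = np.corrcoef(ts)
dist = 1 - corr                      # turn correlations into distances
np.fill_diagonal(dist, 0.0)

# Single-linkage clustering yields the same filtration of connected
# components that a 0-dimensional persistent-homology analysis tracks.
Z = linkage(squareform(dist, checks=False), method="single")

# Count connected components at a range of thresholds (the 0-dimensional
# "barcode"): components merge as the threshold grows.
for thr in np.linspace(0.2, 1.2, 6):
    n_components = len(set(fcluster(Z, t=thr, criterion="distance")))
    print(f"threshold {thr:.2f}: {n_components} connected components")
```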
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential looking paradigm, 44 German 6-month-olds' ability to detect mismatches between concurrently presented auditory and visual native vowels was tested. Outcomes were related to mothers' speech style and interactive behavior assessed during free play with their infant, and to infant-specific factors assessed through a questionnaire. Results show that mothers' and infants' social behavior modulated infants' preference for matching audiovisual speech. Moreover, infants' audiovisual speech perception correlated with later vocabulary size, suggesting a lasting effect on language development.
Miller, Christi W; Stewart, Erin K; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A; Tremblay, Kelly
2017-08-16
This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed.
Electrophysiological evidence for Audio-visuo-lingual speech integration.
Treille, Avril; Vilain, Coriandre; Schwartz, Jean-Luc; Hueber, Thomas; Sato, Marc
2018-01-31
Recent neurophysiological studies demonstrate that audio-visual speech integration partly operates through temporal expectations and speech-specific predictions. From these results, one common view is that the binding of auditory and visual (lipread) speech cues relies on their joint probability and prior associative audio-visual experience. The present EEG study examined whether visual tongue movements integrate with relevant speech sounds, despite little associative audio-visual experience between the two modalities. A second objective was to determine possible similarities and differences of audio-visual speech integration between unusual audio-visuo-lingual and classical audio-visuo-labial modalities. To this aim, participants were presented with auditory, visual, and audio-visual isolated syllables, with the visual presentation related to either a sagittal view of the tongue movements or a facial view of the lip movements of a speaker, with lingual and facial movements previously recorded by an ultrasound imaging system and a video camera. In line with previous EEG studies, our results revealed an amplitude decrease and a latency facilitation of P2 auditory evoked potentials in both audio-visuo-lingual and audio-visuo-labial conditions compared to the sum of unimodal conditions. These results argue against the view that auditory and visual speech cues solely integrate based on prior associative audio-visual perceptual experience. Rather, they suggest that dynamic and phonetic informational cues are sharable across sensory modalities, possibly through a cross-modal transfer of implicit articulatory motor knowledge.
Discrepant visual speech facilitates covert selective listening in "cocktail party" conditions.
Williams, Jason A
2012-06-01
The presence of congruent visual speech information facilitates the identification of auditory speech, while the addition of incongruent visual speech information often impairs accuracy. This latter arrangement occurs naturally when one is being directly addressed in conversation but listens to a different speaker. Under these conditions, performance may diminish since: (a) one is bereft of the facilitative effects of the corresponding lip motion and (b) one becomes subject to visual distortion by incongruent visual speech; by contrast, speech intelligibility may be improved due to (c) bimodal localization of the central unattended stimulus. Participants were exposed to centrally presented visual and auditory speech while attending to a peripheral speech stream. In some trials, the lip movements of the central visual stimulus matched the unattended speech stream; in others, the lip movements matched the attended peripheral speech. Accuracy for the peripheral stimulus was nearly one standard deviation greater with incongruent visual information, compared to the congruent condition which provided bimodal pattern recognition cues. Likely, the bimodal localization of the central stimulus further differentiated the stimuli and thus facilitated intelligibility. Results are discussed with regard to similar findings in an investigation of the ventriloquist effect, and the relative strength of localization and speech cues in covert listening.
Prediction and constraint in audiovisual speech perception
Peelle, Jonathan E.; Sommers, Mitchell S.
2015-01-01
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390
Drijvers, Linda; Özyürek, Asli
2017-01-01
This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Twenty participants watched videos of an actress uttering an action verb and completed a free-recall task. The videos were presented in 3 speech conditions (2-band noise-vocoding, 6-band noise-vocoding, clear), 3 multimodal conditions (speech + lips blurred, speech + visible speech, speech + visible speech + gesture), and 2 visual-only conditions (visible speech, visible speech + gesture). Accuracy levels were higher when both visual articulators were present compared with 1 or none. The enhancement effects of (a) visible speech, (b) gestural information on top of visible speech, and (c) both visible speech and iconic gestures were larger in 6-band than 2-band noise-vocoding or visual-only conditions. Gestural enhancement in 2-band noise-vocoding did not differ from gestural enhancement in visual-only conditions. When perceiving degraded speech in a visual context, listeners benefit more from having both visual articulators present compared with 1. This benefit was larger at 6-band than 2-band noise-vocoding, where listeners can benefit from both phonological cues from visible speech and semantic cues from iconic gestures to disambiguate speech.
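Noise-vocoding of the kind used in the 2-band and 6-band conditions generally works by extracting the amplitude envelope in each frequency band and using it to modulate band-limited noise. The Python sketch below shows a minimal vocoder of this type; the filter settings, band edges, and the stand-in waveform are illustrative assumptions, not the study's exact processing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands, f_lo=100.0, f_hi=7000.0):
    """Minimal noise vocoder: split the spectrum into n_bands log-spaced
    channels, extract each channel's amplitude envelope, and use it to
    modulate band-limited noise. Fewer bands leave less spectral detail
    (cf. the 2-band vs 6-band conditions)."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))            # channel amplitude envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += envelope * carrier                   # envelope-modulated noise
    return out / (np.max(np.abs(out)) + 1e-12)

# Usage with a hypothetical 1-second waveform sampled at 16 kHz
# (random noise here as a stand-in for a recorded speech signal):
fs = 16000
speech = np.random.default_rng(1).standard_normal(fs)
vocoded_2band = noise_vocode(speech, fs, n_bands=2)
vocoded_6band = noise_vocode(speech, fs, n_bands=6)
```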
Ultrasound visual feedback treatment and practice variability for residual speech sound errors
Preston, Jonathan L.; McCabe, Patricia; Rivera-Campos, Ahmed; Whittle, Jessica L.; Landry, Erik; Maas, Edwin
2014-01-01
Purpose: The goals were to (1) test the efficacy of a motor-learning based treatment that includes ultrasound visual feedback for individuals with residual speech sound errors, and (2) explore whether the addition of prosodic cueing facilitates speech sound learning. Method: A multiple baseline single subject design was used, replicated across 8 participants. For each participant, one sound context was treated with ultrasound plus prosodic cueing for 7 sessions, and another sound context was treated with ultrasound but without prosodic cueing for 7 sessions. Sessions included ultrasound visual feedback as well as non-ultrasound treatment. Word-level probes assessing untreated words were used to evaluate retention and generalization. Results: For most participants, increases in accuracy of target sound contexts at the word level were observed with the treatment program regardless of whether prosodic cueing was included. Generalization between onset singletons and clusters was observed, as well as generalization to sentence-level accuracy. There was evidence of retention during post-treatment probes, including at a two-month follow-up. Conclusions: A motor-based treatment program that includes ultrasound visual feedback can facilitate learning of speech sounds in individuals with residual speech sound errors. PMID:25087938
Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly
2017-01-01
Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. Results: A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. Conclusion: The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed. PMID:28744550
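A linear mixed model of the kind reported above can be approximated in standard statistical software. The Python sketch below fits a mixed model with a per-participant random intercept to simulated data; the variable names, simulated effect sizes, and model formula are assumptions for illustration and do not reproduce the study's analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical long-format dataset: one row per participant x condition,
# standing in for repeated-measures speech-recognition scores.
n = 76
subjects = np.repeat(np.arange(n), 4)
visual = np.tile([0, 1, 0, 1], n)            # audio-only vs audiovisual
babble = np.tile([0, 0, 1, 1], n)            # steady-state noise vs 4-talker babble
pta = np.repeat(rng.normal(45, 10, n), 4)    # pure-tone average (dB HL)
wm = np.repeat(rng.normal(0, 1, n), 4)       # standardised working-memory score
score = 50 + 15 * visual - 0.4 * pta + rng.normal(0, 8, n * 4)  # simulated % correct

df = pd.DataFrame(dict(subject=subjects, visual=visual, babble=babble,
                       pta=pta, wm=wm, score=score))

# Fixed effects of interest plus a random intercept per participant to
# absorb repeated measures on the same listener.
model = smf.mixedlm("score ~ visual + babble + wm + pta", df, groups=df["subject"])
print(model.fit().summary())
```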
Prediction and constraint in audiovisual speech perception.
Peelle, Jonathan E; Sommers, Mitchell S
2015-07-01
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms.
Speaker Identity Supports Phonetic Category Learning
ERIC Educational Resources Information Center
Mani, Nivedita; Schneider, Signe
2013-01-01
Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…
Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech
ERIC Educational Resources Information Center
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
2010-01-01
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…
van Hoesel, Richard J M
2015-04-01
One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligibility in noise using a novel dynamic spatial audio-visual test paradigm. For each trial conducted in spatially distributed noise, first, an auditory-only cueing phrase that was spoken by one of four talkers was selected and presented from one of four locations. Shortly afterward, a target sentence was presented that was either audio-visual or, in another test configuration, audio-only and was spoken by the same talker and from the same location as the cueing phrase. During the target presentation, visual distractors were added at other spatial locations. Results showed that in terms of speech reception thresholds (SRTs), the average improvement for bilateral listening over the better performing ear alone was 9 dB for the audio-visual mode, and 3 dB for audition-alone. Comparison of bilateral performance for audio-visual and audition-alone showed that inclusion of visual cues led to an average SRT improvement of 5 dB. For unilateral device use, no such benefit arose, presumably due to the greatly reduced ability to localize the target talker to acquire visual information. The bilateral CI speech intelligibility advantage over the better ear in the present study is much larger than that previously reported for static talker locations and indicates greater everyday speech benefits, and better cost-benefit, than estimated to date.
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation
Banks, Briony; Gowen, Emma; Munro, Kevin J.; Adank, Patti
2015-01-01
Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation. PMID:26283946
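The abstract above does not state which Bayesian analysis was used; one common way to quantify evidence for a null group difference is the BIC approximation to the Bayes factor (Wagenmakers, 2007), sketched below in Python on simulated adaptation scores. The group sizes, means, and the use of this particular approximation are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Hypothetical per-participant adaptation scores (change in recognition
# accuracy across blocks) for audio-only vs audiovisual groups.
adaptation = np.concatenate([rng.normal(5.0, 4.0, 30),   # audio-only
                             rng.normal(5.2, 4.0, 30)])  # audiovisual
modality = np.concatenate([np.zeros(30), np.ones(30)])

# BIC approximation to the Bayes factor (Wagenmakers, 2007):
# BF01 ~= exp((BIC_alternative - BIC_null) / 2); values above 3 are
# conventionally read as positive evidence for the null (no modality effect).
null_model = sm.OLS(adaptation, np.ones_like(adaptation)).fit()
alt_model = sm.OLS(adaptation, sm.add_constant(modality)).fit()
bf01 = np.exp((alt_model.bic - null_model.bic) / 2)
print(f"BF01 = {bf01:.2f}")
```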
The relationship between level of autistic traits and local bias in the context of the McGurk effect
Ujiie, Yuta; Asai, Tomohisa; Wakabayashi, Akio
2015-01-01
The McGurk effect is a well-known illustration that demonstrates the influence of visual information on hearing in the context of speech perception. Some studies have reported that individuals with autism spectrum disorder (ASD) display abnormal processing of audio-visual speech integration, while other studies showed contradictory results. Based on the dimensional model of ASD, we administered two analog studies to examine the link between level of autistic traits, as assessed by the Autism Spectrum Quotient (AQ), and the McGurk effect among a sample of university students. In the first experiment, we found that autistic traits correlated negatively with fused (McGurk) responses. Then, we manipulated presentation types of visual stimuli to examine whether the local bias toward visual speech cues modulated individual differences in the McGurk effect. The presentation included four types of visual images, comprising no image, mouth only, mouth and eyes, and full face. The results revealed that global facial information facilitates the influence of visual speech cues on McGurk stimuli. Moreover, individual differences between groups with low and high levels of autistic traits appeared when the full-face visual speech cue with an incongruent voice condition was presented. These results suggest that individual differences in the McGurk effect might be due to a weak ability to process global facial information in individuals with high levels of autistic traits. PMID:26175705
ERIC Educational Resources Information Center
Zaccagnini, Cindy M.; Antia, Shirin D.
1993-01-01
This study of the effects of intensive multisensory speech training on the speech production of a profoundly hearing-impaired child (age nine) found that the addition of Visual Phonics hand cues did not result in speech production gains. All six target phonemes were generalized to new words and maintained after the intervention was discontinued.…
Speech Cues Contribute to Audiovisual Spatial Integration
Bishop, Christopher W.; Miller, Lee M.
2011-01-01
Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral ‘what’ and dorsal ‘where’ pathways. PMID:21909378
ERIC Educational Resources Information Center
Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly
2017-01-01
Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…
Seeing the talker's face supports executive processing of speech in steady state noise.
Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary
2013-01-01
Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition), and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline
2017-01-01
We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework of the multimodal nature of human communication. PMID:28424636
How Autism Affects Speech Understanding in Multitalker Environments
2013-10-01
difficult than will typically-developing children. Knowing whether toddlers with ASD have difficulties processing speech in the presence of acoustic...to separate the speech of different talkers than do their typically-developing peers. We also predict that they will fail to exploit visual cues on...learn language from many settings in which children are typically placed. In addition, one of the cues that typically-developing listeners use to
The Function of Consciousness in Multisensory Integration
ERIC Educational Resources Information Center
Palmer, Terry D.; Ramsey, Ashley K.
2012-01-01
The function of consciousness was explored in two contexts of audio-visual speech: cross-modal visual attention guidance and McGurk cross-modal integration. Experiments 1, 2, and 3 utilized a novel cueing paradigm in which two different flash-suppressed lip-streams co-occurred with speech sounds matching one of these streams. A visual target was…
Peeters, David; Snijders, Tineke M; Hagoort, Peter; Özyürek, Aslı
2017-01-27
In everyday communication speakers often refer in speech and/or gesture to objects in their immediate environment, thereby shifting their addressee's attention to an intended referent. The neurobiological infrastructure involved in the comprehension of such basic multimodal communicative acts remains unclear. In an event-related fMRI study, we presented participants with pictures of a speaker and two objects while they concurrently listened to her speech. In each picture, one of the objects was singled out, either through the speaker's index-finger pointing gesture or through a visual cue that made the object perceptually more salient in the absence of gesture. A mismatch (compared to a match) between speech and the object singled out by the speaker's pointing gesture led to enhanced activation in left IFG and bilateral pMTG, showing the importance of these areas in conceptual matching between speech and referent. Moreover, a match (compared to a mismatch) between speech and the object made salient through a visual cue led to enhanced activation in the mentalizing system, arguably reflecting an attempt to converge on a jointly attended referent in the absence of pointing. These findings shed new light on the neurobiological underpinnings of the core communicative process of comprehending a speaker's multimodal referential act and stress the power of pointing as an important natural device to link speech to objects. Copyright © 2016 Elsevier Ltd. All rights reserved.
Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception
Bejjanki, Vikranth Rao; Clayards, Meghan; Knill, David C.; Aslin, Richard N.
2011-01-01
Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues, during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one and participants' performance is consistent with an optimal model in which environmental, within category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue-combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks. PMID:21637344
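For readers unfamiliar with the normative framework referenced in the abstract above, the following is a minimal sketch of reliability-weighted cue combination and its categorical extension under Gaussian assumptions; the notation is illustrative and not taken from the paper.

```latex
% Continuous case: each cue is weighted by its relative reliability (inverse variance).
\hat{s}_{AV} = w_A \hat{s}_A + w_V \hat{s}_V, \qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad w_V = 1 - w_A .

% Categorical extension sketched in the abstract: within-category (environmental)
% variance adds to the sensory variance, so the optimal weights become
w_A \propto \frac{1}{\sigma_A^2 + \sigma_{c,A}^2}, \qquad
w_V \propto \frac{1}{\sigma_V^2 + \sigma_{c,V}^2}.
```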
Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker
ERIC Educational Resources Information Center
Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath
2014-01-01
Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…
Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker
2016-06-17
The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.
Sheffield, Benjamin M; Schuchman, Gerald; Bernstein, Joshua G W
2015-01-01
As cochlear implant (CI) acceptance increases and candidacy criteria are expanded, these devices are increasingly recommended for individuals with less than profound hearing loss. As a result, many individuals who receive a CI also retain acoustic hearing, often in the low frequencies, in the nonimplanted ear (i.e., bimodal hearing) and in some cases in the implanted ear (i.e., hybrid hearing) which can enhance the performance achieved by the CI alone. However, guidelines for clinical decisions pertaining to cochlear implantation are largely based on expectations for postsurgical speech-reception performance with the CI alone in auditory-only conditions. A more comprehensive prediction of postimplant performance would include the expected effects of residual acoustic hearing and visual cues on speech understanding. An evaluation of auditory-visual performance might be particularly important because of the complementary interaction between the speech information relayed by visual cues and that contained in the low-frequency auditory signal. The goal of this study was to characterize the benefit provided by residual acoustic hearing to consonant identification under auditory-alone and auditory-visual conditions for CI users. Additional information regarding the expected role of residual hearing in overall communication performance by a CI listener could potentially lead to more informed decisions regarding cochlear implantation, particularly with respect to recommendations for or against bilateral implantation for an individual who is functioning bimodally. Eleven adults 23 to 75 years old with a unilateral CI and air-conduction thresholds in the nonimplanted ear equal to or better than 80 dB HL for at least one octave frequency between 250 and 1000 Hz participated in this study. Consonant identification was measured for conditions involving combinations of electric hearing (via the CI), acoustic hearing (via the nonimplanted ear), and speechreading (visual cues). The results suggest that the benefit to CI consonant-identification performance provided by the residual acoustic hearing is even greater when visual cues are also present. An analysis of consonant confusions suggests that this is because the voicing cues provided by the residual acoustic hearing are highly complementary with the mainly place-of-articulation cues provided by the visual stimulus. These findings highlight the need for a comprehensive prediction of trimodal (acoustic, electric, and visual) postimplant speech-reception performance to inform implantation decisions. The increased influence of residual acoustic hearing under auditory-visual conditions should be taken into account when considering surgical procedures or devices that are intended to preserve acoustic hearing in the implanted ear. This is particularly relevant when evaluating the candidacy of a current bimodal CI user for a second CI (i.e., bilateral implantation). Although recent developments in CI technology and surgical techniques have increased the likelihood of preserving residual acoustic hearing, preservation cannot be guaranteed in each individual case. Therefore, the potential gain to be derived from bilateral implantation needs to be weighed against the possible loss of the benefit provided by residual acoustic hearing.
Visually-guided attention enhances target identification in a complex auditory scene.
Best, Virginia; Ozmeral, Erol J; Shinn-Cunningham, Barbara G
2007-06-01
In auditory scenes containing many similar sound sources, sorting of acoustic information into streams becomes difficult, which can lead to disruptions in the identification of behaviorally relevant targets. This study investigated the benefit of providing simple visual cues for when and/or where a target would occur in a complex acoustic mixture. Importantly, the visual cues provided no information about the target content. In separate experiments, human subjects either identified learned birdsongs in the presence of a chorus of unlearned songs or recalled strings of spoken digits in the presence of speech maskers. A visual cue indicating which loudspeaker (from an array of five) would contain the target improved accuracy for both kinds of stimuli. A cue indicating which time segment (out of a possible five) would contain the target also improved accuracy, but much more for birdsong than for speech. These results suggest that in real world situations, information about where a target of interest is located can enhance its identification, while information about when to listen can also be helpful when targets are unfamiliar or extremely similar to their competitors.
Effects of sensorineural hearing loss on visually guided attention in a multitalker environment.
Best, Virginia; Marrone, Nicole; Mason, Christine R; Kidd, Gerald; Shinn-Cunningham, Barbara G
2009-03-01
This study asked whether or not listeners with sensorineural hearing loss have an impaired ability to use top-down attention to enhance speech intelligibility in the presence of interfering talkers. Listeners were presented with a target string of spoken digits embedded in a mixture of five spatially separated speech streams. The benefit of providing simple visual cues indicating when and/or where the target would occur was measured in listeners with hearing loss, listeners with normal hearing, and a control group of listeners with normal hearing who were tested at a lower target-to-masker ratio to equate their baseline (no cue) performance with the hearing-loss group. All groups received robust benefits from the visual cues. The magnitude of the spatial-cue benefit, however, was significantly smaller in listeners with hearing loss. Results suggest that reduced utility of selective attention for resolving competition between simultaneous sounds contributes to the communication difficulties experienced by listeners with hearing loss in everyday listening situations.
Cued American English: A Variety in the Visual Mode
ERIC Educational Resources Information Center
Portolano, Marlana
2008-01-01
Cued American English (CAE) is a visual variety of English derived from a mode of communication called Cued Speech (CS). CS, or cueing, is a system of communication for use with the deaf, which consists of hand shapes, hand placements, and mouth shapes that signify the phonemic information conventionally conveyed through speech in spoken…
I can see what you are saying: Auditory labels reduce visual search times.
Cho, Kit W
2016-10-01
The present study explored the self-directed-speech effect, the finding that relative to silent reading of a label (e.g., DOG), saying it aloud reduces visual search reaction times (RTs) for locating a target picture among distractors. Experiment 1 examined whether this effect is due to a confound in the differences in the number of cues in self-directed speech (two) vs. silent reading (one) and tested whether self-articulation is required for the effect. The results showed that self-articulation is not required and that merely hearing the auditory label reduces visual search RTs relative to silent reading. This finding also rules out the number of cues confound. Experiment 2 examined whether hearing an auditory label activates more prototypical features of the label's referent and whether the auditory-label benefit is moderated by the target's imagery concordance (the degree to which the target picture matches the mental picture that is activated by a written label for the target). When the target imagery concordance was high, RTs following the presentation of a high prototypicality picture or auditory cue were comparable and shorter than RTs following a visual label or low prototypicality picture cue. However, when the target imagery concordance was low, RTs following an auditory cue were shorter than the comparable RTs following the picture cues and visual-label cue. The results suggest that an auditory label activates both prototypical and atypical features of a concept and can facilitate visual search RTs even when compared to picture primes. Copyright © 2016 Elsevier B.V. All rights reserved.
Direction of attentional focus in biofeedback treatment for /r/ misarticulation.
McAllister Byun, Tara; Swartz, Michelle T; Halpin, Peter F; Szeredi, Daniel; Maas, Edwin
2016-07-01
Maintaining an external direction of focus during practice is reported to facilitate acquisition of non-speech motor skills, but it is not known whether these findings also apply to treatment for speech errors. This question has particular relevance for treatment incorporating visual biofeedback, where clinician cueing can direct the learner's attention either internally (i.e., to the movements of the articulators) or externally (i.e., to the visual biofeedback display). This study addressed two objectives. First, it aimed to use single-subject experimental methods to collect additional evidence regarding the efficacy of visual-acoustic biofeedback treatment for children with /r/ misarticulation. Second, it compared the efficacy of this biofeedback intervention under two cueing conditions. In the external focus (EF) condition, participants' attention was directed exclusively to the external biofeedback display. In the internal focus (IF) condition, participants viewed a biofeedback display, but they also received articulatory cues encouraging an internal direction of attentional focus. Nine school-aged children were pseudo-randomly assigned to receive either IF or EF cues during 8 weeks of visual-acoustic biofeedback intervention. Accuracy in /r/ production at the word level was probed in three to five pre-treatment baseline sessions and in three post-treatment maintenance sessions. Outcomes were assessed using visual inspection and calculation of effect sizes for individual treatment trajectories. In addition, a mixed logistic model was used to examine across-subjects effects including phase (pre/post-treatment), /r/ variant (treated/untreated), and focus cue condition (internal/external). Six out of nine participants showed sustained improvement on at least one treated /r/ variant; these six participants were evenly divided across EF and IF treatment groups. Regression results indicated that /r/ productions were significantly more likely to be rated accurate post- than pre-treatment. Internal versus external direction of focus cues was not a significant predictor of accuracy, nor did it interact significantly with other predictors. The results are consistent with previous literature reporting that visual-acoustic biofeedback can produce measurable treatment gains in children who have not responded to previous intervention. These findings are also in keeping with previous research suggesting that biofeedback may be sufficient to establish an external attentional focus, independent of verbal cues provided. The finding that explicit articulator placement cues were not necessary for progress in treatment has implications for intervention practices for speech-sound disorders in children. © 2016 Royal College of Speech and Language Therapists.
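The mixed logistic model mentioned above can be written schematically as follows; the exact predictors, coding, and random-effects structure are assumptions based on the abstract, not the authors' specification.

```latex
% Log-odds of an accurate /r/ production on trial t by participant s,
% with a random intercept u_s for participant:
\mathrm{logit}\,P(\text{accurate}_{st}) =
  \beta_0 + \beta_1\,\text{phase}_{st} + \beta_2\,\text{variant}_{st}
  + \beta_3\,\text{focus}_{s} + \beta_4\,(\text{phase} \times \text{focus})_{st} + u_s,
\qquad u_s \sim \mathcal{N}(0, \sigma_u^2).
```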
Effects and Interactions of Auditory and Visual Cues in Oral Communication.
ERIC Educational Resources Information Center
Keys, John W.; And Others
Visual and auditory cues were tested, separately and jointly, to determine the degree of their contribution to improving overall speech skills of the aurally handicapped. Eight sound intensity levels (from 6 to 15 decibels) were used in presenting phonetically balanced word lists and multiple-choice intelligibility lists to a sample of 24…
Visual Speech Perception in Children with Language Learning Impairments
ERIC Educational Resources Information Center
Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart
2016-01-01
Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…
Mismatch Negativity with Visual-only and Audiovisual Speech
Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.
2009-01-01
The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730
Cortical activity during cued picture naming predicts individual differences in stuttering frequency
Mock, Jeffrey R.; Foundas, Anne L.; Golob, Edward J.
2016-01-01
Objective: Developmental stuttering is characterized by fluent speech punctuated by stuttering events, the frequency of which varies among individuals and contexts. Most stuttering events occur at the beginning of an utterance, suggesting neural dynamics associated with stuttering may be evident during speech preparation. Methods: This study used EEG to measure cortical activity during speech preparation in men who stutter, and compared the EEG measures to individual differences in stuttering rate as well as to a fluent control group. Each trial contained a cue followed by an acoustic probe at one of two onset times (early or late), and then a picture. There were two conditions: a speech condition where cues induced speech preparation of the picture’s name and a control condition that minimized speech preparation. Results: Across conditions stuttering frequency correlated to cue-related EEG beta power and auditory ERP slow waves from early onset acoustic probes. Conclusions: The findings reveal two new cortical markers of stuttering frequency that were present in both conditions, manifest at different times, are elicited by different stimuli (visual cue, auditory probe), and have different EEG responses (beta power, ERP slow wave). Significance: The cue-target paradigm evoked brain responses that correlated to pre-experimental stuttering rate. PMID:27472545
Bernstein, Lynne E.; Jiang, Jintao; Pantazis, Dimitrios; Lu, Zhong-Lin; Joshi, Anand
2011-01-01
The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and non-speech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to non-speech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and non-speech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study. PMID:20853377
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.
Borrie, Stephanie A
2015-03-01
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Getzmann, Stephan; Wascher, Edmund
2017-02-01
Speech understanding in the presence of concurrent sound is a major challenge, especially for older persons. In particular, conversational turn-taking usually results in switch costs, as indicated by a decline in speech perception after changes in the relevant target talker. Here, we investigated whether visual cues indicating the future position of a target talker may reduce the costs of switching in younger and older adults. We employed a speech perception task, in which sequences of short words were simultaneously presented by three talkers, and analysed behavioural measures and event-related potentials (ERPs). Informative cues resulted in increased performance after a spatial change in target talker compared to uninformative cues that did not indicate the future target position. The older participants in particular benefited from knowing the future target position in advance, indicated by reduced response times after informative cues. The ERP analysis revealed an overall reduced N2, and a reduced P3b to changes in the target talker location in older participants, suggesting reduced inhibitory control and context updating. On the other hand, a pronounced frontal late positive complex (f-LPC) to the informative cues indicated increased allocation of attentional resources to changes in target talker in the older group, in line with the decline-compensation hypothesis. Thus, knowing where to listen has the potential to compensate for age-related decline in attentional switching in a highly variable cocktail-party environment. Copyright © 2016 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
2016-01-01
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
ERIC Educational Resources Information Center
Fraser, Sarah; Gagne, Jean-Pierre; Alepins, Majolaine; Dubois, Pascale
2010-01-01
Purpose: Using a dual-task paradigm, 2 experiments (Experiments 1 and 2) were conducted to assess differences in the amount of listening effort expended to understand speech in noise in audiovisual (AV) and audio-only (A-only) modalities. Experiment 1 had equivalent noise levels in both modalities, and Experiment 2 equated speech recognition…
Zhu, Lin L; Beauchamp, Michael S
2017-03-08
Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices. Copyright © 2017 the authors 0270-6474/17/372697-12$15.00/0.
Visual form predictions facilitate auditory processing at the N1.
Paris, Tim; Kim, Jeesun; Davis, Chris
2017-02-20
Auditory-visual (AV) events often involve a leading visual cue (e.g. auditory-visual speech) that allows the perceiver to generate predictions about the upcoming auditory event. Electrophysiological evidence suggests that when an auditory event is predicted, processing is sped up, i.e., the N1 component of the ERP occurs earlier (N1 facilitation). However, it is not clear (1) whether N1 facilitation is based specifically on predictive rather than multisensory integration and (2) which particular properties of the visual cue it is based on. The current experiment used artificial AV stimuli in which visual cues predicted but did not co-occur with auditory cues. Visual form cues (high and low salience) and the auditory-visual pairing were manipulated so that auditory predictions could be based on form and timing or on timing only. The results showed that N1 facilitation occurred only for combined form and temporal predictions. These results suggest that faster auditory processing (as indicated by N1 facilitation) is based on predictive processing generated by a visual cue that clearly predicts both what and when the auditory stimulus will occur. Copyright © 2016. Published by Elsevier Ltd.
ERIC Educational Resources Information Center
Moradi, Shahram; Lidestam, Bjorn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Ronnberg, Jerker
2017-01-01
Purpose: We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels--in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands--in listeners with hearing impairment using hearing aids. Method: The study comprised 199…
Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation
Lusk, Laina G.; Mitchel, Aaron D.
2016-01-01
Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation. PMID:26869959
ERIC Educational Resources Information Center
Ferati, Mexhid Adem
2012-01-01
To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…
Alm, Magnus; Behne, Dawn
2015-01-01
Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood for cognitive and sensory decline, which may confound positive effects of age-related AV-experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20–30 years) and middle-aged adults (50–60 years) with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood, recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy toward more visually dominated responses. PMID:26236274
What's in a Face? Visual Contributions to Speech Segmentation
ERIC Educational Resources Information Center
Mitchel, Aaron D.; Weiss, Daniel J.
2010-01-01
Recent research has demonstrated that adults successfully segment two interleaved artificial speech streams with incongruent statistics (i.e., streams whose combined statistics are noisier than the encapsulated statistics) only when provided with an indexical cue of speaker voice. In a series of five experiments, our study explores whether…
Visual stimuli in intervention approaches for pre-schoolers diagnosed with phonological delay.
Pedro, Cassandra Ferreira; Lousada, Marisa; Hall, Andreia; Jesus, Luis M T
2018-04-01
The aim of this study was to develop and content validate specific speech and language intervention picture cards: the Letter-Sound (L&S) cards. The present study was also focused on assessing the influence of these cards on letter-sound correspondences and speech sound production. An expert panel of six speech and language therapists analysed and discussed the L&S cards based on several previously established criteria. A Speech and Language Therapist carried out a 6-week therapeutic intervention with a group of seven Portuguese phonologically delayed pre-schoolers aged 5;3 to 6;5. The modified Bland-Altman method revealed good agreement among evaluators, that is, the majority of the values fell within the limits of agreement. Additional outcome measures were collected before and after the therapeutic intervention process. Results indicate that the L&S cards facilitate the acquisition of letter-sound correspondences. Regarding speech sound production, some improvements were also observed at word level. The L&S cards are therefore likely to give phonetic cues, which are crucial for the correct production of therapeutic targets. These visual cues seemed to have helped children with phonological delay develop the above-mentioned skills.
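For reference, the Bland-Altman approach mentioned above judges agreement by whether rater differences fall within limits of agreement; the standard (unmodified) form is sketched below.

```latex
% Limits of agreement for paired ratings, with differences d_j = x_{1j} - x_{2j}:
\bar{d} \pm 1.96\, s_d, \qquad
\bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} \bigl(d_j - \bar{d}\bigr)^2}.
```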
[Intermodal timing cues for audio-visual speech recognition].
Hashimoto, Masahiro; Kumashiro, Masaharu
2004-06-01
The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under the audio-delay conditions of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.
High visual resolution matters in audiovisual speech perception, but only for some.
Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G
2016-07-01
The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech in noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.
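As an illustration of the kind of stimulus manipulation described above (not the authors' actual pipeline), low-pass spatial frequency filtering of a video frame can be approximated with a Gaussian blur; the cutoff-to-sigma conversion below is a rough, assumed mapping, and the cutoff and face-width parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lowpass_frame(frame, cutoff_cycles_per_face, face_width_px):
    """Approximate low-pass spatial-frequency filtering of a grayscale frame.

    cutoff_cycles_per_face: assumed cutoff, in cycles per face width.
    face_width_px: assumed width of the talker's face in pixels.
    """
    # Convert the cutoff to a spatial period in pixels, then to a Gaussian sigma.
    # sigma ~ period / (2 * pi) is a rough approximation, not the paper's method.
    period_px = face_width_px / cutoff_cycles_per_face
    sigma = period_px / (2.0 * np.pi)
    return gaussian_filter(frame.astype(float), sigma=sigma)

# Usage sketch: blur a synthetic 480x640 frame to roughly 8 cycles per face width,
# assuming the face spans about 300 pixels horizontally.
frame = np.random.rand(480, 640)
blurred = lowpass_frame(frame, cutoff_cycles_per_face=8, face_width_px=300)
```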
Direction of Attentional Focus in Biofeedback Treatment for /R/ Misarticulation
ERIC Educational Resources Information Center
McAllister Byun, Tara; Swartz, Michelle T.; Halpin, Peter F.; Szeredi, Daniel; Maas, Edwin
2016-01-01
Background: Maintaining an external direction of focus during practice is reported to facilitate acquisition of non-speech motor skills, but it is not known whether these findings also apply to treatment for speech errors. This question has particular relevance for treatment incorporating visual biofeedback, where clinician cueing can direct the…
Magnotti, John F; Beauchamp, Michael S
2017-02-01
Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
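The causal-inference step described above follows the general Bayesian form used in multisensory perception; the following is a generic sketch with notation of my own choosing, not the paper's exact parameterization.

```latex
% Posterior probability that the auditory and visual signals x_A, x_V share a
% common cause (C = 1), given a prior p_common on a shared source:
P(C{=}1 \mid x_A, x_V) =
  \frac{P(x_A, x_V \mid C{=}1)\, p_{\text{common}}}
       {P(x_A, x_V \mid C{=}1)\, p_{\text{common}}
        + P(x_A, x_V \mid C{=}2)\,\bigl(1 - p_{\text{common}}\bigr)} .

% The reported percept mixes the integrated and segregated estimates,
% weighted by this posterior (model averaging):
\hat{s} = P(C{=}1 \mid x_A, x_V)\,\hat{s}_{AV}
        + \bigl(1 - P(C{=}1 \mid x_A, x_V)\bigr)\,\hat{s}_{A} .
```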
Mapping and Manipulating Facial Expression
ERIC Educational Resources Information Center
Theobald, Barry-John; Matthews, Iain; Mangini, Michael; Spies, Jeffrey R.; Brick, Timothy R.; Cohn, Jeffrey F.; Boker, Steven M.
2009-01-01
Nonverbal visual cues accompany speech to supplement the meaning of spoken words, signify emotional state, indicate position in discourse, and provide back-channel feedback. This visual information includes head movements, facial expressions and body gestures. In this article we describe techniques for manipulating both verbal and nonverbal facial…
Effects of aging on audio-visual speech integration.
Huyse, Aurélie; Leybaert, Jacqueline; Berthommier, Frédéric
2014-10-01
This study investigated the impact of aging on audio-visual speech integration. A syllable identification task was presented in auditory-only, visual-only, and audio-visual congruent and incongruent conditions. Visual cues were either degraded or unmodified. Stimuli were embedded in stationary noise alternating with modulated noise. Fifteen young adults and 15 older adults participated in this study. Results showed that older adults had preserved lipreading abilities when the visual input was clear but not when it was degraded. The impact of aging on audio-visual integration also depended on the quality of the visual cues. In the visual clear condition, the audio-visual gain was similar in both groups and analyses in the framework of the fuzzy-logical model of perception confirmed that older adults did not differ from younger adults in their audio-visual integration abilities. In the visual reduction condition, the audio-visual gain was reduced in the older group, but only when the noise was stationary, suggesting that older participants could compensate for the loss of lipreading abilities by using the auditory information available in the valleys of the noise. The fuzzy-logical model of perception confirmed the significant impact of aging on audio-visual integration by showing an increased weight of audition in the older group.
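The fuzzy-logical model of perception referred to above combines unisensory support values multiplicatively; a minimal sketch of its standard form is given below.

```latex
% Probability of choosing response alternative i, given auditory support a_i and
% visual support v_i (each scaled to [0, 1]), normalized over all alternatives k:
P(R_i \mid A, V) = \frac{a_i\, v_i}{\sum_{k} a_k\, v_k} .
```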
Monkeys and Humans Share a Common Computation for Face/Voice Integration
Chandrasekaran, Chandramouli; Lemus, Luis; Trubanova, Andrea; Gondan, Matthias; Ghazanfar, Asif A.
2011-01-01
Speech production involves the movement of the mouth and other regions of the face resulting in visual motion cues. These visual cues enhance intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, it should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share with humans vocal production biomechanics and communicate face-to-face with vocalizations. It is unknown, however, if they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models such as the principle of inverse effectiveness and a “race” model failed to account for their behavior patterns. Conversely, a “superposition model”, positing the linear summation of activity patterns in response to visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates. PMID:21998576
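For concreteness, the two mechanisms contrasted above are commonly formalized as follows; this is a generic sketch, not the authors' exact parameterization.

```latex
% Race model (probability summation): the cumulative distribution of audiovisual
% detection times is bounded by the sum of the unisensory distributions:
F_{AV}(t) \le F_A(t) + F_V(t) .

% Superposition: auditory- and visual-evoked activity sums linearly, and a
% detection response is triggered when the summed activity reaches a threshold:
r_{AV}(t) = r_A(t) + r_V(t), \qquad
RT_{AV} = \min\{\, t : r_{AV}(t) \ge \theta \,\} .
```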
Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen
2018-01-01
The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.
EEG activity evoked in preparation for multi-talker listening by adults and children.
Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin
2016-06-01
Selective attention is critical for successful speech perception because speech is often encountered in the presence of other sounds, including the voices of competing talkers. Faced with the need to attend selectively, listeners perceive speech more accurately when they know characteristics of upcoming talkers before they begin to speak. However, the neural processes that underlie the preparation of selective attention for voices are not fully understood. The current experiments used electroencephalography (EEG) to investigate the time course of brain activity during preparation for an upcoming talker in young adults aged 18-27 years with normal hearing (Experiments 1 and 2) and in typically-developing children aged 7-13 years (Experiment 3). Participants reported key words spoken by a target talker when an opposite-gender distractor talker spoke simultaneously. The two talkers were presented from different spatial locations (±30° azimuth). Before the talkers began to speak, a visual cue indicated either the location (left/right) or the gender (male/female) of the target talker. Adults evoked preparatory EEG activity that started shortly after (<50 ms) the visual cue was presented and was sustained until the talkers began to speak. The location cue evoked similar preparatory activity in Experiments 1 and 2 with different samples of participants. The gender cue did not evoke preparatory activity when it predicted gender only (Experiment 1) but did evoke preparatory activity when it predicted the identity of a specific talker with greater certainty (Experiment 2). Location cues evoked significant preparatory EEG activity in children but gender cues did not. The results provide converging evidence that listeners evoke consistent preparatory brain activity for selecting a talker by their location (regardless of their gender or identity), but not by their gender alone. Copyright © 2016 Elsevier B.V. All rights reserved.
Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline
2013-01-01
The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction), a degraded visual signal was presented, designed to prevent high-quality lipreading. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in the visual-only and congruent audiovisual modalities was decreased, showing that the visual reduction technique used here was effective at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory-based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between groups were due not only to differences in unisensory perception but also to differences in the process of audiovisual integration per se. Visual reduction led to an increase in the weight of audition, even in cochlear-implanted children, whose perception is generally dominated by vision. This result suggests that the natural bias in favor of vision is not immutable. Audiovisual speech integration partly depends on the experimental situation, which modulates the informational content of the sensory channels and the weight that is awarded to each of them. Consequently, participants, whether deaf with cochlear implants or normally hearing, not only base their perception on the most reliable modality but also award it additional weight.
Simulation of talking faces in the human brain improves auditory speech recognition
von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.
2008-01-01
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Potton, Anita; Birtles, Deidre; Frostick, Caroline; Moore, Derek G.
2013-01-01
The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants. PMID:23882240
Król, Magdalena Ewa
2018-01-01
We investigated the effect of auditory noise added to speech on patterns of looking at faces in 40 toddlers. We hypothesised that noise would increase the difficulty of processing speech, making children allocate more attention to the mouth of the speaker to gain visual speech cues from mouth movements. We also hypothesised that this shift would cause a decrease in fixation time to the eyes, potentially decreasing the ability to monitor gaze. We found that adding noise increased the number of fixations to the mouth area, at the price of a decreased number of fixations to the eyes. Thus, to our knowledge, this is the first study demonstrating a mouth-eyes trade-off between attention allocated to social cues coming from the eyes and linguistic cues coming from the mouth. We also found that children with higher word recognition proficiency and higher average pupil response had an increased likelihood of fixating the mouth, compared to the eyes and the rest of the screen, indicating stronger motivation to decode the speech. PMID:29558514
Motivation and appraisal in perception of poorly specified speech.
Lidestam, Björn; Beskow, Jonas
2006-04-01
Normal-hearing students (n = 72) performed sentence, consonant, and word identification in either the A (auditory), V (visual), or AV (audiovisual) modality. The auditory signal was presented at adverse speech-to-noise ratios. Talker (human vs. synthetic), topic (no cue vs. cue-words), and emotion (no cue vs. facially displayed vs. cue-words) were varied within groups. After the first block, effects of modality, face, topic, and emotion on initial appraisal and motivation were assessed. After the entire session, effects of modality on longer-term appraisal and motivation were assessed. The results from both assessments showed that V identification was more positively appraised than A identification. Correlations were tentatively interpreted as indicating that evaluation of self-rated performance may depend on a subjective standard and be reflected in motivation (if below the subjective standard, AV group) or in appraisal (if above the subjective standard, A group). Suggestions for further research are presented.
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.
Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius
2018-06-01
Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs. visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Lidestam, Björn; Rönnberg, Jerker
2016-01-01
The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
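As a concrete illustration of the isolation-point measure used above, an IP is usually taken as the earliest gate from which identification is correct and remains correct for all longer gates. The helper below assumes gates are listed in ascending duration; it is an illustrative sketch, not the authors' scoring code.

```python
def isolation_point(gate_durations_ms, responses, target):
    """Shortest gate duration from which every remaining response
    matches the target word; None if identification never stabilizes.
    Assumes gate_durations_ms is sorted in ascending order."""
    ip = None
    for duration, response in zip(gate_durations_ms, responses):
        if response == target:
            if ip is None:
                ip = duration      # candidate isolation point
        else:
            ip = None              # a later error resets the candidate
    return ip

# Example: 40-ms gates; the word is identified reliably from 160 ms on.
print(isolation_point([40, 80, 120, 160, 200, 240],
                      ["bat", "bat", "ban", "band", "band", "band"],
                      "band"))    # -> 160
```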
Nolan, Peter; Hoskins, Sherria; Johnson, Julia; Powell, Vaughan; Chaudhuri, K Ray; Eglin, Roger
2012-01-01
A Smartphone speech-therapy application (STA) is being developed, intended for people with Parkinson's disease (PD) who have reduced implicit volume cues. The STA offers visual volume feedback, addressing diminished auditory cues. Users are typically older adults who are less familiar with new technology. Domain-specific implicit theories (ITs) have been shown to result in mastery-oriented or helpless behaviors. Studies manipulating participants' implicit theories of 'technology' (Study One) and of the 'ability to affect one's voice' (Study Two) were coordinated with iterative STA test stages, using patients with PD who had prior speech-therapist referrals. Across studies, the findings suggest that it is possible to manipulate patients' ITs related to engaging with a Smartphone STA. This potentially affects patients' initial approach to the application and their overall effort when using a technology-based therapy.
Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels
ERIC Educational Resources Information Center
Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Baskent, Deniz
2012-01-01
Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Method:…
Audiovisual integration in children listening to spectrally degraded speech.
Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal
2015-02-01
The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
Dual-learning systems during speech category learning
Chandrasekaran, Bharath; Yi, Han-Gyol; Maddox, W. Todd
2013-01-01
Dual-systems models of visual category learning posit the existence of an explicit, hypothesis-testing ‘reflective’ system, as well as an implicit, procedural-based ‘reflexive’ system. The reflective and reflexive learning systems are competitive and neurally dissociable. Relatively little is known about the role of these domain-general learning systems in speech category learning. Given the multidimensional, redundant, and variable nature of acoustic cues in speech categories, our working hypothesis is that speech categories are learned reflexively. To this end, we examined the relative contribution of these learning systems to speech learning in adults. Native English speakers learned to categorize Mandarin tone categories over 480 trials. The training protocol involved trial-by-trial feedback and multiple talkers. Experiments 1 and 2 examined the effect of manipulating the timing (immediate vs. delayed) and information content (full vs. minimal) of feedback. Dual-systems models of visual category learning predict that delayed, information-rich feedback enhances reflective learning, whereas immediate, minimally informative feedback enhances reflexive learning. Across the two experiments, our results show that feedback manipulations targeting reflexive learning enhanced category learning success. In Experiment 3, we examined the role of trial-to-trial talker information (mixed vs. blocked presentation) on speech category learning success. We hypothesized that the mixed condition would enhance reflexive learning by not allowing an association between talker-related acoustic cues and speech categories. Our results show that the mixed-talker condition led to relatively greater accuracies. Our experiments demonstrate that speech categories are optimally learned by training methods that target the reflexive learning system. PMID:24002965
Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.
Schroeder, Juliana; Epley, Nicholas
2016-11-01
Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience (a humanlike voice) affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip) did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Johari, Karim; Behroozmand, Roozbeh
2017-05-01
The predictive coding model suggests that neural processing of sensory information is facilitated for temporally-predictable stimuli. This study investigated how temporal processing of visually-presented sensory cues modulates movement reaction time and neural activities in speech and hand motor systems. Event-related potentials (ERPs) were recorded in 13 subjects while they were visually-cued to prepare to produce a steady vocalization of a vowel sound or press a button in a randomized order, and to initiate the cued movement following the onset of a go signal on the screen. The experiment was conducted in two counterbalanced blocks in which the time interval between the visual cue and the go signal was temporally-predictable (fixed delay at 1000 ms) or unpredictable (variable between 1000 and 2000 ms). Results of the behavioral response analysis indicated that movement reaction time was significantly decreased for temporally-predictable stimuli in both speech and hand modalities. We identified premotor ERP activities with a left-lateralized parietal distribution for hand and a frontocentral distribution for speech that were significantly suppressed in response to temporally-predictable compared with unpredictable stimuli. The premotor ERPs emerged approximately 100 ms before movement onset and were significantly correlated with speech and hand motor reaction times only in response to temporally-predictable stimuli. These findings suggest that the motor system establishes a predictive code to facilitate movement in response to temporally-predictable sensory stimuli. Our data suggest that the premotor ERP activities are robust neurophysiological biomarkers of such predictive coding mechanisms. These findings provide novel insights into the temporal processing mechanisms of speech and hand motor systems.
Visual cues and listening effort: individual variability.
Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y
2011-10-01
To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.
ERIC Educational Resources Information Center
von Feldt, James R.; Subtelny, Joanne
The Webster diacritical system provides a discrete symbol for each sound and designates the appropriate syllable to be stressed in any polysyllabic word; the symbol system presents cues for correct production, auditory discrimination, and visual recognition of new words in print and as visual speech gestures. The Webster's Diacritical CAI Program…
Subjective scaling of spatial room acoustic parameters influenced by visual environmental cues
Valente, Daniel L.; Braasch, Jonas
2010-01-01
Although there have been numerous studies investigating subjective spatial impression in rooms, only a few of those studies have addressed the influence of visual cues on the judgment of auditory measures. In the psychophysical study presented here, video footage of five solo music/speech performers was shown for four different listening positions within a general-purpose space. The videos were presented in addition to the acoustic signals, which were auralized using binaural room impulse responses (BRIR) that were recorded in the same general-purpose space. The participants were asked to adjust the direct-to-reverberant energy ratio (D/R ratio) of the BRIR according to their expectation considering the visual cues. They were also directed to rate the apparent source width (ASW) and listener envelopment (LEV) for each condition. Visual cues generated by changing the sound-source position in the multi-purpose space, as well as the makeup of the sound stimuli affected the judgment of spatial impression. Participants also scaled the direct-to-reverberant energy ratio with greater direct sound energy than was measured in the acoustical environment. PMID:20968367
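The direct-to-reverberant energy ratio that listeners adjusted is conventionally computed from an impulse response by treating the energy within a few milliseconds of the direct-sound peak as direct and the rest as reverberant. The sketch below is a generic illustration under that assumption; the window length and names are illustrative, not taken from the study.

```python
import numpy as np

def direct_to_reverberant_db(ir, fs, direct_window_ms=5.0):
    """Direct-to-reverberant energy ratio (dB) of an impulse response.
    Samples up to direct_window_ms after the main peak count as direct."""
    ir = np.asarray(ir, dtype=float)
    onset = int(np.argmax(np.abs(ir)))
    split = onset + int(round(direct_window_ms * 1e-3 * fs))
    direct = np.sum(ir[:split] ** 2)
    reverberant = np.sum(ir[split:] ** 2)
    return 10.0 * np.log10(direct / reverberant)

# Toy BRIR-like signal: a unit direct sound plus a decaying noise tail.
fs = 48000
rng = np.random.default_rng(1)
tail = 0.05 * rng.standard_normal(fs) * np.exp(-np.arange(fs) / (0.3 * fs))
ir = np.concatenate(([1.0], tail))
print(round(direct_to_reverberant_db(ir, fs), 1), "dB")
```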
Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect.
Van Engen, Kristin J; Xie, Zilong; Chandrasekaran, Bharath
2017-02-01
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
Audiovisual integration for speech during mid-childhood: Electrophysiological evidence
Kaganovich, Natalya; Schumaker, Jennifer
2014-01-01
Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815
Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.
2017-01-01
The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597
Prosody production networks are modulated by sensory cues and social context.
Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus
2018-03-05
The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and to assess the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) accompanied by interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory cortex and, in the case of visual stimuli, visual cortex. Responses were larger in the right posterior superior temporal gyrus (pSTG) and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions, in particular for the interpersonal evaluation of vocal emotions.
Giraud, Anne Lise; Truy, Eric
2002-01-01
Early visual cortex can be recruited by meaningful sounds in the absence of visual information. This occurs in particular in cochlear implant (CI) patients whose dependency on visual cues in speech comprehension is increased. Such cross-modal interaction mirrors the response of early auditory cortex to mouth movements (speech reading) and may reflect the natural expectancy of the visual counterpart of sounds, lip movements. Here we pursue the hypothesis that visual activations occur specifically in response to meaningful sounds. We performed PET in both CI patients and controls, while subjects listened either to their native language or to a completely unknown language. A recruitment of early visual cortex, the left posterior inferior temporal gyrus (ITG) and the left superior parietal cortex was observed in both groups. While no further activation occurred in the group of normal-hearing subjects, CI patients additionally recruited the right perirhinal/fusiform and mid-fusiform, the right temporo-occipito-parietal (TOP) junction and the left inferior prefrontal cortex (LIPF, Broca's area). This study confirms a participation of visual cortical areas in semantic processing of speech sounds. Observation of early visual activation in normal-hearing subjects shows that auditory-to-visual cross-modal effects can also be recruited under natural hearing conditions. In cochlear implant patients, speech activates the mid-fusiform gyrus in the vicinity of the so-called face area. This suggests that specific cross-modal interaction involving advanced stages in the visual processing hierarchy develops after cochlear implantation and may be the correlate of increased usage of lip-reading.
Francis, Alexander L
2010-02-01
Perception of speech in competing speech is facilitated by spatial separation of the target and distracting speech, but this benefit may arise at either a perceptual or a cognitive level of processing. Load theory predicts different effects of perceptual and cognitive (working memory) load on selective attention in flanker task contexts, suggesting that this paradigm may be used to distinguish levels of interference. Two experiments examined interference from competing speech during a word recognition task under different perceptual and working memory loads in a dual-task paradigm. Listeners identified words produced by a talker of one gender while ignoring a talker of the other gender. Perceptual load was manipulated using a nonspeech response cue, with response conditional upon either one or two acoustic features (pitch and modulation). Memory load was manipulated with a secondary task consisting of one or six visually presented digits. In the first experiment, the target and distractor were presented at different virtual locations (0 degrees and 90 degrees , respectively), whereas in the second, all the stimuli were presented from the same apparent location. Results suggest that spatial cues improve resistance to distraction in part by reducing working memory demand.
Audio-visual speech cue combination.
Arnold, Derek H; Tear, Morgan; Schindel, Ryan; Roseboom, Warrick
2010-04-16
Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to decisions based on a single sensory modality. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or probability summation. Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
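The maximum-likelihood benchmark referred to above weights each independent estimate by its reliability; in generic notation (not the authors' own):

$$\hat{s}_{AV} = \frac{\sigma_V^{2}\,\hat{s}_A + \sigma_A^{2}\,\hat{s}_V}{\sigma_A^{2}+\sigma_V^{2}}, \qquad \sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2}+\sigma_V^{2}}.$$

Because \(\sigma_{AV}^{2}\) can never fall below half the variance of the more reliable single cue, sensitivity gains exceeding that bound, as reported here, point to a shared encoding stage rather than late combination of two independent estimates.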
Divided listening in noise in a mock-up of a military command post.
Abel, Sharon M; Nakashima, Ann; Smith, Ingrid
2012-04-01
This study investigated divided listening in noise in a mock-up of a vehicular command post. The effects of background noise from the vehicle, unattended speech of coworkers on speech understanding, and a visual cue that directed attention to the message source were examined. Sixteen normal-hearing males participated in sixteen listening conditions, defined by combinations of the absence/presence of vehicle and speech babble noises, availability of a vision cue, and number of channels (2 or 3, diotic or dichotic, and loudspeakers) over which concurrent series of call sign, color, and number phrases were presented. All wore a communications headset with integrated hearing protection. A computer keyboard was used to encode phrases beginning with an assigned call sign. Subjects achieved close to 100% correct phrase identification when presented over the headset (with or without vehicle noise) or over the loudspeakers, without vehicle noise. In contrast, the percentage correct phrase identification was significantly less by 30 to 35% when presented over loudspeakers with vehicle noise. Vehicle noise combined with babble noise decreased the accuracy by an additional 12% for dichotic listening. Vision cues increased phrase identification accuracy by 7% for diotic listening. Outcomes could be explained by the at-ear energy spectra of the speech and noise.
Gender differences in identifying emotions from auditory and visual stimuli.
Waaramaa, Teija
2017-12-01
The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples, and prolonged vowels were investigated. It was also examined whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to gain better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or by a shared native language between the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. Emotions were recognized better from visual stimuli than from auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.
Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech
ERIC Educational Resources Information Center
Pilling, Michael; Thomas, Sharon
2011-01-01
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Motor excitability during visual perception of known and unknown spoken languages.
Swaminathan, Swathi; MacSweeney, Mairéad; Boyles, Rowan; Waters, Dafydd; Watkins, Kate E; Möttönen, Riikka
2013-07-01
It is possible to comprehend speech and discriminate languages by viewing a speaker's articulatory movements. Transcranial magnetic stimulation studies have shown that viewing speech enhances excitability in the articulatory motor cortex. Here, we investigated the specificity of this enhanced motor excitability in native and non-native speakers of English. Both groups were able to discriminate between speech movements related to a known (i.e., English) and unknown (i.e., Hebrew) language. The motor excitability was higher during observation of a known language than an unknown language or non-speech mouth movements, suggesting that motor resonance is enhanced specifically during observation of mouth movements that convey linguistic information. Surprisingly, however, the excitability was equally high during observation of a static face. Moreover, the motor excitability did not differ between native and non-native speakers. These findings suggest that the articulatory motor cortex processes several kinds of visual cues during speech communication. Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.
Ahrens, Merle-Marie; Veniero, Domenica; Gross, Joachim; Harvey, Monika; Thut, Gregor
2015-01-01
Many behaviourally relevant sensory events such as motion stimuli and speech have an intrinsic spatio-temporal structure. This will engage intentional and most likely unintentional (automatic) prediction mechanisms enhancing the perception of upcoming stimuli in the event stream. Here we sought to probe the anticipatory processes that are automatically driven by rhythmic input streams in terms of their spatial and temporal components. To this end, we employed an apparent visual motion paradigm testing the effects of pre-target motion on lateralized visual target discrimination. The motion stimuli either moved towards or away from peripheral target positions (valid vs. invalid spatial motion cueing) at a rhythmic or arrhythmic pace (valid vs. invalid temporal motion cueing). Crucially, we emphasized automatic motion-induced anticipatory processes by rendering the motion stimuli non-predictive of upcoming target position (by design) and task-irrelevant (by instruction), and by creating instead endogenous (orthogonal) expectations using symbolic cueing. Our data revealed that the apparent motion cues automatically engaged both spatial and temporal anticipatory processes, but that these processes were dissociated. We further found evidence for lateralisation of anticipatory temporal but not spatial processes. This indicates that distinct mechanisms may drive automatic spatial and temporal extrapolation of upcoming events from rhythmic event streams. This contrasts with previous findings that instead suggest an interaction between spatial and temporal attention processes when endogenously driven. Our results further highlight the need for isolating intentional from unintentional processes for better understanding the various anticipatory mechanisms engaged in processing behaviourally relevant stimuli with predictable spatio-temporal structure such as motion and speech. PMID:26623650
Johari, Karim; Behroozmand, Roozbeh
2017-08-01
Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in response to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
Audiovisual cues and perceptual learning of spectrally distorted speech.
Pilling, Michael; Thomas, Sharon
2011-12-01
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight-channel noise vocoder, which shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch. Experiment 1 found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues with training which gave written feedback. These two methods did not significantly differ in the pattern of training they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
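A channel noise vocoder of the general kind described here band-pass filters the speech, extracts each band's amplitude envelope, imposes that envelope on band-limited noise, and sums the bands; a spectral shift can then be simulated by synthesizing each envelope in a higher-frequency output band. The sketch below is a generic, un-shifted illustration under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Simple n-channel noise vocoder: keep each band's amplitude
    envelope, discard fine structure by modulating band-limited noise.
    (A smoothing low-pass on the envelope is often added in practice.)"""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced bands
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        envelope = np.abs(hilbert(band))
        carrier = sosfiltfilt(sos, rng.standard_normal(len(speech)))
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Toy input: one second of noise at 16 kHz standing in for speech.
fs = 16000
vocoded = noise_vocode(np.random.default_rng(1).standard_normal(fs), fs)
```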
Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children
Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha
2012-01-01
Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726
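Entrainment in a paradigm like this is commonly quantified as inter-trial phase coherence at the stimulation rate, i.e., the length of the mean resultant vector of single-trial Fourier phases at that frequency. The sketch below is a minimal illustration with simulated data, not the authors' analysis pipeline.

```python
import numpy as np

def inter_trial_phase_coherence(trials, fs, freq):
    """Phase locking across trials at one frequency.
    trials: array of shape (n_trials, n_samples); returns a value in [0, 1]."""
    trials = np.asarray(trials, dtype=float)
    freqs = np.fft.rfftfreq(trials.shape[1], d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - freq)))        # nearest FFT bin
    spectra = np.fft.rfft(trials, axis=1)[:, k]
    unit_phases = spectra / np.abs(spectra)
    return float(np.abs(unit_phases.mean()))

# Simulated 2 Hz responses with phase jitter plus noise, 50 trials.
fs, dur = 250, 2.0
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(2)
trials = np.array([np.sin(2 * np.pi * 2 * t + rng.normal(0, 0.4))
                   + 0.5 * rng.standard_normal(t.size) for _ in range(50)])
print(round(inter_trial_phase_coherence(trials, fs, 2.0), 2))
```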
Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a ‘Cocktail Party’
Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David
2013-01-01
Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218
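Envelope-tracking analyses of this general kind extract a low-frequency temporal envelope from each speech stream and ask how strongly it correlates with (suitably lagged) neural activity for the attended versus the ignored talker. The helper functions below are a simplified sketch under those assumptions, not the authors' method.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample

def speech_envelope(audio, fs_audio, fs_neural, cutoff_hz=8.0):
    """Broadband temporal envelope, low-pass filtered and resampled
    to the neural sampling rate."""
    env = np.abs(hilbert(audio))
    sos = butter(3, cutoff_hz, btype="lowpass", fs=fs_audio, output="sos")
    env = sosfiltfilt(sos, env)
    return resample(env, int(round(len(audio) * fs_neural / fs_audio)))

def best_lagged_correlation(neural, envelope, fs_neural, max_lag_ms=250):
    """Pearson correlation between neural signal and envelope over a range
    of neural lags; returns (best_lag_ms, best_r)."""
    best = (0.0, -1.0)
    for lag in range(int(max_lag_ms * fs_neural / 1000) + 1):
        x = neural[lag:]
        y = envelope[:len(x)]
        r = float(np.corrcoef(x[:len(y)], y)[0, 1])
        if r > best[1]:
            best = (lag * 1000.0 / fs_neural, r)
    return best
```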
Brandão, Lenisa; Monção, Ana Maria; Andersson, Richard; Holmqvist, Kenneth
2014-01-01
Objective: The goal of this study was to investigate whether on-topic visual cues can serve as aids for the maintenance of discourse coherence and informativeness in autobiographical narratives of persons with Alzheimer's disease (AD). Methods: The experiment consisted of three randomized conversation conditions: one without prompts, showing a blank computer screen; an on-topic condition, showing a picture and a sentence about the conversation; and an off-topic condition, showing a picture and a sentence which were unrelated to the conversation. Speech was recorded while visual attention was examined using eye tracking to measure how long participants looked at cues and the face of the listener. Results: Results suggest that interventions using visual cues in the form of images and written information are useful to improve discourse informativeness in AD. Conclusion: This study demonstrated the potential of using images and short written messages as a means of compensating for the cognitive deficits which underlie uninformative discourse in AD. Future studies should further investigate the efficacy of language interventions based on the use of these compensation strategies for AD patients and their family members and friends. PMID:29213914
Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric
2014-01-01
Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed a similar auditory masking release effect to that of children with TLD. Children with SLI also gave fewer correct responses in speechreading than children with TLD, indicating an impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed. PMID:24904454
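The "percent of information transmitted" analysis referred to above is conventionally computed from a consonant confusion matrix collapsed by phonetic feature (in the style of Miller & Nicely, 1955). A minimal sketch follows, assuming a hypothetical confusion matrix and feature labeling; it is not the authors' code.

```python
import numpy as np

def transmitted_information(confusions, feature_of):
    """Relative information transmitted for one phonetic feature.

    confusions : (n_stim, n_resp) count matrix, rows = presented consonants
    feature_of : list giving each consonant's feature value (e.g., place class)
    Returns T(x;y) / H(x): 1.0 = feature perfectly transmitted, 0.0 = not at all.
    """
    feats = np.asarray(feature_of)
    classes = np.unique(feats)
    # Collapse the consonant confusion matrix into a feature confusion matrix
    m = np.array([[confusions[np.ix_(feats == fi, feats == fj)].sum()
                   for fj in classes] for fi in classes], dtype=float)
    p = m / m.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.nansum(p * np.log2(p / np.outer(px, py)))   # mutual information (bits)
    h_x = -np.nansum(px * np.log2(px))                     # stimulus entropy (bits)
    return t / h_x

# Hypothetical example: 4 consonants, a place feature with 2 classes
conf = np.array([[20, 5, 3, 2], [6, 19, 2, 3], [2, 3, 21, 4], [1, 4, 5, 20]])
print(transmitted_information(conf, ["labial", "labial", "coronal", "coronal"]))
```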
Visual Grouping in Accordance With Utterance Planning Facilitates Speech Production.
Zhao, Liming; Paterson, Kevin B; Bai, Xuejun
2018-01-01
Research on language production has focused on the process of utterance planning and involved studying the synchronization between visual gaze and the production of sentences that refer to objects in the immediate visual environment. However, it remains unclear how the visual grouping of these objects might influence this process. To shed light on this issue, the present research examined the effects of the visual grouping of objects in a visual display on utterance planning in two experiments. Participants produced utterances of the form "The snail and the necklace are above/below/on the left/right side of the toothbrush" for objects containing these referents (e.g., a snail, a necklace and a toothbrush). These objects were grouped using classic Gestalt principles of color similarity (Experiment 1) and common region (Experiment 2) so that the induced perceptual grouping was congruent or incongruent with the required phrasal organization. The results showed that speech onset latencies were shorter in congruent than incongruent conditions. The findings therefore reveal that the congruency between the visual grouping of referents and the required phrasal organization can influence speech production. Such findings suggest that, when language is produced in a visual context, speakers make use of both visual and linguistic cues to plan utterances.
Graham, Susan A; San Juan, Valerie; Khu, Melanie
2017-05-01
When linguistic information alone does not clarify a speaker's intended meaning, skilled communicators can draw on a variety of cues to infer communicative intent. In this paper, we review research examining the developmental emergence of preschoolers' sensitivity to a communicative partner's perspective. We focus particularly on preschoolers' tendency to use cues both within the communicative context (i.e. a speaker's visual access to information) and within the speech signal itself (i.e. emotional prosody) to make on-line inferences about communicative intent. Our review demonstrates that preschoolers' ability to use visual and emotional cues of perspective to guide language interpretation is not uniform across tasks, is sometimes related to theory of mind and executive function skills, and, at certain points of development, is only revealed by implicit measures of language processing.
Chuen, Lorraine; Schutz, Michael
2016-07-01
An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an unspeeded temporal order judgment task: they viewed audiovisually matched (e.g., marimba audio with marimba video) and mismatched (e.g., cello audio with marimba video) versions of the stimuli at various stimulus onset asynchronies and indicated which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.
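Temporal order judgment data of this kind are usually summarized by fitting a cumulative Gaussian to the proportion of "visual first" responses as a function of stimulus onset asynchrony (SOA), then reading off the just-noticeable difference (JND) and point of subjective simultaneity (PSS). The sketch below uses invented data and an assumed sign convention; it is not the authors' analysis.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_gauss(soa, pss, sigma):
    """Probability of responding 'visual first' as a function of SOA (ms).
    Positive SOA = visual stream leads the auditory stream (assumed convention)."""
    return norm.cdf(soa, loc=pss, scale=sigma)

# Hypothetical data: SOAs in ms and proportion of 'visual first' responses
soas = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], dtype=float)
p_visual_first = np.array([0.05, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 0.98])

(pss, sigma), _ = curve_fit(cum_gauss, soas, p_visual_first, p0=[0.0, 100.0])
jnd = sigma * norm.ppf(0.75)   # SOA change needed to go from 50% to 75% correct ordering
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
# A larger JND in matched (unity) conditions indicates reduced temporal sensitivity,
# i.e., the audiovisual streams are more readily bound into a single event.
```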
Two-year-olds can begin to acquire verb meanings in socially impoverished contexts.
Arunachalam, Sudha
2013-12-01
By two years of age, toddlers are adept at recruiting social, observational, and linguistic cues to discover the meanings of words. Here, we ask how they fare in impoverished contexts in which linguistic cues are provided, but no social or visual information is available. Novel verbs are presented in a stream of syntactically informative sentences, but the sentences are not embedded in a social context, and no visual access to the verb's referent is provided until the test phase. The results provide insight into how toddlers may benefit from overhearing contexts in which they are not directly attending to the ambient speech, and in which no conversational context, visual referent, or child-directed conversation is available. Copyright © 2013 Elsevier B.V. All rights reserved.
Effects of Amplification, Speechreading, and Classroom Environments on Reception of Speech
ERIC Educational Resources Information Center
Blair, James C.
1977-01-01
Two sources of amplification (binaural ear-level hearing aids and RF auditory training units with environmental microphones on) were compared in 18 hard-of-hearing students (7 to 14 years old) under "ideal" and "typical" classroom noise levels, with and without visual speechreading cues provided. (Author/IM)
Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin
2018-04-25
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
Xia, Jing; Nooraei, Nazanin; Kalluri, Sridhar; Edwards, Brent
2015-04-01
This study investigated whether spatial separation between talkers helps reduce cognitive processing load, and how hearing impairment interacts with the cognitive load of individuals listening in multi-talker environments. A dual-task paradigm was used in which performance on a secondary task (visual tracking) served as a measure of the cognitive load imposed by a speech recognition task. Visual tracking performance was measured under four conditions in which the target and the interferers were distinguished by (1) gender and spatial location, (2) gender only, (3) spatial location only, and (4) neither gender nor spatial location. Results showed that when gender cues were available, a 15° spatial separation between talkers reduced the cognitive load of listening even though it did not provide further improvement in speech recognition (Experiment I). Compared to normal-hearing listeners, large individual variability in spatial release of cognitive load was observed among hearing-impaired listeners. Cognitive load was lower when talkers were spatially separated by 60° than when talkers were of different genders, even though speech recognition was comparable in these two conditions (Experiment II). These results suggest that a measure of cognitive load might provide valuable insight into the benefit of spatial cues in multi-talker environments.
Lim, Sung-joo; Holt, Lori L.
2011-01-01
Although speech categories are defined by multiple acoustic dimensions, some are perceptually-weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information and players’ responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5 hours across 5 days exhibited improvements in /r/-/l/ perception on par with 2–4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. PMID:21827533
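Perceptual cue weights of the kind discussed here are commonly estimated by regressing listeners' binary category responses on the standardized acoustic dimensions; the relative magnitude of the coefficients indexes each cue's weight (e.g., F3 onset versus F2 onset for the English /r/-/l/ contrast). The sketch below uses simulated responses and assumed variable names; it is not the authors' analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical trial data: two standardized acoustic cues per stimulus
n = 400
f3_onset = rng.normal(size=n)   # primary cue for English listeners (assumed)
f2_onset = rng.normal(size=n)   # secondary cue, often weighted by Japanese listeners
# Simulated listener who weights F3 twice as strongly as F2
p_r = 1 / (1 + np.exp(-(2.0 * f3_onset + 1.0 * f2_onset)))
resp_r = rng.random(n) < p_r    # True = '/r/' response

model = LogisticRegression().fit(np.column_stack([f3_onset, f2_onset]), resp_r)
w_f3, w_f2 = model.coef_[0]
print(f"relative F3 weight: {w_f3 / (abs(w_f3) + abs(w_f2)):.2f}")
# A shift of this ratio toward F3 after training would indicate more
# native-like perceptual cue weighting.
```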
Won, Jong Ho; Lorenzi, Christian; Nie, Kaibao; Li, Xing; Jameyson, Elyse M.; Drennan, Ward R.; Rubinstein, Jay T.
2012-01-01
Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered “temporal envelopes,” i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues. PMID:22894230
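The "recovered envelope" effect described above rests on the fact that passing an FM-only signal through a bank of narrow band-pass filters reintroduces amplitude modulation at each filter output. A minimal sketch of measuring the correlation between intact and recovered band envelopes follows; the filter design and band edges are assumptions for illustration, not the study's processing parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelopes(signal, fs, edges):
    """Hilbert envelopes at the outputs of a simple band-pass filterbank."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
        envs.append(np.abs(hilbert(sosfiltfilt(sos, signal))))
    return np.array(envs)

def envelope_recovery_correlation(intact, fm_only, fs, edges):
    """Per-band correlation between the intact-speech envelope and the envelope
    'recovered' from the FM-only signal by cochlea-like band-pass filtering."""
    e_intact = band_envelopes(intact, fs, edges)
    e_fm = band_envelopes(fm_only, fs, edges)
    return np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(e_intact, e_fm)])

# Hypothetical usage with assumed band edges (Hz):
# edges = [100, 300, 700, 1500, 3000, 6000]
# r_per_band = envelope_recovery_correlation(speech_wav, fm_vocoded_wav, 16000, edges)
```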
Atcherson, Samuel R; Mendel, Lisa Lucks; Baltimore, Wesley J; Patro, Chhayakanta; Lee, Sungmin; Pousson, Monique; Spann, M Joshua
2017-01-01
It is generally well known that speech perception is often improved with integrated audiovisual input whether in quiet or in noise. In many health-care environments, however, conventional surgical masks block visual access to the mouth and obscure other potential facial cues. In addition, these environments can be noisy. Although these masks may not alter the acoustic properties, the presence of noise in addition to the lack of visual input can have a deleterious effect on speech understanding. A transparent ("see-through") surgical mask may help to overcome this issue. The aim was to compare the effect of noise and various visual input conditions on speech understanding for listeners with normal hearing (NH) and hearing impairment using different surgical masks. Participants were assigned to one of three groups based on hearing sensitivity in this quasi-experimental, cross-sectional study. A total of 31 adults participated in this study: one talker, ten listeners with NH, ten listeners with moderate sensorineural hearing loss, and ten listeners with severe-to-profound hearing loss. Selected lists from the Connected Speech Test were digitally recorded with and without surgical masks and then presented to the listeners at 65 dB HL in five conditions against a background of four-talker babble (+10 dB SNR): without a mask (auditory only), without a mask (auditory and visual), with a transparent mask (auditory only), with a transparent mask (auditory and visual), and with a paper mask (auditory only). A significant difference was found in the spectral analyses of the speech stimuli recorded with and without the masks; however, the difference was no more than approximately 2 dB root mean square. Listeners with NH performed consistently well across all conditions. Both groups of listeners with hearing impairment benefitted from visual input from the transparent mask. The magnitude of improvement in speech perception in noise was greatest for the severe-to-profound group. Findings confirm improved speech perception performance in noise for listeners with hearing impairment when visual input is provided using a transparent surgical mask. Most importantly, the use of the transparent mask did not negatively affect speech perception performance in noise. American Academy of Audiology
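The roughly 2 dB figure refers to long-term level differences between recordings made with and without a mask. One simple way to check that kind of attenuation is to compare band-limited RMS levels of the two recordings; the sketch below uses assumed octave-band centres and is not the study's acoustic analysis.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_band_rms_db(x, fs, centers=(250, 500, 1000, 2000, 4000)):
    """RMS level (dB re full scale) in octave bands around the given centres."""
    levels = []
    for fc in centers:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
        band = sosfiltfilt(sos, x)
        levels.append(20 * np.log10(np.sqrt(np.mean(band ** 2)) + 1e-12))
    return np.array(levels)

# Hypothetical comparison of the same sentence recorded with and without a mask:
# diff_db = octave_band_rms_db(no_mask_wav, fs) - octave_band_rms_db(mask_wav, fs)
# print(diff_db)   # per-band attenuation; overall differences on the order of ~2 dB
```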
Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M
2015-03-01
Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues, they support the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette
2016-08-01
Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
Binaural enhancement for bilateral cochlear implant users.
Brown, Christopher A
2014-01-01
Bilateral cochlear implant (BCI) users receive limited binaural cues and, thus, show little improvement in speech intelligibility from spatial cues. The feasibility of a method for enhancing the binaural cues available to BCI users was investigated. The method involved extending interaural level differences, which typically are restricted to high frequencies, into the low-frequency region. Speech intelligibility was measured in BCI users listening over headphones and with direct stimulation, with a target talker presented to one side of the head in the presence of a masker talker on the other side. Spatial separation was achieved by applying either naturally occurring binaural cues or enhanced cues. In this listening configuration, BCI patients showed greater speech intelligibility with the enhanced binaural cues than with naturally occurring binaural cues. In some situations, it is possible for BCI users to achieve greater speech intelligibility when binaural cues are enhanced by applying interaural level differences in the low-frequency region.
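The enhancement described amounts to taking the interaural level difference (ILD) that naturally exists at high frequencies and imposing it on the low-frequency band as well. The sketch below illustrates that idea in a highly simplified form, with an assumed crossover frequency and no smoothing of the instantaneous ILD; it is not the study's actual signal processing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def enhance_low_frequency_ild(left, right, fs, split_hz=1500.0):
    """Copy the high-frequency interaural level difference onto the low-frequency
    band by attenuating the ear that the high-frequency ILD disfavours."""
    sos_lo = butter(4, split_hz / (fs / 2), btype="low", output="sos")
    sos_hi = butter(4, split_hz / (fs / 2), btype="high", output="sos")
    lo_l, hi_l = sosfiltfilt(sos_lo, left), sosfiltfilt(sos_hi, left)
    lo_r, hi_r = sosfiltfilt(sos_lo, right), sosfiltfilt(sos_hi, right)

    # Instantaneous high-frequency ILD in dB (positive = left ear more intense)
    eps = 1e-9
    ild_db = 20 * np.log10((np.abs(hilbert(hi_l)) + eps) /
                           (np.abs(hilbert(hi_r)) + eps))
    gain_l = 10 ** (np.minimum(ild_db, 0) / 20)   # attenuate left when ILD favours right
    gain_r = 10 ** (np.minimum(-ild_db, 0) / 20)  # attenuate right when ILD favours left

    # Recombine: low band carries the copied ILD, high band is left untouched
    return lo_l * gain_l + hi_l, lo_r * gain_r + hi_r
```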
Goverts, S Theo; Huysmans, Elke; Kramer, Sophia E; de Groot, Annette M B; Houtgast, Tammo
2011-12-01
Researchers have used the distortion-sensitivity approach in the psychoacoustical domain to investigate the role of auditory processing abilities in speech perception in noise (van Schijndel, Houtgast, & Festen, 2001; Goverts & Houtgast, 2010). In this study, the authors examined the potential applicability of the distortion-sensitivity approach for investigating the role of linguistic abilities in speech understanding in noise. The authors applied the distortion-sensitivity approach by measuring the processing of visually presented masked text in a condition with manipulated syntactic, lexical, and semantic cues and while using the Text Reception Threshold (George et al., 2007; Kramer, Zekveld, & Houtgast, 2009; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) method. Two groups that differed in linguistic abilities were studied: 13 native and 10 non-native speakers of Dutch, all typically hearing university students. As expected, the non-native subjects showed substantially reduced performance. The results of the distortion-sensitivity approach yielded differentiated results on the use of specific linguistic cues in the 2 groups. The results show the potential value of the distortion-sensitivity approach in studying the role of linguistic abilities in speech understanding in noise of individuals with hearing impairment.
Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T
2014-06-01
Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Crossmodal and incremental perception of audiovisual cues to emotional speech.
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
2010-01-01
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as Experiment 1, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores rise faster for positive than negative conditions. Finally, the gating results suggest that incongruent emotions are perceived as more intense than congruent emotions, as the former get more extreme recognition scores than the latter, already after a short period of exposure.
Wirtzfeld, Michael R; Ibrahim, Rasha A; Bruce, Ian C
2017-10-01
Perceptual studies of speech intelligibility have shown that slow variations of acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
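Auditory "chimaeras" of the kind analyzed here are built by splitting two signals into the same band-pass channels, taking the Hilbert envelope from one signal and the fine structure (cosine of the Hilbert phase) from the other in each channel, and summing the recombined channels. The following is a minimal sketch of that construction with assumed band edges; it is not the authors' stimulus-generation code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def make_chimaera(env_source, tfs_source, fs, edges):
    """Multi-band chimaera: envelope from one signal, temporal fine structure
    from the other, recombined across band-pass channels."""
    n = min(len(env_source), len(tfs_source))
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
        band_env = hilbert(sosfiltfilt(sos, env_source[:n]))
        band_tfs = hilbert(sosfiltfilt(sos, tfs_source[:n]))
        env = np.abs(band_env)             # ENV taken from the first signal
        tfs = np.cos(np.angle(band_tfs))   # TFS taken from the second signal
        out += env * tfs
    return out

# Hypothetical usage: speech ENV on noise TFS (swap the arguments for a
# TFS-speech chimaera), with assumed analysis bands
# edges = np.geomspace(80, 8000, 9)
# env_chimaera = make_chimaera(speech, noise, 16000, edges)
```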
NASA Astrophysics Data System (ADS)
Toscano, Joseph Christopher
Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech processing, listeners encode continuous differences in acoustic cues, independent of phonological categories; (2) at post-perceptual stages, fine-grained acoustic information is preserved; and (3) there is preliminary evidence that listeners encode cues relative to context via feedback from categories. These results are discussed in relation to proposed models of speech perception and sources of contextual variability.
Language identification from visual-only speech signals
Ronquest, Rebecca E.; Levi, Susannah V.; Pisoni, David B.
2010-01-01
Our goal in the present study was to examine how observers identify English and Spanish from visual-only displays of speech. First, we replicated the recent findings of Soto-Faraco et al. (2007) with Spanish and English bilingual and monolingual observers using different languages and a different experimental paradigm (identification). We found that prior linguistic experience affected response bias but not sensitivity (Experiment 1). In two additional experiments, we investigated the visual cues that observers use to complete the language-identification task. The results of Experiment 2 indicate that some lexical information is available in the visual signal but that it is limited. Acoustic analyses confirmed that our Spanish and English stimuli differed acoustically with respect to linguistic rhythmic categories. In Experiment 3, we tested whether this rhythmic difference could be used by observers to identify the language when the visual stimuli are temporally reversed, thereby eliminating lexical information but retaining rhythmic differences. The participants performed above chance even in the backward condition, suggesting that the rhythmic differences between the two languages may aid language identification in visual-only speech signals. The results of Experiments 3A and 3B also confirm previous findings that increased stimulus length facilitates language identification. Taken together, the results of these three experiments replicate earlier findings and also show that prior linguistic experience, lexical information, rhythmic structure, and utterance length influence visual-only language identification. PMID:20675804
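The distinction drawn above between response bias and sensitivity comes from signal detection theory: treating one language (say, English) as the "signal", d' and the criterion c can be computed from hit and false-alarm rates. A minimal sketch with hypothetical counts follows; it is not the authors' analysis.

```python
from scipy.stats import norm

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Signal detection measures for a two-alternative identification task.
    A log-linear correction avoids infinite z-scores when a rate is 0 or 1."""
    h = (hits + 0.5) / (hits + misses + 1)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = norm.ppf(h) - norm.ppf(f)             # sensitivity
    criterion = -0.5 * (norm.ppf(h) + norm.ppf(f))  # response bias
    return d_prime, criterion

# Hypothetical counts: 'English' trials called English (hits) vs. 'Spanish'
# trials called English (false alarms)
print(dprime_and_criterion(hits=70, misses=30, false_alarms=40, correct_rejections=60))
```

On this scheme, a group difference in bias but not sensitivity would show up as different criterion values with comparable d'.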
Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa
2014-01-01
To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038
Effects of frequency shifts and visual gender information on vowel category judgments
NASA Astrophysics Data System (ADS)
Glidden, Catherine; Assmann, Peter F.
2003-10-01
Visual morphing techniques were used together with a high-quality vocoder to study the audiovisual contribution of talker gender to the identification of frequency-shifted vowels. A nine-step continuum ranging from "bit" to "bet" was constructed from natural recorded syllables spoken by an adult female talker. Upward and downward frequency shifts in spectral envelope (scale factors of 0.85 and 1.0) were applied in combination with shifts in fundamental frequency, F0 (scale factors of 0.5 and 1.0). Downward frequency shifts generally resulted in malelike voices whereas upward shifts were perceived as femalelike. Two separate nine-step visual continua from "bit" to "bet" were also constructed, one from a male face and the other a female face, each producing the end-point words. Each step along the two visual continua was paired with the corresponding step on the acoustic continuum, creating natural audiovisual utterances. Category boundary shifts were found for both acoustic cues (F0 and formant frequency shifts) and visual cues (visual gender). The visual gender effect was larger when acoustic and visual information were matched appropriately. These results suggest that visual information provided by the speech signal plays an important supplemental role in talker normalization.
Peng, Shu-Chen; Lu, Nelson; Chatterjee, Monita
2009-01-01
Cochlear implant (CI) recipients have only limited access to fundamental frequency (F0) information, and thus exhibit deficits in speech intonation recognition. For speech intonation, F0 serves as the primary cue, and other potential acoustic cues (e.g., intensity properties) may also contribute. This study examined the effects of cooperating or conflicting acoustic cues on speech intonation recognition by adult cochlear implant (CI) and normal-hearing (NH) listeners with full-spectrum and spectrally degraded speech stimuli. Identification of speech intonation that signifies question and statement contrasts was measured in 13 CI recipients and 4 NH listeners, using resynthesized bi-syllabic words, where F0 and intensity properties were systematically manipulated. The stimulus set comprised tokens whose acoustic cues, i.e., F0 contour and intensity patterns, were either cooperating or conflicting. Subjects identified whether each stimulus was a “statement” or a “question” in a single-interval, two-alternative forced-choice (2AFC) paradigm. Logistic models were fitted to the data, and estimated coefficients were compared under cooperating and conflicting conditions, between the subject groups (CI vs. NH), and under full-spectrum and spectrally degraded conditions for NH listeners. The results indicated that CI listeners’ intonation recognition was enhanced when the F0 contour and intensity cues were cooperating, but was adversely affected when these cues were conflicting. On the other hand, with full-spectrum stimuli, NH listeners’ intonation recognition was not affected by whether the cues were cooperating or conflicting. The effects of cooperating versus conflicting cues were comparable between the CI group and NH listeners with spectrally degraded stimuli. These findings suggest the importance of taking multiple acoustic sources for speech recognition into consideration in aural rehabilitation for CI recipients. PMID:19372651
Keitel, Anne; Daum, Moritz M.
2015-01-01
The anticipation of a speaker’s next turn is a key element of successful conversation. This can be achieved using a multitude of cues. In natural conversation, the most important cue for adults to anticipate the end of a turn (and therefore the beginning of the next turn) is the semantic and syntactic content. In addition, prosodic cues, such as intonation, or visual signals that occur before a speaker starts speaking (e.g., opening the mouth) help to identify the beginning and the end of a speaker’s turn. Early in life, prosodic cues seem to be more important than in adulthood. For example, it was previously shown that 3-year-old children anticipated more turns in observed conversations when intonation was available compared with when not, and this beneficial effect was present neither in younger children nor in adults (Keitel et al., 2013). In the present study, we investigated this effect in greater detail. Videos of conversations between puppets with either normal or flattened intonation were presented to children (1-year-olds and 3-year-olds) and adults. The use of puppets allowed the control of visual signals: the verbal signals (speech) started exactly at the same time as the visual signals (mouth opening). With respect to the children, our findings replicate the results of the previous study: 3-year-olds anticipated more turns with normal intonation than with flattened intonation, whereas 1-year-olds did not show this effect. In contrast to our previous findings, the adults showed the same intonation effect as the 3-year-olds. This suggests that adults’ cue use varies depending on the characteristics of a conversation. Our results further support the notion that the cues used to anticipate conversational turns differ in development. PMID:25713548
Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D; Senn, Pascal
2013-01-01
This study analyzed speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There was a significant median gain of +8.5 percentage points (p = 0.009) in speech perception across the 21 CI users when visual cues were additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception gain +11.8 percentage points, p = 0.032). Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
Wiggins, Ian M; Anderson, Carly A; Kitterick, Pádraig T; Hartley, Douglas E H
2016-09-01
Functional near-infrared spectroscopy (fNIRS) is a silent, non-invasive neuroimaging technique that is potentially well suited to auditory research. However, the reliability of auditory-evoked activation measured using fNIRS is largely unknown. The present study investigated the test-retest reliability of speech-evoked fNIRS responses in normally-hearing adults. Seventeen participants underwent fNIRS imaging in two sessions separated by three months. In a block design, participants were presented with auditory speech, visual speech (silent speechreading), and audiovisual speech conditions. Optode arrays were placed bilaterally over the temporal lobes, targeting auditory brain regions. A range of established metrics was used to quantify the reproducibility of cortical activation patterns, as well as the amplitude and time course of the haemodynamic response within predefined regions of interest. The use of a signal processing algorithm designed to reduce the influence of systemic physiological signals was found to be crucial to achieving reliable detection of significant activation at the group level. For auditory speech (with or without visual cues), reliability was good to excellent at the group level, but highly variable among individuals. Temporal-lobe activation in response to visual speech was less reliable, especially in the right hemisphere. Consistent with previous reports, fNIRS reliability was improved by averaging across a small number of channels overlying a cortical region of interest. Overall, the present results confirm that fNIRS can measure speech-evoked auditory responses in adults that are highly reliable at the group level, and indicate that signal processing to reduce physiological noise may substantially improve the reliability of fNIRS measurements. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
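Test-retest reliability of the kind reported above is typically quantified with an intraclass correlation coefficient. The sketch below computes a two-way random-effects, absolute-agreement, single-measure ICC(2,1) from a participants-by-sessions matrix; the data layout and values are assumptions for illustration, not the study's results.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data : array of shape (n_subjects, k_sessions), e.g., response amplitudes
           in a region of interest for each participant and session.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ms_rows = k * np.sum((data.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((data.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = (np.sum((data - grand) ** 2)
              - k * np.sum((data.mean(axis=1) - grand) ** 2)
              - n * np.sum((data.mean(axis=0) - grand) ** 2))
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical session-1 vs. session-2 response amplitudes for 6 participants
betas = np.array([[1.2, 1.1], [0.8, 0.9], [1.5, 1.4], [0.4, 0.6], [1.0, 1.2], [0.7, 0.6]])
print(f"ICC(2,1) = {icc_2_1(betas):.2f}")   # values above ~0.75 are often labelled 'good'
```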
Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon
2016-01-01
Objectives: This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design: Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results: CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects’ categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions: When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language. PMID:27438871
A measure for assessing the effects of audiovisual speech integration.
Altieri, Nicholas; Townsend, James T; Wenger, Michael J
2014-06-01
We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it relates to normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
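Comparisons against "parallel, independent self-terminating processing of single-modality inputs" are usually framed in terms of a workload capacity coefficient computed from response-time survivor functions (in the style of Townsend & Nozawa, 1995). The sketch below implements the RT-only capacity coefficient for an OR design; it is related to, but not identical to, the accuracy-and-RT measure proposed in the abstract, and all data are simulated.

```python
import numpy as np

def survivor(rts, t_grid):
    """Empirical survivor function S(t) = P(RT > t), estimated from correct-trial RTs."""
    rts = np.asarray(rts)
    return np.array([(rts > t).mean() for t in t_grid])

def capacity_or(rt_av, rt_a, rt_v, t_grid):
    """Capacity coefficient for an OR (first-terminating) design:
    C(t) = log S_AV(t) / [log S_A(t) + log S_V(t)].
    C(t) = 1 matches the parallel-independent prediction; C(t) > 1 indicates
    supercapacity, i.e., a benefit of integrating the two modalities.
    Values are only interpretable where the survivor functions lie strictly
    between 0 and 1."""
    eps = 1e-12
    log_sav = np.log(survivor(rt_av, t_grid) + eps)
    log_sa = np.log(survivor(rt_a, t_grid) + eps)
    log_sv = np.log(survivor(rt_v, t_grid) + eps)
    return log_sav / (log_sa + log_sv + eps)

# Hypothetical correct-trial RTs (seconds) for one listener
rng = np.random.default_rng(1)
rt_a, rt_v = rng.gamma(6, 0.1, 200), rng.gamma(7, 0.1, 200)
rt_av = rng.gamma(5, 0.1, 200)            # faster when both cues are present
t_grid = np.linspace(0.3, 1.2, 20)
print(np.round(capacity_or(rt_av, rt_a, rt_v, t_grid), 2))
```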
Tuning Neural Phase Entrainment to Speech.
Falk, Simone; Lanzilotti, Cosima; Schön, Daniele
2017-08-01
Musical rhythm positively impacts subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened to and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates the neural response, as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing, compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Basirat, Anahita
2017-01-01
Cochlear implant (CI) users frequently achieve good speech understanding based on phoneme and word recognition. However, there is a significant variability between CI users in processing prosody. The aim of this study was to examine the abilities of an excellent CI user to segment continuous speech using intonational cues. A post-lingually deafened adult CI user and 22 normal hearing (NH) subjects segmented phonemically identical and prosodically different sequences in French such as 'l'affiche' (the poster) versus 'la fiche' (the sheet), both [lafiʃ]. All participants also completed a minimal pair discrimination task. Stimuli were presented in auditory-only and audiovisual presentation modalities. The performance of the CI user in the minimal pair discrimination task was 97% in the auditory-only and 100% in the audiovisual condition. In the segmentation task, contrary to the NH participants, the performance of the CI user did not differ from the chance level. Visual speech did not improve word segmentation. This result suggests that word segmentation based on intonational cues is challenging when using CIs even when phoneme/word recognition is very well rehabilitated. This finding points to the importance of the assessment of CI users' skills in prosody processing and the need for specific interventions focusing on this aspect of speech communication.
How musical expertise shapes speech perception: evidence from auditory classification images.
Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel
2015-09-24
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues, at the first formant onset and at the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
The role of left inferior frontal cortex during audiovisual speech perception in infants.
Altvater-Mackensen, Nicole; Grossmann, Tobias
2016-06-01
In the first year of life, infants' speech perception attunes to their native language. While the behavioral changes associated with native language attunement are fairly well mapped, the underlying mechanisms and neural processes are still only poorly understood. Using fNIRS and eye tracking, the current study investigated 6-month-old infants' processing of audiovisual speech that contained matching or mismatching auditory and visual speech cues. Our results revealed that infants' speech-sensitive brain responses in inferior frontal brain regions were lateralized to the left hemisphere. Critically, our results further revealed that speech-sensitive left inferior frontal regions showed enhanced responses to matching when compared to mismatching audiovisual speech, and that infants with a preference to look at the speaker's mouth showed an enhanced left inferior frontal response to speech compared to infants with a preference to look at the speaker's eyes. These results suggest that left inferior frontal regions play a crucial role in associating information from different modalities during native language attunement, fostering the formation of multimodal phonological categories. Copyright © 2016 Elsevier Inc. All rights reserved.
Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.
Dale, Philip S; Hayden, Deborah A
2013-11-01
Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010), a treatment approach for the improvement of speech sound disorders in children, uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of twice-weekly treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.
The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise.
Shen, Jing; Souza, Pamela E
2017-09-18
This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss.
Perception of Speech Modulation Cues by 6-Month-Old Infants
ERIC Educational Resources Information Center
Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian
2013-01-01
Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…
Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut
Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar, which makes it feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI users and NH individuals gain from AV presentation of sentences in noise over unimodal (auditory or visual) presentation. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration were assessed with a German matrix sentence test. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH listeners benefitted from AV over unimodal speech, as indexed by the measures visual enhancement and auditory enhancement (each p < 0.001). Both groups efficiently integrated complementary auditory and visual speech features, as indexed by the measure integration enhancement (each p < 0.005). Given the good agreement between results from the literature and the outcome of supplementing an existing validated auditory test with synthetic visual cues, the introduced method can be considered an interesting candidate for clinical and scientific applications to assess measures important for AV SR in a standardized manner. This could be beneficial for optimizing the diagnosis and treatment (such as cochlear implantation) of individual listening and communication disorders.
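The enhancement measures named above are typically computed from proportion-correct scores in the unimodal and audiovisual conditions. A minimal sketch with hypothetical scores; the exact formulas used in the study may differ from these common formulations:

```python
# Sketch of commonly used audiovisual benefit measures, using hypothetical
# proportion-correct scores; the study's exact formulas may differ.
a, v, av = 0.45, 0.20, 0.70   # auditory-only, visual-only, audiovisual scores

visual_enhancement = (av - a) / (1 - a)      # gain relative to auditory headroom
auditory_enhancement = (av - v) / (1 - v)    # gain relative to visual headroom

# One simple integration benchmark: probability summation of independent channels.
predicted_av = a + v - a * v
integration_enhancement = av - predicted_av

print(visual_enhancement, auditory_enhancement, integration_enhancement)
```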
ERIC Educational Resources Information Center
Campbell, Daniel J.; Shic, Frederick; Macari, Suzanne; Chawarska, Katarzyna
2014-01-01
Variability in attention towards direct gaze and child-directed speech may contribute to heterogeneity of clinical presentation in toddlers with autism spectrum disorders (ASD). To evaluate this hypothesis, we clustered sixty-five 20-month-old toddlers with ASD based on their visual responses to dyadic cues for engagement, identifying three…
Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko
2008-11-25
We used whole-head magnetoencephalography (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity-matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was the same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.
Léger, Agnès C.; Reed, Charlotte M.; Desloge, Joseph G.; Swaminathan, Jayaganesh; Braida, Louis D.
2015-01-01
Consonant-identification ability was examined in normal-hearing (NH) and hearing-impaired (HI) listeners in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The Hilbert transform was used to process speech stimuli (16 consonants in a-C-a syllables) to present envelope cues, temporal fine-structure (TFS) cues, or envelope cues recovered from TFS speech. The performance of the HI listeners was inferior to that of the NH listeners both in terms of lower levels of performance in the baseline condition and in the need for higher signal-to-noise ratio to yield a given level of performance. For NH listeners, scores were higher in interrupted noise than in steady-state noise for all speech types (indicating substantial masking release). For HI listeners, masking release was typically observed for TFS and recovered-envelope speech but not for unprocessed and envelope speech. For both groups of listeners, TFS and recovered-envelope speech yielded similar levels of performance and consonant confusion patterns. The masking release observed for TFS and recovered-envelope speech may be related to level effects associated with the manner in which the TFS processing interacts with the interrupted noise signal, rather than to the contributions of TFS cues per se. PMID:26233038
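The envelope and temporal fine-structure (TFS) decomposition referred to here is conventionally done per frequency band with the Hilbert transform. A minimal single-band sketch, with illustrative band edges and a noise stand-in for the speech signal rather than the study's processing parameters:

```python
# Minimal sketch: split one frequency band of a signal into its Hilbert
# envelope and temporal fine structure (TFS). Band edges, carrier frequency,
# and the input signal are illustrative placeholders.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
x = np.random.default_rng(2).standard_normal(t.size)   # stand-in for a speech band input

sos = butter(4, [500, 1000], btype="bandpass", fs=fs, output="sos")
band = sosfiltfilt(sos, x)

analytic = hilbert(band)
envelope = np.abs(analytic)            # slow amplitude envelope cue
tfs = np.cos(np.angle(analytic))       # unit-amplitude fine-structure carrier

# "Envelope speech" keeps the envelope on a fixed carrier; "TFS speech" keeps tfs alone.
envelope_only = envelope * np.cos(2 * np.pi * 750 * t)
```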
Perception and the temporal properties of speech
NASA Astrophysics Data System (ADS)
Gordon, Peter C.
1991-11-01
Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy
Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker
2013-01-01
This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation points (IPs; the amount of time required for the correct identification of speech stimuli in a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implications of the results are discussed in terms of models for speech understanding. PMID:23801980
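An isolation point in a gating paradigm is usually scored as the earliest gate from which identification is correct and remains correct through the final gate. A small sketch of that scoring rule, with hypothetical gate durations and responses (not the study's scoring code):

```python
# Sketch: compute an isolation point (IP) from gated responses, i.e., the
# earliest gate from which the listener's identification is correct and stays
# correct through the final gate. Gate durations and responses are hypothetical.
def isolation_point(gate_ms, responses, target):
    ip = None
    for dur, resp in zip(gate_ms, responses):
        if resp == target:
            if ip is None:
                ip = dur              # candidate IP: first correct gate of a run
        else:
            ip = None                 # run broken; reset the candidate
    return ip                         # None means never reliably identified

gates = [100, 150, 200, 250, 300, 350]
print(isolation_point(gates, ["b", "d", "d", "d", "d", "d"], target="d"))  # -> 150
```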
Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal
2013-01-01
Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119
ERIC Educational Resources Information Center
Varghese, Peter; Kalashnikova, Marina; Burnham, Denis
2018-01-01
Purpose: An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children as measured by mismatch…
Using speech sounds to test functional spectral resolution in listeners with cochlear implants
Winn, Matthew B.; Litovsky, Ruth Y.
2015-01-01
In this study, spectral properties of speech sounds were used to test functional spectral resolution in people who use cochlear implants (CIs). Specifically, perception of the /ba/-/da/ contrast was tested using two spectral cues: Formant transitions (a fine-resolution cue) and spectral tilt (a coarse-resolution cue). Higher weighting of the formant cues was used as an index of better spectral cue perception. Participants included 19 CI listeners and 10 listeners with normal hearing (NH), for whom spectral resolution was explicitly controlled using a noise vocoder with variable carrier filter widths to simulate electrical current spread. Perceptual weighting of the two cues was modeled with mixed-effects logistic regression, and was found to systematically vary with spectral resolution. The use of formant cues was greatest for NH listeners for unprocessed speech, and declined in the two vocoded conditions. Compared to NH listeners, CI listeners relied less on formant transitions, and more on spectral tilt. Cue-weighting results showed moderately good correspondence with word recognition scores. The current approach to testing functional spectral resolution uses auditory cues that are known to be important for speech categorization, and can thus potentially serve as the basis upon which CI processing strategies and innovations are tested. PMID:25786954
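The cue-weighting logic described above can be illustrated with a logistic regression in which both cues predict the /da/ response and the relative size of the coefficients indexes cue reliance. The sketch below simplifies the study's mixed-effects model to a single simulated listener; data frame and column names are hypothetical.

```python
# Sketch of relative cue weighting for a /ba/-/da/ labeling task with two
# spectral cues (formant transition and spectral tilt). This simplifies the
# mixed-effects model to a single-listener logistic regression; the data and
# names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "formant": rng.uniform(-1, 1, n),    # standardized formant-transition step
    "tilt": rng.uniform(-1, 1, n),       # standardized spectral-tilt step
})
logit_p = 3.0 * df["formant"] + 0.8 * df["tilt"]        # simulated listener
df["resp_da"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

fit = smf.logit("resp_da ~ formant + tilt", data=df).fit(disp=False)
b_formant, b_tilt = fit.params["formant"], fit.params["tilt"]
formant_weight = abs(b_formant) / (abs(b_formant) + abs(b_tilt))
print(f"relative formant-cue weight = {formant_weight:.2f}")
```

A relative formant weight near 1 indicates reliance on the fine-resolution cue, whereas a weight near 0.5 or below indicates a shift toward the coarse spectral-tilt cue.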
McMurray, Bob; Jongman, Allard
2012-01-01
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, that is, the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker- and vowel-contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast (1) models based on a small number of invariant cues, (2) models using all cues without compensation, and (3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved a similar accuracy to listeners, and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed. PMID:21417542
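One way to implement the C-CuRE idea is to regress each cue on the contextual factors, take the residual as the compensated cue value, and categorize from those residuals. The sketch below uses simulated data and a single cue; the factor levels, cue name, and effect sizes are assumptions for illustration, not the study's corpus or model.

```python
# Sketch of the C-CuRE idea (computing cues relative to expectations): each
# acoustic cue is re-expressed as its residual after regressing out contextual
# factors (here talker and vowel), and categorization operates on the residuals.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(4)
n = 600
df = pd.DataFrame({
    "talker": rng.integers(0, 10, n),
    "vowel": rng.integers(0, 3, n),
    "category": rng.integers(0, 2, n),     # e.g., a two-way fricative contrast
})
# A cue whose raw value mixes category information with talker/vowel effects
df["centroid"] = (2.0 * df["category"] + 0.8 * df["talker"]
                  + 0.5 * df["vowel"] + rng.normal(0, 1, n))

context = pd.get_dummies(df[["talker", "vowel"]].astype("category"))
expected = LinearRegression().fit(context, df["centroid"]).predict(context)
df["centroid_ccure"] = df["centroid"] - expected     # cue relative to expectation

raw_acc = LogisticRegression().fit(df[["centroid"]], df["category"]).score(
    df[["centroid"]], df["category"])
ccure_acc = LogisticRegression().fit(df[["centroid_ccure"]], df["category"]).score(
    df[["centroid_ccure"]], df["category"])
print(raw_acc, ccure_acc)   # compensation typically improves categorization here
```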
Cracking the Language Code: Neural Mechanisms Underlying Speech Parsing
McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella
2013-01-01
Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Previous research in infants and adults demonstrated that a stream of speech can be readily segmented based solely on the statistical and speech cues afforded by the input. Using functional magnetic resonance imaging (fMRI), the neural substrate of word segmentation was examined on-line as participants listened to three streams of concatenated syllables, containing either statistical regularities alone, statistical regularities and speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to streams containing statistical regularities, particularly the stream containing speech cues. In a second fMRI study, designed to verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords. Activity in these regions, taken to index the implicit detection of word boundaries, was positively correlated with participants’ rapid auditory processing skills. These findings provide a neural signature of on-line word segmentation in the mature brain and an initial model with which to study developmental changes in the neural architecture involved in processing speech cues during language learning. PMID:16855090
Developmental changes in sensitivity to vocal paralanguage
Friend, Margaret
2017-01-01
Developmental changes in children’s sensitivity to the role of acoustic variation in the speech stream in conveying speaker affect (vocal paralanguage) were examined. Four-, 7- and 10-year-olds heard utterances in three formats: low-pass filtered, reiterant, and normal speech. The availability of lexical and paralinguistic information varied across these three formats in a way that required children to base their judgments of speaker affect on different configurations of cues in each format. Across ages, the best performance was obtained when a rich array of acoustic cues was present and when there was no competing lexical information. Four-year-olds performed at chance when judgments had to be based solely on speech prosody in the filtered format and they were unable to selectively attend to paralanguage when discrepant lexical cues were present in normal speech. Seven-year-olds were significantly more sensitive to the paralinguistic role of speech prosody in filtered speech than were 4-year-olds and there was a trend toward greater attention to paralanguage when lexical and paralinguistic cues were inconsistent in normal speech. An integration of the ability to utilize prosodic cues to speaker affect with attention to paralanguage in cases of lexical/paralinguistic discrepancy was observed for 10-year-olds. The results are discussed in terms of the development of a perceptual bias emerging out of selective attention to language. PMID:28713218
Multimodal infant-directed communication: how caregivers combine tactile and linguistic cues.
Abu-Zhaya, Rana; Seidl, Amanda; Cristia, Alejandrina
2017-09-01
Both touch and speech independently have been shown to play an important role in infant development. However, little is known about how they may be combined in the input to the child. We examined the use of touch and speech together by having mothers read their 5-month-olds books about body parts and animals. Results suggest that speech+touch multimodal events are characterized by more exaggerated touch and speech cues. Further, our results suggest that maternal touches are aligned with speech and that mothers tend to touch their infants in locations that are congruent with names of body parts. Thus, our results suggest that tactile cues could potentially aid both infant word segmentation and word learning.
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion
Schutz, Michael
2017-01-01
Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997
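The corpus comparison summarized above reduces to simple per-mode averages of pitch height and tempo. A toy sketch of that arithmetic, with an invented data frame that is not the Bach/Chopin corpus:

```python
# Sketch of the kind of corpus comparison described above: mean pitch height
# (MIDI note numbers) and a note-rate tempo proxy by mode. The tiny data frame
# is invented purely for illustration.
import pandas as pd

pieces = pd.DataFrame({
    "mode":          ["major", "major", "minor", "minor"],
    "mean_midi":     [67.2, 66.4, 64.9, 65.1],   # average melodic pitch
    "notes_per_sec": [3.4, 3.1, 2.5, 2.6],       # simple tempo proxy
})
by_mode = pieces.groupby("mode").mean()
pitch_diff = by_mode.loc["major", "mean_midi"] - by_mode.loc["minor", "mean_midi"]
tempo_ratio = by_mode.loc["major", "notes_per_sec"] / by_mode.loc["minor", "notes_per_sec"]
print(f"{pitch_diff:.1f} semitones higher, {100 * (tempo_ratio - 1):.0f}% faster")
```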
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users
Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.
2018-01-01
Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for the use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530
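Spectro-temporal ripple stimuli of the kind described here are typically built from a bank of log-spaced carriers whose envelopes drift jointly across time and log-frequency. A minimal sketch with illustrative parameter values (not the study's stimulus generation):

```python
# Minimal sketch of a spectro-temporal ("moving ripple") stimulus: a bank of
# log-spaced tone carriers whose amplitudes are modulated jointly over time
# (rate omega, Hz) and log-frequency (density Omega, cycles/octave).
import numpy as np

fs, dur = 24000, 0.5
t = np.arange(int(fs * dur)) / fs

n_tones = 100
freqs = 250 * 2 ** np.linspace(0, 5, n_tones)     # 250 Hz to 8 kHz, log-spaced
octaves = np.log2(freqs / freqs[0])               # carrier position in octaves

omega, Omega, depth = 4.0, 1.0, 0.9               # temporal rate, ripple density, depth
phases = np.random.default_rng(5).uniform(0, 2 * np.pi, n_tones)

# Each carrier's envelope phase drifts with its log-frequency position, so the
# ripple sweeps across the spectrogram over time (a joint spectro-temporal cue).
env = 1 + depth * np.sin(2 * np.pi * (omega * t[None, :] + Omega * octaves[:, None]))
carriers = np.sin(2 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
ripple = (env * carriers).sum(axis=0)
ripple /= np.abs(ripple).max()                    # normalize for presentation
```

Setting Omega to zero yields a temporal-only modulation, and setting omega to zero yields a static spectral ripple, which is how the separable conditions relate to the joint one.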
ERIC Educational Resources Information Center
Lent, Robert W.; And Others
1981-01-01
The efficacy of cue-controlled desensitization and systematic rational restructuring was compared with a placebo method and a waiting-list control in reducing public speaking and nontargeted anxieties. Cue-controlled desensitization was generally more effective than the other groups in reducing subjective speech anxiety. (Author)
ERIC Educational Resources Information Center
Russell, Richard K.; Wise, Fred
1976-01-01
This investigation compared the relative effectiveness of group-administered cue-controlled relaxation and group systematic desensitization in the treatment of speech anxiety. Also examined was the role of professional versus paraprofessional counselors in implementing the treatment program. A description of the cue-controlled relaxation technique…
Implicit Processing of Phonotactic Cues: Evidence from Electrophysiological and Vascular Responses
ERIC Educational Resources Information Center
Rossi, Sonja; Jurgenson, Ina B.; Hanulikova, Adriana; Telkemeyer, Silke; Wartenburger, Isabell; Obrig, Hellmuth
2011-01-01
Spoken word recognition is achieved via competition between activated lexical candidates that match the incoming speech input. The competition is modulated by prelexical cues that are important for segmenting the auditory speech stream into linguistic units. One such prelexical cue that listeners rely on in spoken word recognition is phonotactics.…
de Taillez, Tobias; Grimm, Giso; Kollmeier, Birger; Neher, Tobias
2018-06-01
Objective To investigate the influence of an algorithm designed to enhance or magnify interaural difference cues on speech signals in noisy, spatially complex conditions, using both technical and perceptual measurements, and to investigate the combination of interaural magnification (IM), monaural microphone directionality (DIR), and binaural coherence-based noise reduction (BC). Design Speech-in-noise stimuli were generated using virtual acoustics. A computational model of binaural hearing was used to analyse the spatial effects of IM. Predicted speech quality changes and signal-to-noise-ratio (SNR) improvements were also considered. Additionally, a listening test was carried out to assess speech intelligibility and quality. Study sample Listeners aged 65-79 years with and without sensorineural hearing loss (N = 10 each). Results IM increased the horizontal separation of concurrent directional sound sources without introducing any major artefacts. In situations with diffuse noise, however, the interaural difference cues were distorted. Preprocessing the binaural input signals with DIR reduced distortion. IM influenced neither speech intelligibility nor speech quality. Conclusions The IM algorithm tested here failed to improve speech perception in noise, probably because of the dispersion and inconsistent magnification of interaural difference cues in complex environments.
Amplitude Rise Time Does Not Cue the /bɑ/–/wɑ/ Contrast for Adults or Children
Nittrouer, Susan; Lowenstein, Joanna H.; Tarr, Eric
2013-01-01
Purpose Previous research has demonstrated that children weight the acoustic cues to many phonemic decisions differently than do adults and gradually shift those strategies as they gain language experience. However, that research has focused on spectral and duration cues rather than on amplitude cues. In the current study, the authors examined amplitude rise time (ART; an amplitude cue) and formant rise time (FRT; a spectral cue) in the /bɑ/–/wɑ/ manner contrast for adults and children, and related those speech decisions to outcomes of nonspeech discrimination tasks. Method Twenty adults and 30 children (ages 4–5 years) labeled natural and synthetic speech stimuli manipulated to vary ARTs and FRTs, and discriminated nonspeech analogs that varied only by ART in an AX paradigm. Results Three primary results were obtained. First, listeners in both age groups based speech labeling judgments on FRT, not on ART. Second, the fundamental frequency of the natural speech samples did not influence labeling judgments. Third, discrimination performance for the nonspeech stimuli did not predict how listeners would perform with the speech stimuli. Conclusion Even though both adults and children are sensitive to ART, it was not weighted in phonemic judgments by these typical listeners. PMID:22992704
Acoustic differences between humorous and sincere communicative intentions.
Hoicka, Elena; Gattis, Merideth
2012-11-01
Previous studies indicate that the acoustic features of speech discriminate between positive and negative communicative intentions, such as approval and prohibition. Two studies investigated whether acoustic features of speech can discriminate between two positive communicative intentions: humour and sweet-sincerity, where sweet-sincerity involved being sincere in a positive, warm-hearted way. In Study 1, 22 mothers read a book containing humorous, sweet-sincere, and neutral-sincere images to their 19- to 24-month-olds. In Study 2, 41 mothers read a book containing humorous or sweet-sincere sentences and images to their 18- to 24-month-olds. Mothers used a higher mean F0 to communicate visual humour as compared to visual sincerity. Mothers used greater F0 mean, range, and standard deviation; greater intensity mean, range, and standard deviation; and a slower speech rate to communicate verbal humour as compared to verbal sweet-sincerity. Mothers used a rising linear contour to communicate verbal humour, but used no specific contour to express verbal sweet-sincerity. We conclude that speakers provide acoustic cues enabling listeners to distinguish between positive communicative intentions. ©2011 The British Psychological Society.
Cross-Frequency Integration for Consonant and Vowel Identification in Bimodal Hearing
ERIC Educational Resources Information Center
Kong, Ying-Yee; Braida, Louis D.
2011-01-01
Purpose: Improved speech recognition in binaurally combined acoustic-electric stimulation (otherwise known as "bimodal hearing") could arise when listeners integrate speech cues from the acoustic and electric hearing. The aims of this study were (a) to identify speech cues extracted in electric hearing and residual acoustic hearing in the…
Co-occurrence statistics as a language-dependent cue for speech segmentation.
Saksida, Amanda; Langus, Alan; Nespor, Marina
2017-05-01
To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even without rich language-specific knowledge. However, it is not clear how strong and reliable the distributional cues are that humans might use to segment speech. We investigate cross-linguistic viability of different statistical learning strategies by analyzing child-directed speech corpora from nine languages and by modeling possible statistics-based speech segmentations. We show that languages vary as to which statistical segmentation strategies are most successful. The variability of the results can be partially explained by systematic differences between languages, such as rhythmical differences. The results confirm previous findings that different statistical learning strategies are successful in different languages and suggest that infants may have to primarily rely on non-statistical cues when they begin their process of speech segmentation. © 2016 John Wiley & Sons Ltd.
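One of the statistical segmentation strategies evaluated in this line of work places word boundaries at local minima of forward transitional probability between adjacent syllables. A toy sketch of that strategy on an invented syllable stream (not the child-directed corpora analyzed in the study):

```python
# Sketch of a transitional-probability (TP) segmentation strategy: compute
# forward TPs between adjacent syllables and posit a word boundary at local
# TP minima. The toy "corpus" is invented for illustration.
from collections import Counter

syllables = "tu pi ro go la bu tu pi ro pa do ti go la bu tu pi ro".split()

bigrams = Counter(zip(syllables, syllables[1:]))
unigrams = Counter(syllables[:-1])
tp = {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

tps = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
segmented = [syllables[0]]
for i in range(1, len(syllables)):
    incoming = tps[i - 1]                       # TP into the current syllable
    left = tps[i - 2] if i >= 2 else 1.0
    right = tps[i] if i < len(tps) else 1.0
    boundary = incoming < left and incoming < right
    segmented.append(("| " if boundary else "") + syllables[i])
print(" ".join(segmented))   # -> tu pi ro | go la bu tu pi ro | pa do ti go la bu tu pi ro
```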
Dietz, Mathias; Hohmann, Volker; Jürgens, Tim
2015-01-01
For normal-hearing listeners, speech intelligibility improves if speech and noise are spatially separated. While this spatial release from masking has already been quantified in normal-hearing listeners in many studies, it is less clear how spatial release from masking changes in cochlear implant listeners with and without access to low-frequency acoustic hearing. Spatial release from masking depends on differences in access to speech cues due to hearing status and hearing device. To investigate the influence of these factors on speech intelligibility, the present study measured speech reception thresholds in spatially separated speech and noise for 10 different listener types. A vocoder was used to simulate cochlear implant processing and low-frequency filtering was used to simulate residual low-frequency hearing. These forms of processing were combined to simulate cochlear implant listening, listening based on low-frequency residual hearing, and combinations thereof. Simulated cochlear implant users with additional low-frequency acoustic hearing showed better speech intelligibility in noise than simulated cochlear implant users without acoustic hearing and had access to more spatial speech cues (e.g., higher binaural squelch). Cochlear implant listener types showed higher spatial release from masking with bilateral access to low-frequency acoustic hearing than without. A binaural speech intelligibility model with normal binaural processing showed overall good agreement with measured speech reception thresholds, spatial release from masking, and spatial speech cues. This indicates that differences in speech cues available to listener types are sufficient to explain the changes of spatial release from masking across these simulated listener types. PMID:26721918
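The listener-type simulations described here rest on two standard signal manipulations: a noise vocoder to approximate CI processing and a low-pass filter to approximate residual low-frequency acoustic hearing. A minimal sketch under assumed parameters (channel count, band edges, and cutoff are illustrative, not the study's settings):

```python
# Minimal noise-vocoder sketch of the kind used to simulate CI processing,
# plus a low-pass filter to approximate residual low-frequency acoustic
# hearing. All parameters and the input signal are illustrative only.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(x, fs, n_channels=8, lo=100, hi=7000):
    edges = lo * (hi / lo) ** np.linspace(0, 1, n_channels + 1)  # log-spaced bands
    rng = np.random.default_rng(6)
    out = np.zeros_like(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))                 # channel envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(x.size))    # band-limited noise
        out += env * carrier
    return out

def residual_hearing(x, fs, cutoff=500):
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
speech = np.random.default_rng(7).standard_normal(fs)   # stand-in for a speech signal
bimodal_sim = noise_vocoder(speech, fs) + residual_hearing(speech, fs)
```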
Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker
2017-09-18
We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels, in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands, in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded by their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.
Effects of Attention Cueing on Learning Speech Organ Operation through Mobile Phones
ERIC Educational Resources Information Center
Yang, Hui-Yu
2017-01-01
The studies regarding using a cross sectional view of speech organs enriched with attention cueing and written text to probe learners' learning efficiency and behavior through mobile phones is scant. The purpose of this study was to examine whether the presence of attention cueing can benefit learners with different amounts of prior knowledge in…
NASA Technical Reports Server (NTRS)
Wenzel, Elizabeth M.; Fisher, Scott S.; Stone, Philip K.; Foster, Scott H.
1991-01-01
The real time acoustic display capabilities are described which were developed for the Virtual Environment Workstation (VIEW) Project at NASA-Ames. The acoustic display is capable of generating localized acoustic cues in real time over headphones. An auditory symbology, a related collection of representational auditory 'objects' or 'icons', can be designed using ACE (Auditory Cue Editor), which links both discrete and continuously varying acoustic parameters with information or events in the display. During a given display scenario, the symbology can be dynamically coordinated in real time with 3-D visual objects, speech, and gestural displays. The types of displays feasible with the system range from simple warnings and alarms to the acoustic representation of multidimensional data or events.
ERIC Educational Resources Information Center
Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.
2012-01-01
Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…
The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise
ERIC Educational Resources Information Center
Shen, Jing; Souza, Pamela E.
2017-01-01
Purpose: This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for…
François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni
2017-04-01
Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representations (the word-to-world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and whether they share common neurophysiological features. To address this question, we recorded the EEG of 20 adult participants during both an audio-alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested both for the implicit detection of online mismatches (structural auditory and visual semantic violations) and for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio-alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio-alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.
Seeing Emotion with Your Ears: Emotional Prosody Implicitly Guides Visual Attention to Faces
Rigoulot, Simon; Pell, Marc D.
2012-01-01
Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality) which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance (Someone migged the pazing) uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0–1250 ms], [1250–2500 ms], [2500–5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions. PMID:22303454
Visual communication and the content and style of conversation.
Rutter, D R; Stephenson, G M; Dewey, M E
1981-02-01
Previous research suggests that visual communication plays a number of important roles in social interaction. In particular, it appears to influence the content of what people say in discussions, the style of their speech, and the outcomes they reach. However, the findings are based exclusively on comparisons between face-to-face conversations and audio conversations, in which subjects sit in separate rooms and speak over a microphone-headphone intercom which precludes visual communication. Interpretation is difficult, because visual communication is confounded with physical presence, which itself makes available certain cues denied to audio subjects. The purpose of this paper is to report two experiments in which the variables were separated and content and style were re-examined. The first made use of blind subjects, and again compared the face-to-face and audio conditions. The second returned to sighted subjects, and examined four experimental conditions: face-to-face; audio; a curtain condition in which subjects sat in the same room but without visual communication; and a video condition in which they sat in separate rooms and communicated over a television link. Neither visual communication nor physical presence proved to be the critical variable. Instead, the two sources of cues combined, such that content and style were influenced by the aggregate of available cues. The more cueless the setting, the more task-oriented, depersonalized and unspontaneous the conversation. The findings also suggested that the primary effect of cuelessness is to influence verbal content, and that its influence on both style and outcome occurs indirectly, through the mediation of content.
Integrating mechanisms of visual guidance in naturalistic language production.
Coco, Moreno I; Keller, Frank
2015-05-01
Situated language production requires the integration of visual attention and linguistic processing. Previous work has not conclusively disentangled the role of perceptual scene information and structural sentence information in guiding visual attention. In this paper, we present an eye-tracking study that demonstrates that three types of guidance, perceptual, conceptual, and structural, interact to control visual attention. In a cued language production experiment, we manipulate perceptual (scene clutter) and conceptual guidance (cue animacy) and measure structural guidance (syntactic complexity of the utterance). Analysis of the time course of language production, before and during speech, reveals that all three forms of guidance affect the complexity of visual responses, quantified in terms of the entropy of attentional landscapes and the turbulence of scan patterns, especially during speech. We find that perceptual and conceptual guidance mediate the distribution of attention in the scene, whereas structural guidance closely relates to scan pattern complexity. Furthermore, the eye-voice spans of the cued object and of its perceptual competitor are similar, with their latency mediated by both perceptual and structural guidance. These results rule out a strict interpretation of structural guidance as the single dominant form of visual guidance in situated language production. Rather, the phase of the task and the associated demands of cross-modal cognitive processing determine the mechanisms that guide attention.
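The entropy-of-attentional-landscape measure invoked above can be made concrete with a small sketch: treat the smoothed fixation-density map over the scene as a probability distribution and compute its Shannon entropy, so that attention spread over more of the scene yields a higher value. The grid size, smoothing width, and function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def attentional_landscape_entropy(fixations, grid_shape=(48, 64), sigma=2.0):
    """Shannon entropy (bits) of a smoothed fixation-density map.
    fixations: iterable of (row, col) grid indices of fixation locations."""
    density = np.zeros(grid_shape)
    for r, c in fixations:
        density[r, c] += 1.0
    density = gaussian_filter(density, sigma)      # smooth the landscape
    p = density / density.sum()                    # normalize to a distribution
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# hypothetical usage: a tight cluster of fixations gives lower entropy
print(attentional_landscape_entropy([(10, 20), (11, 21), (30, 40)]))
```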
Audiovisual perceptual learning with multiple speakers.
Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J
2016-05-01
One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "B-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a still image of the "B-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.
ERIC Educational Resources Information Center
Huber, Jessica E.
2007-01-01
Purpose: This study examined the response of the respiratory system to 3 cues used to elicit increased vocal loudness to determine whether the effects of cueing, shown previously in sentence tasks, were present in connected speech tasks and to describe differences among tasks. Method: Fifteen young men and 15 young women produced a 2-paragraph…
Neger, Thordis M.; Rietveld, Toni; Janse, Esther
2014-01-01
Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with 60 meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory, and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly. PMID:25225475
Context modulates attention to social scenes in toddlers with autism
Chawarska, Katarzyna; Macari, Suzanne; Shic, Frederick
2013-01-01
Background In typical development, the unfolding of social and communicative skills hinges upon the ability to allocate and sustain attention towards people, a skill present moments after birth. Deficits in social attention have been well documented in autism, though the underlying mechanisms are poorly understood. Methods In order to parse the factors that are responsible for limited social attention in toddlers with autism, we manipulated the context in which a person appeared in their visual field with regard to the presence of salient social (child-directed speech and eye contact) and nonsocial (distractor toys) cues for attention. Participants included 13- to 25-month-old toddlers with autism (AUT; n=54), developmental delay (DD; n=22), and typical development (TD; n=48). Their visual responses were recorded with an eye-tracker. Results In conditions devoid of eye contact and speech, the distribution of attention between key features of the social scene in toddlers with autism was comparable to that in DD and TD controls. However, when explicit dyadic cues were introduced, toddlers with autism showed decreased attention to the entire scene and, when they looked at the scene, they spent less time looking at the speaker’s face and monitoring her lip movements than the control groups. In toddlers with autism, decreased time spent exploring the entire scene was associated with increased symptom severity and lower nonverbal functioning; atypical language profiles were associated with decreased monitoring of the speaker’s face and her mouth. Conclusions While in certain contexts toddlers with autism attend to people and objects in a typical manner, they show decreased attentional response to dyadic cues for attention. Given that mechanisms supporting responsivity to dyadic cues are present shortly after birth and are highly consequential for development of social cognition and communication, these findings have important implications for the understanding of the underlying mechanisms of limited social monitoring and identifying pivotal targets for treatment. PMID:22428993
Masson-Carro, Ingrid; Goudbeek, Martijn; Krahmer, Emiel
2016-10-01
Past research has sought to elucidate how speakers and addressees establish common ground in conversation, yet few studies have focused on how visual cues such as co-speech gestures contribute to this process. Likewise, the effect of cognitive constraints on multimodal grounding remains to be established. This study addresses the relationship between the verbal and gestural modalities during grounding in referential communication. We report data from a collaborative task where repeated references were elicited, and a time constraint was imposed to increase cognitive load. Our results reveal no differential effects of repetition or cognitive load on the semantic-based gesture rate, suggesting that representational gestures and speech are closely coordinated during grounding. However, gestures and speech differed in their execution, especially under time pressure. We argue that speech and gesture are two complementary streams that might be planned in conjunction but that unfold independently in later stages of language production, with speakers emphasizing the form of their gestures, but not of their words, to better meet the goals of the collaborative task. Copyright © 2016 Cognitive Science Society, Inc.
Alerting prefixes for speech warning messages [in helicopters]
NASA Technical Reports Server (NTRS)
Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.
1984-01-01
A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.
Tanaka, Yukari; Fukushima, Hirokata; Okanoya, Kazuo; Myowa-Yamakoshi, Masako
2014-01-01
Social learning in infancy is known to be facilitated by multimodal (e.g., visual, tactile, and verbal) cues provided by caregivers. In parallel with infants' development, recent research has revealed that maternal neural activity is altered through interaction with infants, for instance, to be sensitive to infant-directed speech (IDS). The present study investigated the effect of mother-infant multimodal interaction on maternal neural activity. Event-related potentials (ERPs) of mothers were compared to those of non-mothers during perception of tactile-related words primed by tactile cues. Only mothers showed ERP modulation when tactile cues were incongruent with the subsequent words, and only when the words were delivered with IDS prosody. Furthermore, the frequency of mothers' use of those words was correlated with the magnitude of ERP differentiation between congruent and incongruent stimuli presentations. These results suggest that mother-infant daily interactions enhance multimodal integration of the maternal brain in parenting contexts. PMID:25322936
Rhythmic grouping biases constrain infant statistical learning
Hay, Jessica F.; Saffran, Jenny R.
2012-01-01
Linguistic stress and sequential statistical cues to word boundaries interact during speech segmentation in infancy. However, little is known about how the different acoustic components of stress constrain statistical learning. The current studies were designed to investigate whether intensity and duration each function independently as cues to initial prominence (trochaic-based hypothesis) or whether, as predicted by the Iambic-Trochaic Law (ITL), intensity and duration have characteristic and separable effects on rhythmic grouping (ITL-based hypothesis) in a statistical learning task. Infants were familiarized with an artificial language (Experiments 1 & 3) or a tone stream (Experiment 2) in which there was an alternation in either intensity or duration. In addition to potential acoustic cues, the familiarization sequences also contained statistical cues to word boundaries. In speech (Experiment 1) and non-speech (Experiment 2) conditions, 9-month-old infants demonstrated discrimination patterns consistent with an ITL-based hypothesis: intensity signaled initial prominence and duration signaled final prominence. The results of Experiment 3, in which 6.5-month-old infants were familiarized with the speech streams from Experiment 1, suggest that there is a developmental change in infants’ willingness to treat increased duration as a cue to word offsets in fluent speech. Infants’ perceptual systems interact with linguistic experience to constrain how infants learn from their auditory environment. PMID:23730217
Lateralization of Frequency-Specific Networks for Covert Spatial Attention to Auditory Stimuli
Thorpe, Samuel; D'Zmura, Michael
2011-01-01
We conducted a cued spatial attention experiment to investigate the time–frequency structure of human EEG induced by attentional orientation of an observer in external auditory space. Seven subjects participated in a task in which attention was cued to one of two spatial locations at left and right. Subjects were instructed to report the speech stimulus at the cued location and to ignore a simultaneous speech stream originating from the uncued location. EEG was recorded from the onset of the directional cue through the offset of the inter-stimulus interval (ISI), during which attention was directed toward the cued location. Using a wavelet spectrum, each frequency band was then normalized by the mean level of power observed in the early part of the cue interval to obtain a measure of induced power related to the deployment of attention. Topographies of band specific induced power during the cue and inter-stimulus intervals showed peaks over symmetric bilateral scalp areas. We used a bootstrap analysis of a lateralization measure defined for symmetric groups of channels in each band to identify specific lateralization events throughout the ISI. Our results suggest that the deployment and maintenance of spatially oriented attention throughout a period of 1,100 ms is marked by distinct episodes of reliable hemispheric lateralization ipsilateral to the direction in which attention is oriented. An early theta lateralization was evident over posterior parietal electrodes and was sustained throughout the ISI. In the alpha and mu bands punctuated episodes of parietal power lateralization were observed roughly 500 ms after attentional deployment, consistent with previous studies of visual attention. In the beta band these episodes show similar patterns of lateralization over frontal motor areas. These results indicate that spatial attention involves similar mechanisms in the auditory and visual modalities. PMID:21630112
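The baseline normalization and hemispheric-lateralization logic described above can be illustrated with a simplified sketch. Band power here is obtained by band-pass filtering plus a Hilbert envelope rather than the wavelet spectrum used in the study, and the (right − left)/(right + left) index over symmetric channel groups is an assumption of this sketch rather than the authors' exact measure.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def induced_power(eeg, fs, band, baseline):
    """Band-limited power normalized by the mean power in a baseline window.
    eeg: (n_channels, n_samples); band: (low, high) Hz; baseline: (start, stop) samples."""
    b, a = butter(4, np.asarray(band) / (fs / 2), btype="band")
    power = np.abs(hilbert(filtfilt(b, a, eeg, axis=-1), axis=-1)) ** 2
    base = power[:, baseline[0]:baseline[1]].mean(axis=-1, keepdims=True)
    return power / base

def lateralization_index(power, left_chans, right_chans):
    """(R - L) / (R + L) over symmetric channel groups; > 0 means right > left."""
    right = power[right_chans].mean(axis=0)
    left = power[left_chans].mean(axis=0)
    return (right - left) / (right + left)
```

A bootstrap over trials of such an index, as in the study, would then identify episodes of reliable lateralization in each frequency band.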
Information from multiple modalities helps 5-month-olds learn abstract rules.
Frank, Michael C; Slemmer, Jonathan A; Marcus, Gary F; Johnson, Scott P
2009-07-01
By 7 months of age, infants are able to learn rules based on the abstract relationships between stimuli (Marcus et al., 1999), but they are better able to do so when exposed to speech than to some other classes of stimuli. In the current experiments we ask whether multimodal stimulus information will aid younger infants in identifying abstract rules. We habituated 5-month-olds to simple abstract patterns (ABA or ABB) instantiated in coordinated looming visual shapes and speech sounds (Experiment 1), shapes alone (Experiment 2), and speech sounds accompanied by uninformative but coordinated shapes (Experiment 3). Infants showed evidence of rule learning only in the presence of the informative multimodal cues. We hypothesize that the additional evidence present in these multimodal displays was responsible for the success of younger infants in learning rules, congruent with both a Bayesian account and with the Intersensory Redundancy Hypothesis.
Testing the influence of external and internal cues on smoking motivation using a community sample.
Litvin, Erika B; Brandon, Thomas H
2010-02-01
Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang
2018-05-01
Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment
Frtusova, Jana B.; Phillips, Natalie A.
2016-01-01
This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed. PMID:27148106
Bidelman, Gavin M
2016-10-01
Musical training is associated with behavioral and neurophysiological enhancements in auditory processing for both musical and nonmusical sounds (e.g., speech). Yet, whether the benefits of musicianship extend beyond enhancements to auditory-specific skills and impact multisensory (e.g., audiovisual) processing has yet to be fully validated. Here, we investigated multisensory integration of auditory and visual information in musicians and nonmusicians using a double-flash illusion, whereby the presentation of multiple auditory stimuli (beeps) concurrent with a single visual object (flash) induces an illusory perception of multiple flashes. We parametrically varied the onset asynchrony between auditory and visual events (leads and lags of ±300 ms) to quantify participants' "temporal window" of integration, i.e., stimuli in which auditory and visual cues were fused into a single percept. Results show that musically trained individuals were both faster and more accurate at processing concurrent audiovisual cues than their nonmusician peers; nonmusicians had a higher susceptibility for responding to audiovisual illusions and perceived double flashes over an extended range of onset asynchronies compared to trained musicians. Moreover, temporal window estimates indicated that musicians' windows (<100 ms) were ~2-3× shorter than nonmusicians' (~200 ms), suggesting more refined multisensory integration and audiovisual binding. Collectively, findings indicate a more refined binding of auditory and visual cues in musically trained individuals. We conclude that experience-dependent plasticity of intensive musical experience extends beyond simple listening skills, improving multimodal processing and the integration of multiple sensory systems in a domain-general manner.
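A common way to turn illusion reports across onset asynchronies into a single "temporal window" estimate is to fit a symmetric function to the response rates and take its width; the Gaussian fit and the full-width-at-half-maximum criterion below are assumptions of this sketch, not necessarily the estimator used in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amp, mu, sigma, base):
    return base + amp * np.exp(-((soa - mu) ** 2) / (2 * sigma ** 2))

def temporal_window_ms(soas_ms, p_illusion):
    """Fit a Gaussian to illusion rates across audiovisual asynchronies and
    report the full width at half maximum (ms) as the integration window."""
    p0 = [p_illusion.max() - p_illusion.min(), 0.0, 100.0, p_illusion.min()]
    (amp, mu, sigma, base), _ = curve_fit(gaussian, soas_ms, p_illusion, p0=p0)
    return 2.355 * abs(sigma)    # FWHM = 2*sqrt(2*ln 2)*sigma

# hypothetical data: negative SOA = audio leads, positive = audio lags
soas = np.arange(-300, 301, 50)
rates = np.array([.10, .15, .20, .40, .70, .90, .95, .90, .70, .35, .20, .15, .10])
print(temporal_window_ms(soas, rates))   # narrower windows suggest tighter binding
```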
The role of reverberation-related binaural cues in the externalization of speech.
Catic, Jasmina; Santurette, Sébastien; Dau, Torsten
2015-08-01
The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.
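To make the stimulus manipulation concrete, the sketch below renders headphone stimuli by convolving a mono speech signal with left- and right-ear BRIRs. The optional switch that copies one ear's reverberant tail to the other ear (removing binaural cues from the reflections while keeping the direct sound binaural) is an illustrative assumption, not the authors' exact BRIR modification.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_over_headphones(speech, brir_left, brir_right, fs,
                           direct_ms=2.5, diotic_reverb=False):
    """Convolve mono speech with a binaural room impulse response (BRIR).
    If diotic_reverb is True, everything after the first `direct_ms` of the
    left-ear BRIR replaces the right-ear tail, so the reflections carry
    monaural but no binaural information (assumed split point).
    Assumes equal-length left and right BRIRs."""
    split = int(direct_ms * 1e-3 * fs)
    if diotic_reverb:
        brir_right = np.concatenate([brir_right[:split], brir_left[split:]])
    left = fftconvolve(speech, brir_left)
    right = fftconvolve(speech, brir_right)
    return np.stack([left, right])     # (2, n_samples) headphone signal
```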
Role of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening.
Swaminathan, Jayaganesh; Mason, Christine R; Streeter, Timothy M; Best, Virginia; Roverud, Elin; Kidd, Gerald
2016-08-03
While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separation of the target speech signal in azimuth from the interfering masker signals can lead to an improvement in target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions of two stimulus properties that vary with separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), to SRM in normal-hearing (NH) human listeners. Target speech was presented from the front and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which the intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while retaining/eliminating them from the TFS. This was achieved by using the same/different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results from this study revealed that binaural TFS cues, especially for frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed. Acoustic signals received by the auditory system pass first through an array of physiologically based band-pass filters. Conceptually, at the output of each filter, there are two principal forms of temporal information: slowly varying fluctuations in the envelope (ENV) and rapidly varying fluctuations in the temporal fine structure (TFS). The importance of these two types of information in everyday listening (e.g., conversing in a noisy social situation; the "cocktail-party" problem) has not been established. This study assessed the contributions of binaural ENV and TFS cues for understanding speech in multiple-talker situations. Results suggest that, whereas the ENV cues are important for speech intelligibility, binaural TFS cues are critical for perceptually segregating the different talkers and thus for solving the cocktail party problem. Copyright © 2016 the authors 0270-6474/16/368250-08$15.00/0.
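A minimal noise-vocoder sketch, under assumed filter-bank settings, shows how envelope (ENV) cues can be retained while the temporal fine structure (TFS) is replaced by noise carriers; reusing the same carrier seed in both ears keeps the carriers interaurally correlated, whereas different seeds remove binaural TFS cues while leaving the ENV cues intact, which is the device the abstract describes.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(speech, fs, n_bands=8, lo=100.0, hi=8000.0, carrier_seed=0):
    """Noise-vocode one ear's signal: band-split, extract each band's Hilbert
    envelope, and use it to modulate a band-limited noise carrier.
    Band count and log-spaced edges are assumptions of this sketch."""
    rng = np.random.default_rng(carrier_seed)
    noise = rng.standard_normal(len(speech))
    edges = np.geomspace(lo, hi, n_bands + 1)
    out = np.zeros(len(speech))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, speech)))
        out += sosfiltfilt(sos, env * sosfiltfilt(sos, noise))
    return out

# left  = vocode(speech_left,  fs, carrier_seed=1)
# right = vocode(speech_right, fs, carrier_seed=1)   # same seed: binaural TFS cues retained
# right = vocode(speech_right, fs, carrier_seed=2)   # different seed: binaural TFS cues removed
```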
Adaptive spatial filtering improves speech reception in noise while preserving binaural cues.
Bissmeyer, Susan R S; Goldsworthy, Raymond L
2017-09-01
Hearing loss greatly reduces an individual's ability to comprehend speech in the presence of background noise. Over the past decades, numerous signal-processing algorithms have been developed to improve speech reception in these situations for cochlear implant and hearing aid users. One challenge is to reduce background noise while not introducing interaural distortion that would degrade binaural hearing. The present study evaluates a noise reduction algorithm, referred to as binaural Fennec, that was designed to improve speech reception in background noise while preserving binaural cues. Speech reception thresholds were measured for normal-hearing listeners in a simulated environment with target speech generated in front of the listener and background noise originating 90° to the right of the listener. Lateralization thresholds were also measured in the presence of background noise. These measures were conducted in anechoic and reverberant environments. Results indicate that the algorithm improved speech reception thresholds, even in highly reverberant environments, and that it also improved lateralization thresholds for the anechoic environment while not affecting lateralization thresholds for the reverberant environments. These results provide clear evidence that this algorithm can improve speech reception in background noise while preserving binaural cues used to lateralize sound.
A proposed mechanism for rapid adaptation to spectrally distorted speech.
Azadpour, Mahan; Balaban, Evan
2015-07-01
The mechanisms underlying perceptual adaptation to severely spectrally-distorted speech were studied by training participants to comprehend spectrally-rotated speech, which is obtained by inverting the speech spectrum. Spectral-rotation produces severe distortion confined to the spectral domain while preserving temporal trajectories. During five 1-hour training sessions, pairs of participants attempted to extract spoken messages from the spectrally-rotated speech of their training partner. Data on training-induced changes in comprehension of spectrally-rotated sentences and identification/discrimination of spectrally-rotated phonemes were used to evaluate the plausibility of three different classes of underlying perceptual mechanisms: (1) phonemic remapping (the formation of new phonemic categories that specifically incorporate spectrally-rotated acoustic information); (2) experience-dependent generation of a perceptual "inverse-transform" that compensates for spectral-rotation; and (3) changes in cue weighting (the identification of sets of acoustic cues least affected by spectral-rotation, followed by a rapid shift in perceptual emphasis to favour those cues, combined with the recruitment of the same type of "perceptual filling-in" mechanisms used to disambiguate speech-in-noise). Results exclusively support the third mechanism, which is the only one predicting that learning would specifically target temporally-dynamic cues that were transmitting phonetic information most stably in spite of spectral-distortion. No support was found for phonemic remapping or for inverse-transform generation.
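Spectral rotation of the kind used here can be approximated with the classic modulation method: low-pass the signal, multiply by a cosine at the rotation frequency, and low-pass again, which mirrors the spectrum below that frequency while leaving temporal trajectories largely intact. The rotation frequency and filter order below are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def spectrally_rotate(speech, fs, f_rot=4000.0):
    """Invert the speech spectrum below f_rot: energy originally at frequency f
    ends up near f_rot - f (requires fs > 2 * f_rot)."""
    sos = butter(8, 0.95 * f_rot, btype="low", fs=fs, output="sos")
    band_limited = sosfiltfilt(sos, speech)          # confine energy below f_rot
    t = np.arange(len(speech)) / fs
    mirrored = band_limited * np.cos(2 * np.pi * f_rot * t)
    return sosfiltfilt(sos, mirrored)                # remove the upper image
```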
The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.
NASA Astrophysics Data System (ADS)
Baldwin, Carol May
Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.
Prior Knowledge Guides Speech Segregation in Human Auditory Cortex.
Wang, Yuanye; Zhang, Jianfeng; Zou, Jiajie; Luo, Huan; Ding, Nai
2018-05-18
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate an effect in bilateral superior temporal gyrus and superior temporal sulcus that is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
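Envelope tracking of the kind measured here is often quantified by correlating the neural signal with a low-pass-filtered speech envelope; the preprocessing choices and the peak cross-correlation measure in this sketch are assumptions, not the authors' exact analysis.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def speech_envelope(speech, fs_audio, fs_neural, cutoff=10.0):
    """Broadband Hilbert envelope, low-pass filtered and downsampled to the
    neural sampling rate (a common, assumed preprocessing choice)."""
    env = np.abs(hilbert(speech))
    sos = butter(4, cutoff, btype="low", fs=fs_audio, output="sos")
    env = sosfiltfilt(sos, env)
    idx = np.round(np.arange(0, len(env), fs_audio / fs_neural)).astype(int)
    return env[idx[idx < len(env)]]

def tracking_strength(sensor, envelope, fs_neural, max_lag_ms=500):
    """Peak normalized cross-correlation (0..max_lag) between one sensor and
    the envelope -- a simple stand-in for the study's tracking measure."""
    n = min(len(sensor), len(envelope))
    x = (sensor[:n] - sensor[:n].mean()) / sensor[:n].std()
    y = (envelope[:n] - envelope[:n].mean()) / envelope[:n].std()
    max_lag = min(int(max_lag_ms * 1e-3 * fs_neural), n - 1)
    return max(np.dot(x[lag:], y[:n - lag]) / (n - lag) for lag in range(max_lag))
```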
Ambert-Dahan, Emmanuèle; Giraud, Anne-Lise; Mecheri, Halima; Sterkers, Olivier; Mosnier, Isabelle; Samson, Séverine
2017-10-01
Visual processing has been extensively explored in deaf subjects in the context of verbal communication, through the assessment of speech reading and sign language abilities. However, little is known about visual emotional processing in adult progressive deafness, and after cochlear implantation. The goal of our study was thus to assess the influence of acquired post-lingual progressive deafness on the recognition of dynamic facial emotions that were selected to express canonical fear, happiness, sadness, and anger. A total of 23 adults with post-lingual deafness, separated into two groups assessed either before (n = 10) or after (n = 13) cochlear implantation (CI), and 13 normal-hearing (NH) individuals participated in the current study. Participants were asked to rate the expression of the four cardinal emotions, and to evaluate both their emotional valence (unpleasant-pleasant) and arousal potential (relaxing-stimulating). We found that patients with deafness were impaired in the recognition of sad faces, and that patients equipped with a CI were additionally impaired in the recognition of happiness and fear (but not anger). Relative to controls, all patients with deafness showed a deficit in perceiving arousal expressed in faces, while valence ratings remained unaffected. The current results show for the first time that acquired and progressive deafness is associated with a reduction of emotional sensitivity to visual stimuli. This negative impact of progressive deafness on the perception of dynamic facial cues for emotion recognition contrasts with the proficiency of deaf subjects with and without CIs in processing visual speech cues (Rouger et al., 2007; Strelnikov et al., 2009; Lazard and Giraud, 2017). Altogether these results suggest there to be a trade-off between the processing of linguistic and non-linguistic visual stimuli. Copyright © 2017. Published by Elsevier B.V.
Index of FAA Office of Aviation Medicine Reports: 1961 through 1989
1990-01-01
Fragmentary index excerpt (entries recoverable from the extracted snippet):
64-13 Gogel, W. C.: The size cue to visually perceived distance. AD456655
73-13 Tobias, J. V., and Irons, F. M.: Reception of distorted speech. AD777564
Mertens, H. W., and Steen, J. A.: Interaction between marihuana and altitude on a complex behavioral task in baboons. ADA020680/5GI
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
2015-01-01
Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
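A minimal sketch of the classification-image idea: regress trial-by-trial category responses onto the time-frequency noise added on each trial, so that large weights mark regions whose noise energy pushed listeners toward one category. Plain L2-regularized logistic regression stands in for the smoothness-penalized GLM of the published method, so the code is illustrative rather than a reimplementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classification_image(noise_spectrograms, responses, C=0.01):
    """noise_spectrograms: (n_trials, n_freq, n_time) noise of each trial;
    responses: 0/1 category reports. Returns an (n_freq, n_time) weight map."""
    n_trials = noise_spectrograms.shape[0]
    X = noise_spectrograms.reshape(n_trials, -1)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # z-score each pixel
    model = LogisticRegression(penalty="l2", C=C, max_iter=2000).fit(X, responses)
    return model.coef_.reshape(noise_spectrograms.shape[1:])
```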
The Timing and Effort of Lexical Access in Natural and Degraded Speech
Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz
2016-01-01
Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has its consequences on lexical processing as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901
Development of a test battery for evaluating speech perception in complex listening environments.
Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R
2014-08-01
In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.
Poon, Matthew; Schutz, Michael
2015-01-01
Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
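A score-based analysis of this kind boils down to computing one pitch-height statistic and one timing statistic per piece and then comparing the major- and minor-key sets; the note representation and the two summary statistics below are assumptions of this sketch, not the authors' exact procedure.

```python
import numpy as np

def corpus_cues(pieces):
    """Mean pitch height (MIDI note number) and attack rate (notes per second)
    across pieces, each piece given as a list of (midi_pitch, onset_s) tuples."""
    heights = [np.mean([p for p, _ in notes]) for notes in pieces]
    rates = [len(notes) / (max(t for _, t in notes) - min(t for _, t in notes) + 1e-9)
             for notes in pieces]
    return float(np.mean(heights)), float(np.mean(rates))

# Comparing hypothetical major- and minor-key sets, the abstract's findings would
# correspond to roughly +2 semitones in mean pitch height and ~29% higher attack
# rate for the major-key pieces.
```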
Familiar units prevail over statistical cues in word segmentation.
Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald
2017-09-01
In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.
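The transitional-probability cue at issue can be computed directly from a syllable stream, as in this short sketch; the example stream, built from the hypothetical words tupiro, golabu, and bidaku, only illustrates why within-word TPs exceed across-boundary TPs.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP P(next | current) for each adjacent syllable pair; in this
    paradigm TPs are high within words and lower across word boundaries."""
    pair_counts = Counter(zip(syllables[:-1], syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = "tu pi ro go la bu bi da ku tu pi ro bi da ku go la bu".split()
tps = transitional_probabilities(stream)
print(tps[("tu", "pi")])   # within-word TP (1.0 in this toy stream)
print(tps[("ro", "go")])   # across-boundary TP (0.5 here)
```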
Adapted cuing technique: facilitating sequential phoneme production.
Klick, S L
1994-09-01
ACT is a visual cuing technique designed to facilitate dyspraxic speech by highlighting the sequential production of phonemes. In using ACT, cues are presented in such a way as to suggest sequential, coarticulatory movement in an overall pattern of motion. While using ACT, the facilitator's hand moves forward and back along the side of her (or his) own face. Finger movements signal specific speech sounds in formations loosely based on the manual alphabet for the hearing impaired. The best movements suggest the flowing, interactive nature of coarticulated phonemes. The synergistic nature of speech is suggested by coordinated hand motions which tighten and relax, move quickly or slowly, reflecting the motions of the vocal tract at various points during production of phonemic sequences. General principles involved in using ACT include a primary focus on speech-in-motion, the monitoring and fading of cues, and the presentation of stimuli based on motor-task analysis of phonemic sequences. Phonemic sequences are cued along three dimensions: place, manner, and vowel-related mandibular motion. Cuing vowels is a central feature of ACT. Two parameters of vowel production, focal point of resonance and mandibular closure, are cued. The facilitator's hand motions reflect the changing shape of the vocal tract and the trajectory of the tongue that result from the coarticulation of vowels and consonants. Rigid presentation of the phonemes is secondary to the facilitator's primary focus on presenting the overall sequential movement. The facilitator's goal is to self-tailor ACT in response to the changing needs and abilities of the client.(ABSTRACT TRUNCATED AT 250 WORDS)
The role of temporal speech cues in facilitating the fluency of adults who stutter.
Park, Jin; Logan, Kenneth J
2015-12-01
Adults who stutter speak more fluently during choral speech contexts than they do during solo speech contexts. The underlying mechanisms for this effect remain unclear, however. In this study, we examined the extent to which the choral speech effect depended on presentation of intact temporal speech cues. We also examined whether speakers who stutter followed choral signals more closely than typical speakers did. 8 adults who stuttered and 8 adults who did not stutter read 60 sentences aloud during a solo speaking condition and three choral speaking conditions (240 total sentences), two of which featured either temporally altered or indeterminate word duration patterns. Effects of these manipulations on speech fluency, rate, and temporal entrainment with the choral speech signal were assessed. Adults who stutter spoke more fluently in all choral speaking conditions than they did when speaking solo. They also spoke slower and exhibited closer temporal entrainment with the choral signal during the mid- to late-stages of sentence production than the adults who did not stutter. Both groups entrained more closely with unaltered choral signals than they did with altered choral signals. Findings suggest that adults who stutter make greater use of speech-related information in choral signals when talking than adults with typical fluency do. The presence of fluency facilitation during temporally altered choral speech and conversation babble, however, suggests that temporal/gestural cueing alone cannot account for fluency facilitation in speakers who stutter. Other potential fluency enhancing mechanisms are discussed. The reader will be able to (a) summarize competing views on stuttering as a speech timing disorder, (b) describe the extent to which adults who stutter depend on an accurate rendering of temporal information in order to benefit from choral speech, and (c) discuss possible explanations for fluency facilitation in the presence of inaccurate or indeterminate temporal cues. Copyright © 2015 Elsevier Inc. All rights reserved.
The minor third communicates sadness in speech, mirroring its use in music.
Curtis, Meagan E; Bharucha, Jamshed J
2010-06-01
There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
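The pitch relationship at the heart of this result can be expressed in equal-tempered semitones, where a value near 3 corresponds to a minor third; the example frequencies below are hypothetical.

```python
import math

def interval_in_semitones(f1_hz, f2_hz):
    """Unsigned size of the interval between two pitches, in semitones."""
    return abs(12 * math.log2(f1_hz / f2_hz))

# two hypothetical salient pitches from a 'sad' utterance
print(round(interval_in_semitones(220.0, 185.0), 2))   # ~3.0 semitones = minor third
```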
Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı
2015-01-01
Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857
Neural processing of amplitude and formant rise time in dyslexia.
Peter, Varghese; Kalashnikova, Marina; Burnham, Denis
2016-06-01
This study aimed to investigate how children with dyslexia weight amplitude rise time (ART) and formant rise time (FRT) cues in phonetic discrimination. Passive mismatch responses (MMR) were recorded for a /ba/-/wa/ contrast in a multiple deviant odd-ball paradigm to identify the neural response to cue weighting in 17 children with dyslexia and 17 age-matched control children. The deviant stimuli had either partial or full ART or FRT cues. The results showed that ART did not generate an MMR in either group, whereas both partial and full FRT cues generated MMR in control children while only full FRT cues generated MMR in children with dyslexia. These findings suggest that children, both controls and those with dyslexia, discriminate speech based on FRT cues and not ART cues. However, control children have greater sensitivity to FRT cues in speech compared to children with dyslexia. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Cox, Robyn M; Alexander, Genevieve C; Johnson, Jani; Rivera, Izel
2011-01-01
We investigated the prevalence of cochlear dead regions in listeners with hearing losses similar to those of many hearing aid wearers, and explored the impact of these dead regions on speech perception. Prevalence of dead regions was assessed using the Threshold Equalizing Noise test (TEN(HL)). Speech recognition was measured using high-frequency emphasis (HFE) Quick Speech In Noise (QSIN) test stimuli and low-pass filtered HFE QSIN stimuli. About one third of subjects tested positive for a dead region at one or more frequencies. Also, groups without and with dead regions both benefited from additional high-frequency speech cues. PMID:21522068
Psychophysics of complex auditory and speech stimuli
NASA Astrophysics Data System (ADS)
Pastore, Richard E.
1993-10-01
A major focus of the primary project is the use of different procedures to provide converging evidence on the nature of perceptual spaces for speech categories. Completed research examined initial voiced consonants, with results providing strong evidence that different stimulus properties may cue a phoneme category in different vowel contexts. Thus, /b/ is cued by a rising second formant (F2) with the vowel /a/, requires both F2 and F3 to be rising with /i/, and is independent of the release burst for these vowels. Furthermore, cues for phonetic contrasts are not necessarily symmetric, and the strong dependence of prior speech research on classification procedures may have led to errors. Thus, the opposite (falling F2 and F3) transitions lead to somewhat ambiguous percepts (i.e., not /b/) that may be labeled consistently (as /d/ or /g/) but require a release burst to achieve high category quality and similarity to category exemplars. Ongoing research is examining cues in other vowel contexts and using procedures to evaluate the nature of the interaction between cues for categories of both speech and music.
Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.
2018-01-01
Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415
Neural pathways for visual speech perception
Bernstein, Lynne E.; Liebenthal, Einat
2014-01-01
This paper examines two questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611
Salomon, G; Parving, A
1985-01-01
It is reasoned that for compensation or epidemiological studies an evaluation of hearing disability and the concomitant handicap must include the ability to perceive visual cues. A scaling procedure for hearing- and audiovisual communication handicap is presented. The procedure deviates in two ways from previous handicap assessments: (1) It is based on individual self-assessment of semantic speech perception but can be implemented by means of professional audiological test procedures. (2) The system does not make use of pure-tone auditory thresholds as a predominant audiological principle, but is based on speech perception. The interrelationship between auditory and audiovisual handicap is evaluated. A total score including audio- and audiovisual perception handicap is proposed and a suggestion for disability percentages is presented.
Bone, Daniel; Lee, Chi-Chun; Black, Matthew P.; Williams, Marian E.; Lee, Sungbok; Levitt, Pat; Narayanan, Shrikanth
2015-01-01
Purpose: The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the child with ASD during spontaneous interaction, establishing a methodology for future large-sample analysis. Method: Speech acoustic-prosodic features were semiautomatically derived from segments of semistructured interviews (Autism Diagnostic Observation Schedule, ADOS; Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with 28 children who had previously been diagnosed with ASD. Prosody was quantified in terms of intonation, volume, rate, and voice quality. Research hypotheses were tested via correlation as well as hierarchical and predictive regression between ADOS severity and prosodic cues. Results: Automatically extracted speech features demonstrated prosodic characteristics of dyadic interactions. As rated ASD severity increased, both the psychologist and the child demonstrated effects for turn-end pitch slope, and both spoke with atypical voice quality. The psychologist’s acoustic cues predicted the child’s symptom severity better than did the child’s acoustic cues. Conclusion: The psychologist, acting as evaluator and interlocutor, was shown to adjust his or her behavior in predictable ways based on the child’s social-communicative impairments. The results support future study of speech prosody of both interaction partners during spontaneous conversation, while using automatic computational methods that allow for scalable analysis on much larger corpora. PMID:24686340
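One of the prosodic cues named above, turn-end pitch slope, can be approximated by fitting a line to the final portion of an F0 track. The sketch below assumes an F0 contour is already available (e.g., from a pitch tracker) and uses a hypothetical 100-Hz reference and 0.5-s window; it illustrates the feature, not the authors' extraction pipeline.

```python
import numpy as np

def turn_end_pitch_slope(f0_hz, frame_rate_hz, tail_seconds=0.5):
    """Approximate turn-end pitch slope: a linear fit (semitones/second) over
    the final `tail_seconds` of an F0 track. Unvoiced frames are NaN or 0."""
    n_tail = int(tail_seconds * frame_rate_hz)
    tail = np.asarray(f0_hz, dtype=float)[-n_tail:]
    t = np.arange(len(tail)) / frame_rate_hz
    voiced = ~np.isnan(tail) & (tail > 0)
    if voiced.sum() < 2:
        return np.nan
    semitones = 12.0 * np.log2(tail[voiced] / 100.0)  # re: 100 Hz reference
    slope, _ = np.polyfit(t[voiced], semitones, 1)
    return slope

# Hypothetical falling contour sampled at 100 frames/s (illustration only).
f0 = np.linspace(180.0, 140.0, 100)
print(f"turn-end slope ≈ {turn_end_pitch_slope(f0, 100.0):.1f} st/s")
```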
Speech perception in medico-legal assessment of hearing disabilities.
Pedersen, Ellen Raben; Juhl, Peter Møller; Wetke, Randi; Andersen, Ture Dammann
2016-10-01
This study examines Danish data on medico-legal compensation for hearing disabilities. Its purposes are: (1) to investigate whether discrimination scores (DSs) relate to patients' subjective experience of their hearing and communication ability (the latter referring to audio-visual perception), (2) to compare DSs from different discrimination tests (auditory/audio-visual perception and without/with noise), and (3) to relate different handicap measures in the scaling used for compensation purposes in Denmark. Data from a 15-year period (1999-2014) were collected and analysed. The data set includes 466 patients, of whom 50 were omitted due to suspicion of having exaggerated their hearing disabilities. The DSs relate well to the patients' subjective experience of their speech perception ability. Comparing DSs for different test setups showed that adding noise entails a relatively more difficult listening condition than removing visual cues. The hearing and communication handicap degrees were found to agree, whereas the measured handicap degrees tended to be higher than the self-assessed handicap degrees. The DSs can be used to assess patients' hearing and communication abilities. The difference in the obtained handicap degrees emphasizes the importance of collecting self-assessed as well as measured handicap degrees.
Language development at 18 months is related to multimodal communicative strategies at 12 months.
Igualada, Alfonso; Bosch, Laura; Prieto, Pilar
2015-05-01
The present study investigated the degree to which infants' use of simultaneous gesture-speech combinations during controlled social interactions predicts later language development. Nineteen infants participated in a declarative pointing task involving three different social conditions: two experimental conditions, (a) available, when the adult was visually attending to the infant but did not attend to the object of reference jointly with the child, and (b) unavailable, when the adult was not visually attending to either the infant or the object; and (c) a baseline condition, when the adult jointly engaged with the infant's object of reference. At 12 months of age, measures related to infants' speech-only productions, pointing-only gestures, and simultaneous pointing-speech combinations were obtained in each of the three social conditions. Each child's lexical and grammatical output was assessed at 18 months of age through parental report. Results revealed a significant interaction between social condition and type of communicative production. Specifically, only simultaneous pointing-speech combinations increased in frequency during the available condition compared to baseline, while no differences were found for speech-only and pointing-only productions. Moreover, simultaneous pointing-speech combinations in the available condition at 12 months positively correlated with lexical and grammatical development at 18 months of age. The ability to selectively use this multimodal communicative strategy to engage the adult in joint attention by drawing the adult's attention toward an unseen event or object reveals 12-month-olds' clear understanding of referential cues that are relevant for language development. This strategy to successfully initiate and maintain joint attention is related to language development as it increases learning opportunities from social interactions. Copyright © 2015 Elsevier Inc. All rights reserved.
Poon, Matthew; Schutz, Michael
2015-01-01
Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound “happier” than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of “balanced” major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach’s Well-Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (“A,” “B,” “C,” etc.). Consistent with predictions derived from speech, we found major-key (nominally “happy”) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally “sad”) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music. PMID:26578990
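The kind of score-based corpus summary described, mean pitch height and attack rate per piece compared across modalities, reduces to simple averages once each piece is encoded. A minimal sketch with hypothetical per-piece values (not the Bach or Chopin data) follows.

```python
import numpy as np

# Hypothetical per-piece summaries (mean MIDI pitch, attacks per second),
# standing in for the major- and minor-key sets described above.
major_pieces = [(65.0, 5.2), (67.5, 4.8), (66.0, 5.5)]
minor_pieces = [(63.5, 4.0), (64.0, 3.9), (65.0, 4.2)]

maj_pitch = np.mean([p for p, _ in major_pieces])
min_pitch = np.mean([p for p, _ in minor_pieces])
maj_rate = np.mean([r for _, r in major_pieces])
min_rate = np.mean([r for _, r in minor_pieces])

# MIDI note numbers are spaced one semitone apart, so the difference of means
# is already in semitones; the timing contrast is expressed as a percentage.
print(f"pitch-height difference: {maj_pitch - min_pitch:.1f} semitones")
print(f"timing difference: {100 * (maj_rate / min_rate - 1):.0f}% faster in major")
```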
The perception of stress and intonation in children with a cochlear implant and a hearing aid.
Hegarty, Lauren; Faulkner, Andrew
2013-11-01
This study investigated whether low frequency information from a hearing aid improved the perception of stress and intonation by English-speaking children with cochlear implants. As pitch information is limited for cochlear implant users, this study also investigated if users rely more on the cues of duration and amplitude to perceive stress and intonation. Nine children with bimodal stimulation (cochlear implant and hearing aid) participated in two experiments. The first measured the just audible change in F0 (pitch) and amplitude for a speech-like word 'baba'. The second experiment examined the children's ability to identify focus in natural and manipulated sentences. Overall, group results did not show a bimodal advantage in perceiving stress and intonation. However, the children were significantly better at perceiving focus in sentences with natural speech compared with manipulated speech in both the CI and bimodal conditions. The results suggest that in the absence of pitch cues, amplitude and duration cues are used to perceive stress and intonation. However, the majority of children only perceived amplitude changes greater than the changes typically found in speech, implying duration cues were the most valuable. Taken together the findings suggest that for children with cochlear implants, cues to F0 may not be essential for prosody perception and in the absence of cues to F0 and amplitude, duration may offer an alternative cue. Although a bimodal advantage was not demonstrated for all participants, it is recommended that if clinically appropriate, a contralateral hearing aid is fitted and trialled to exploit any residual hearing.
The Sound of Intellect: Speech Reveals a Thoughtful Mind, Increasing a Job Candidate's Appeal.
Schroeder, Juliana; Epley, Nicholas
2015-06-01
A person's mental capacities, such as intellect, cannot be observed directly and so are instead inferred from indirect cues. We predicted that a person's intellect would be conveyed most strongly through a cue closely tied to actual thinking: his or her voice. Hypothetical employers (Experiments 1-3b) and professional recruiters (Experiment 4) watched, listened to, or read job candidates' pitches about why they should be hired. These evaluators rated a candidate as more competent, thoughtful, and intelligent when they heard a pitch rather than read it and, as a result, had a more favorable impression of the candidate and were more interested in hiring the candidate. Adding voice to written pitches, by having trained actors (Experiment 3a) or untrained adults (Experiment 3b) read them, produced the same results. Adding visual cues to audio pitches did not alter evaluations of the candidates. For conveying one's intellect, it is important that one's voice, quite literally, be heard. © The Author(s) 2015.
ERIC Educational Resources Information Center
Bone, Daniel; Lee, Chi-Chun; Black, Matthew P.; Williams, Marian E.; Lee, Sungbok; Levitt, Pat; Narayanan, Shrikanth
2014-01-01
Purpose: The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the…
A phone-assistive device based on Bluetooth technology for cochlear implant users.
Qian, Haifeng; Loizou, Philipos C; Dorman, Michael F
2003-09-01
Hearing-impaired people, and particularly hearing-aid and cochlear-implant users, often have difficulty communicating over the telephone. The intelligibility of telephone speech is considerably lower than the intelligibility of face-to-face speech. This is partly because of lack of visual cues, limited telephone bandwidth, and background noise. In addition, cellphones may cause interference with the hearing aid or cochlear implant. To address these problems that hearing-impaired people experience with telephones, this paper proposes a wireless phone adapter that can be used to route the audio signal directly to the hearing aid or cochlear implant processor. This adapter is based on Bluetooth technology. The favorable features of this new wireless technology make the adapter superior to traditional assistive listening devices. A hardware prototype was built and software programs were written to implement the headset profile in the Bluetooth specification. Three cochlear implant users were tested with the proposed phone-adapter and reported good speech quality.
Common cues to emotion in the dynamic facial expressions of speech and song.
Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline
2015-01-01
Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements, each conveying one of five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were identified poorly, yet accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song but were equivalent to it in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, as well as differences in perception and acoustic-motor production.
Hu, Yi
2010-05-01
Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefits. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting harmonic structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in the frequency domain (or equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work on the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.
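As a rough illustration of the processing chain described (noise reduction followed by harmonics regeneration), the sketch below applies a crude spectral-subtraction-style step and then boosts bins near multiples of a known F0. The frame length, boost amount, and frequency tolerance are assumptions; this is not the algorithm evaluated in the study.

```python
import numpy as np

def regenerate_harmonics(noisy_frame, fs, f0_hz, noise_floor, boost_db=6.0):
    """Toy illustration: subtract an estimated noise-floor magnitude, then
    emphasize spectral bins near multiples of F0 (harmonics 'regeneration')."""
    windowed = noisy_frame * np.hanning(len(noisy_frame))
    spec = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(noisy_frame), d=1.0 / fs)

    # 1) Crude noise reduction with a 10%-of-original magnitude floor.
    mag = np.maximum(np.abs(spec) - noise_floor, 0.1 * np.abs(spec))

    # 2) Harmonics regeneration: boost bins within 20 Hz of each F0 multiple.
    gain = np.ones_like(mag)
    for k in range(1, int(freqs[-1] // f0_hz) + 1):
        gain[np.abs(freqs - k * f0_hz) < 20.0] = 10 ** (boost_db / 20.0)

    enhanced = mag * gain * np.exp(1j * np.angle(spec))
    return np.fft.irfft(enhanced, n=len(noisy_frame))

# Usage: a 25-ms frame at 16 kHz containing a 120-Hz tone plus low-level noise.
fs = 16000
frame = np.sin(2 * np.pi * 120 * np.arange(400) / fs) + 0.05 * np.random.randn(400)
enhanced_frame = regenerate_harmonics(frame, fs, f0_hz=120.0, noise_floor=0.5)
```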
Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y
2014-09-01
Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.
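The envelope/TFS distinction at the heart of this trade-off can be illustrated by decomposing a band-limited signal with the Hilbert transform: the slow envelope is what high-rate carriers convey, while the fine structure carries the rapid timing on which low-rate interaural cues depend. A generic sketch (the cutoff and filter order are assumptions, not the study's parameters):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_and_tfs(band_signal, fs, env_cutoff_hz=50.0):
    """Split a band-limited signal into its temporal envelope (slow amplitude
    modulation) and temporal fine structure (the rapid carrier) using the
    Hilbert transform. Illustrates the envelope/TFS distinction only; it is
    not a cochlear implant processing strategy."""
    analytic = hilbert(band_signal)
    tfs = np.cos(np.unwrap(np.angle(analytic)))      # unit-amplitude fine structure
    sos = butter(4, env_cutoff_hz / (fs / 2), btype="lowpass", output="sos")
    envelope = sosfiltfilt(sos, np.abs(analytic))    # keep only slow modulations
    return envelope, tfs

# Example: a 500-Hz tone amplitude-modulated at 8 Hz, sampled at 16 kHz.
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = (1 + 0.8 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 500 * t)
env, tfs = envelope_and_tfs(x, fs)
```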
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.
Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola
2017-11-01
Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha
2015-03-01
While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed more by visual distractors when confronted with incongruent audiovisual stimuli. To cope with this multimodal conflict, CI users activate the left inferior frontal gyrus to adopt a top-down cognitive modulation pathway, whereas normal hearing individuals primarily adopt a bottom-up strategy.
An evaluation of two methods for increasing self-initiated verbalizations in autistic children.
Matson, J L; Sevin, J A; Box, M L; Francis, K L; Sevin, B M
1993-01-01
Three children with autism and mental retardation were treated for deficits in self-initiated speech. A novel treatment package employing visual cue fading was compared with a graduated time-delay procedure previously shown to be effective for increasing self-initiated language. Both treatments included training multiple self-initiated verbalizations using multiple therapists and settings. Both treatments were effective, with no differences in measures of acquisition of target phrases, maintenance of behavioral gains, acquisition with additional therapists and settings, and social validity. PMID:8407687
Preschoolers' real-time coordination of vocal and facial emotional information.
Berman, Jared M J; Chambers, Craig G; Graham, Susan A
2016-02-01
An eye-tracking methodology was used to examine the time course of 3- and 5-year-olds' ability to link speech bearing different acoustic cues to emotion (i.e., happy-sounding, neutral, and sad-sounding intonation) to photographs of faces reflecting different emotional expressions. Analyses of saccadic eye movement patterns indicated that, for both 3- and 5-year-olds, sad-sounding speech triggered gaze shifts to a matching (sad-looking) face from the earliest moments of speech processing. However, it was not until approximately 800ms into a happy-sounding utterance that preschoolers began to use the emotional cues from speech to identify a matching (happy-looking) face. Complementary analyses based on conscious/controlled behaviors (children's explicit points toward the faces) indicated that 5-year-olds, but not 3-year-olds, could successfully match happy-sounding and sad-sounding vocal affect to a corresponding emotional face. Together, the findings clarify developmental patterns in preschoolers' implicit versus explicit ability to coordinate emotional cues across modalities and highlight preschoolers' greater sensitivity to sad-sounding speech as the auditory signal unfolds in time. Copyright © 2015 Elsevier Inc. All rights reserved.
An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants
Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry
2016-01-01
Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias, called the Iambic-Trochaic Law (ITL), has been claimed to be a universal property of the auditory system that applies in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887
Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z
2011-01-01
Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer experience with Fidelity120. Another three subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8-wk time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in vowel F2 frequency perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (vowel F1 frequency or consonant place of articulation). Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise.
A novel speech processing algorithm based on harmonicity cues in cochlear implant
NASA Astrophysics Data System (ADS)
Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng
2017-08-01
This paper proposes a novel speech processing algorithm for cochlear implants that uses harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank; the frequency ranges of the four bands were 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. Each band's TEPCs then modulated a sinusoidal carrier whose frequency was the harmonic of the fundamental frequency (F0) closest to the center frequency of that band. The signals from all bands were summed to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the extensively used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. The F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, sentence recognition was higher than word recognition, reflecting the contextual information available in sentences. Moreover, tones 3 and 4 were recognized better than tones 1 and 2, owing to the more easily identified features of the former. In conclusion, the F0-harmonic algorithm can enhance tonal information in cochlear implant speech processing through the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further work will test the F0-harmonic algorithm in noisy listening conditions.
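A minimal sketch of the four-band scheme described above, assuming Butterworth filters and a single fixed F0 estimate for the whole utterance (the abstract does not specify filter types, orders, or the F0 tracker):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

BAND_EDGES_HZ = [(300, 621), (621, 1285), (1285, 2657), (2657, 5499)]

def f0_harmonic_vocoder(speech, fs, f0_hz, env_cutoff_hz=400.0):
    """Sketch of the 4-band scheme: band-pass, full-wave rectify, low-pass the
    envelope (TEPCs below 400 Hz), then re-modulate each band with a sinusoid
    at the F0 harmonic closest to the band centre. Filter orders and the use
    of one fixed F0 are assumptions, not the published implementation."""
    t = np.arange(len(speech)) / fs
    sos_env = butter(2, env_cutoff_hz / (fs / 2), btype="lowpass", output="sos")
    output = np.zeros(len(speech))

    for lo, hi in BAND_EDGES_HZ:
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass", output="sos")
        band = sosfiltfilt(sos, speech)
        envelope = sosfiltfilt(sos_env, np.abs(band))       # TEPC extraction

        centre = np.sqrt(lo * hi)                           # geometric band centre
        harmonic = max(1, round(centre / f0_hz)) * f0_hz    # nearest F0 harmonic
        output += envelope * np.sin(2 * np.pi * harmonic * t)

    return output

# Usage with a placeholder 16-kHz waveform and an assumed F0 of 200 Hz.
fs = 16000
speech = np.random.randn(fs)        # stands in for a real Mandarin utterance
processed = f0_harmonic_vocoder(speech, fs, f0_hz=200.0)
```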
Ihlefeld, Antje; Litovsky, Ruth Y
2012-01-01
Spatial release from masking refers to the benefit for speech understanding that occurs when a target talker and a masker talker are spatially separated; in those cases, intelligibility of the target speech is typically higher than when both talkers are at the same location. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal-hearing listeners. Perhaps this reduced spatial release occurs because cochlear implant listeners cannot effectively attend to spatial cues. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. To simulate cochlear implant listening, stimuli were vocoded with two unique features. First, we used 50-Hz low-pass filtered speech envelopes and noise carriers, strongly reducing the possibility of temporal pitch cues; second, co-modulation was imposed on target and masker utterances to enhance perceptual fusion between the two sources. Stimuli were presented over headphones. Experiments 1 and 2 presented high-fidelity spatial cues with unprocessed and vocoded speech. Experiment 3 maintained faithful long-term average interaural level differences but presented scrambled interaural time differences with vocoded speech. Results show a robust spatial release from masking in Experiments 1 and 2, and a greatly reduced spatial release in Experiment 3. Faithful long-term average interaural level differences were insufficient for producing spatial release from masking. This suggests that appropriate interaural time differences are necessary for restoring spatial release from masking, at least for a situation where there are few viable alternative segregation cues.
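The basic operation behind the headphone spatialization manipulated here, imposing an interaural time difference (ITD), can be sketched as a whole-waveform delay applied to one ear. The study's stimuli additionally involved vocoding, co-modulation, and interaural level differences, none of which are shown in this sketch.

```python
import numpy as np

def apply_itd(mono, fs, itd_seconds):
    """Create a stereo stimulus by delaying one ear relative to the other,
    producing an interaural time difference (here, positive ITD = right ear
    lags). A minimal illustration, not the study's stimulus generation."""
    delay_samples = int(round(abs(itd_seconds) * fs))
    delayed = np.concatenate([np.zeros(delay_samples), mono])[: len(mono)]
    left, right = (mono, delayed) if itd_seconds >= 0 else (delayed, mono)
    return np.stack([left, right], axis=1)   # shape: (n_samples, 2)

# A 600-microsecond ITD (favouring the left) on a noise token at 44.1 kHz.
fs = 44100
token = np.random.randn(fs // 2)
stereo = apply_itd(token, fs, itd_seconds=600e-6)
```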
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1984-08-01
This report (1 January-30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Sources of variability in early speech development; Invariance: Functional or descriptive?; Brief comments on invariance in phonetic perception; Phonetic category boundaries are flexible; On categorizing aphasic speech errors; Universal and language-particular aspects of vowel-to-vowel coarticulation; Functionally specific articulatory cooperation following jaw perturbation during speech: Evidence for coordinative structures; Formant integration and the perception of nasal vowel height; Relative power of cues: F0 shifts vs. voice timing; Laryngeal management at utterance-internal word boundary in American English; Closure duration and release burst amplitude cues to stop consonant manner and place of articulation; Effects of temporal stimulus properties on perception of the (sl)-(spl) distinction; The physics of controlled conditions: A reverie about locomotion; On the perception of intonation from sinusoidal sentences; Speech Perception; Speech Articulation; Motor Control; Speech Development.
Collaborative Signaling of Informational Structures by Dynamic Speech Rate.
ERIC Educational Resources Information Center
Koiso, Hanae; Shimojima, Atsushi; Katagiri, Yasuhiro
1998-01-01
Investigated the functions of dynamic speech rates as contextualization cues in conversational Japanese, examining five spontaneous task-oriented dialogs and analyzing the potential of speech-rate changes in signaling the structure of the information being exchanged. Results found a correlation between speech decelerations and the openings of new…
NASA Technical Reports Server (NTRS)
Khan, M. Javed; Rossi, Marcia; Heath, Bruce; Ali, Syed F.; Ward, Marcus
2006-01-01
The effects of out-of-the-window cues on learning a straight-in landing approach and a level 360° turn by novice pilots on a flight simulator have been investigated. The treatments consisted of training with and without visual cues, as well as varying the density of visual cues. The performance of the participants was then evaluated on similar but more challenging tasks. It was observed that the participants in the landing study who trained with visual cues performed more poorly than those who trained without the cues. However, those who trained with a faded-cues sequence performed slightly better than those who trained without visual cues. In the level-turn study, it was observed that those who trained with the visual cues performed better than those who trained without visual cues. The study also showed that participants who trained with a lower density of cues performed better than those who trained with a higher density of visual cues.
On how the brain decodes vocal cues about speaker confidence.
Jiang, Xiaoming; Pell, Marc D
2015-05-01
In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Individual Sensitivity to Spectral and Temporal Cues in Listeners With Hearing Impairment
Souza, Pamela E.; Wright, Richard A.; Blackburn, Michael C.; Tatman, Rachael; Gallun, Frederick J.
2015-01-01
Purpose: The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. Method: Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. Results: Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. Conclusion: Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions. PMID:25629388
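The discriminant-function step can be sketched with an off-the-shelf linear discriminant analysis: the standardized weights indicate how strongly each cue dimension predicts the listener's responses. The data below are simulated for illustration (a listener driven almost entirely by the spectral dimension); they are not the study's stimuli or results.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: each row is one stimulus described by its position on the
# spectral and temporal manipulation continua; the label is the listener's
# response category. Illustration only.
rng = np.random.default_rng(0)
spectral_step = rng.integers(0, 7, size=200)     # 7-step spectral continuum
temporal_step = rng.integers(0, 7, size=200)     # 7-step temporal continuum
responses = (spectral_step > 3).astype(int)      # listener driven by the spectral cue

X = np.column_stack([spectral_step, temporal_step])
lda = LinearDiscriminantAnalysis().fit(X, responses)

# Scale the discriminant coefficients by each cue's spread to compare their
# relative contribution to the listener's identifications.
weights = lda.coef_[0] * X.std(axis=0)
print(dict(zip(["spectral", "temporal"], np.round(weights, 2))))
```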
ERIC Educational Resources Information Center
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
2018-01-01
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
[Visual cuing effect for haptic angle judgment].
Era, Ataru; Yokosawa, Kazuhiko
2009-08-01
We investigated whether visual cues are useful for judging haptic angles. Participants explored three-dimensional angles with a virtual haptic feedback device. For visual cues, we used a location cue, which synchronized with haptic exploration, and a space cue, which specified the haptic space. In Experiment 1, angles were judged more correctly with both cues, but were overestimated with a location cue only. In Experiment 2, the visual cues emphasized depth, and overestimation with location cues occurred, but space cues had no influence. The results showed that (a) when both cues are presented, haptic angles are judged more correctly; (b) location cues facilitate only motion information, not depth information; and (c) haptic angles are apt to be overestimated when both haptic and visual information are present.
Harrison, Neil R; Woodhouse, Rob
2016-05-01
Previous research has demonstrated that threatening, compared to neutral pictures, can bias attention towards non-emotional auditory targets. Here we investigated which subcomponents of attention contributed to the influence of emotional visual stimuli on auditory spatial attention. Participants indicated the location of an auditory target, after brief (250 ms) presentation of a spatially non-predictive peripheral visual cue. Responses to targets were faster at the location of the preceding visual cue, compared to at the opposite location (cue validity effect). The cue validity effect was larger for targets following pleasant and unpleasant cues compared to neutral cues, for right-sided targets. For unpleasant cues, the crossmodal cue validity effect was driven by delayed attentional disengagement, and for pleasant cues, it was driven by enhanced engagement. We conclude that both pleasant and unpleasant visual cues influence the distribution of attention across modalities and that the associated attentional mechanisms depend on the valence of the visual cue.
Is Birdsong More Like Speech or Music?
Shannon, Robert V
2016-04-01
Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.
High-frequency neural activity predicts word parsing in ambiguous speech streams.
Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie
2016-12-01
During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. Copyright © 2016 the American Physiological Society.
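The two neural measures contrasted here, low-frequency phase and high-frequency (beta/gamma) amplitude, are conventionally obtained by band-limiting the recording and taking the Hilbert transform. A generic sketch with assumed frequency bands and a random placeholder trace rather than MEG data:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_phase_and_power(trace, fs, band_hz):
    """Band-limit a neural time series and return its instantaneous phase and
    amplitude envelope via the Hilbert transform. A generic sketch of the kind
    of low-frequency-phase and high-frequency-power measures described above."""
    sos = butter(4, [band_hz[0] / (fs / 2), band_hz[1] / (fs / 2)],
                 btype="bandpass", output="sos")
    analytic = hilbert(sosfiltfilt(sos, trace))
    return np.angle(analytic), np.abs(analytic)

fs = 1000                                # assumed sampling rate (Hz)
trace = np.random.randn(5 * fs)          # placeholder signal, not MEG data
low_phase, _ = band_phase_and_power(trace, fs, (1, 8))      # low-frequency phase
_, gamma_power = band_phase_and_power(trace, fs, (60, 90))  # high-frequency power

# A latency-style summary: time of the peak gamma envelope relative to onset.
print(f"gamma envelope peaks at {np.argmax(gamma_power) / fs:.3f} s")
```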
Cao, Beiming; Kim, Myungjong; Mau, Ted; Wang, Jun
2017-01-01
Individuals with an impaired larynx (vocal folds) have problems controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as tongue and lip motion may help in recognizing whispered speech since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with a reconstructed larynx using articulatory movement data. A data set with both acoustic and articulatory motion data was collected from a patient with a surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed that adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficients (MFCCs) significantly reduced the phone error rates of both speech recognition systems. Adding both tongue and lip data achieved the best performance. PMID:29423453
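A sketch of the acoustic-plus-articulatory front end implied here: MFCC frames concatenated with time-aligned articulograph samples before being passed to an HMM-based recognizer. The frame shift, sampling rates, and nearest-frame alignment are assumptions, and the recognizer itself is omitted.

```python
import numpy as np
import librosa

def acoustic_plus_articulatory(wav, fs, ema, ema_rate_hz, n_mfcc=13):
    """Concatenate MFCC frames with time-aligned articulatory (EMA) samples to
    form per-frame feature vectors. The EMA array is hypothetical
    (n_samples x n_sensors); alignment is by nearest articulatory sample."""
    hop = int(0.010 * fs)                                   # 10-ms frame shift
    mfcc = librosa.feature.mfcc(y=wav, sr=fs, n_mfcc=n_mfcc, hop_length=hop).T
    frame_times = np.arange(mfcc.shape[0]) * 0.010          # seconds
    ema_idx = np.clip((frame_times * ema_rate_hz).astype(int), 0, len(ema) - 1)
    return np.hstack([mfcc, ema[ema_idx]])                  # (n_frames, n_mfcc + n_sensors)

# Hypothetical 1-s utterance at 16 kHz with 4 articulatory channels at 100 Hz.
fs = 16000
wav = np.random.randn(fs).astype(np.float32)
ema = np.random.randn(100, 4)
features = acoustic_plus_articulatory(wav, fs, ema, ema_rate_hz=100)
```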
Normative Data on Audiovisual Speech Integration Using Sentence Recognition and Capacity Measures
Altieri, Nicholas; Hudock, Daniel
2016-01-01
Objective: The ability to use visual speech cues and integrate them with auditory information is important, especially in noisy environments and for hearing-impaired (HI) listeners. Providing data on measures of integration skills that encompass accuracy and processing speed will benefit researchers and clinicians. Design: The study consisted of two experiments: First, accuracy scores were obtained using CUNY sentences, and capacity measures that assessed reaction-time distributions were obtained from a monosyllabic word recognition task. Study Sample: We report data on two measures of integration obtained from a sample comprised of 86 young and middle-age adult listeners. Results: To summarize our results, capacity showed a positive correlation with accuracy measures of audiovisual benefit obtained from sentence recognition. More relevant, factor analysis indicated that a single-factor model captured audiovisual speech integration better than models containing more factors. Capacity exhibited strong loadings on the factor, while the accuracy-based measures from sentence recognition exhibited weaker loadings. Conclusions: Results suggest that a listener’s integration skills may be assessed optimally using a measure that incorporates both processing speed and accuracy. PMID:26853446
O'Brien, Amanda; Schlosser, Ralf W; Shane, Howard C; Abramson, Jennifer; Allen, Anna A; Flynn, Suzanne; Yu, Christina; Dimery, Katherine
2016-12-01
Using augmented input might be an effective means for supplementing spoken language for children with autism who have difficulties following spoken directives. This study aimed to (a) explore whether just-in-time (JIT)-delivered scene cues (photos, video clips) via the Apple Watch® enable children with autism to carry out directives they were unable to implement with speech alone, and (b) test the feasibility of the Apple Watch® (with a focus on display size). Results indicated that the hierarchical JIT supports enabled five children with autism to carry out the majority of directives. Hence, the relatively small display size of the Apple Watch does not appear to prevent children with autism from gleaning critical information from visual supports.
Individual Sensitivity to Spectral and Temporal Cues in Listeners with Hearing Impairment
ERIC Educational Resources Information Center
Souza, Pamela E.; Wright, Richard A.; Blackburn, Michael C.; Tatman, Rachael; Gallun, Frederick J.
2015-01-01
Purpose: The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. Method: Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal…
Do Older Listeners With Hearing Loss Benefit From Dynamic Pitch for Speech Recognition in Noise?
Shen, Jing; Souza, Pamela E
2017-10-12
Dynamic pitch, the variation in the fundamental frequency of speech, aids older listeners' speech perception in noise. It is unclear, however, whether some older listeners with hearing loss benefit from strengthened dynamic pitch cues for recognizing speech in certain noise scenarios and how this relative benefit may be associated with individual factors. We first examined older individuals' relative benefit from natural versus strong dynamic pitch for speech recognition in noise. We then examined the individual factors of the two groups of listeners who benefited differently from natural and strong dynamic pitch. Speech reception thresholds of 13 older listeners with mild-moderate hearing loss were measured using target speech with three levels of dynamic pitch strength. Individuals' ability to benefit from dynamic pitch was defined as the speech reception threshold difference between speech with and without dynamic pitch cues. The relative benefit of natural versus strong dynamic pitch varied across individuals. However, this relative benefit remained consistent for the same individuals across background noises with temporal modulation. The listeners who benefited more from strong dynamic pitch reported better subjective speech perception abilities. Strong dynamic pitch may be more beneficial than natural dynamic pitch for helping some older listeners recognize speech in noise, particularly when the noise has temporal modulation.
Zekveld, Adriana A; Kramer, Sophia E; Rönnberg, Jerker; Rudner, Mary
2018-06-19
Speech understanding may be cognitively demanding, but it can be enhanced when semantically related text cues precede auditory sentences. The present study aimed to determine whether (a) providing text cues reduces pupil dilation, a measure of cognitive load, during listening to sentences, (b) repeating the sentences aloud affects recall accuracy and pupil dilation during recall of cue words, and (c) semantic relatedness between cues and sentences affects recall accuracy and pupil dilation during recall of cue words. Sentence repetition following text cues and recall of the text cues were tested. Twenty-six participants (mean age, 22 years) with normal hearing listened to masked sentences. On each trial, a set of four-word cues was presented visually as text preceding the auditory presentation of a sentence whose meaning was either related or unrelated to the cues. On each trial, participants first read the cue words, then listened to a sentence. Following this they spoke aloud either the cue words or the sentence, according to instruction, and finally on all trials orally recalled the cues. Peak pupil dilation was measured throughout listening and recall on each trial. Additionally, participants completed a test measuring the ability to perceive degraded verbal text information and three working memory tests (a reading span test, a size-comparison span test, and a test of memory updating). Cue words that were semantically related to the sentence facilitated sentence repetition but did not reduce pupil dilation. Recall was poorer and there were more intrusion errors when the cue words were related to the sentences. Recall was also poorer when sentences were repeated aloud. Both behavioral effects were associated with greater pupil dilation. Larger reading span capacity and smaller size-comparison span were associated with larger peak pupil dilation during listening. Furthermore, larger reading span and greater memory updating ability were both associated with better cue recall overall. Although sentence-related word cues facilitate sentence repetition, our results indicate that they do not reduce cognitive load during listening in noise with a concurrent memory load. As expected, higher working memory capacity was associated with better recall of the cues. Unexpectedly, however, semantic relatedness with the sentence reduced word cue recall accuracy and increased intrusion errors, suggesting an effect of semantic confusion. Further, speaking the sentence aloud also reduced word cue recall accuracy, probably due to articulatory suppression. Importantly, imposing a memory load during listening to sentences resulted in the absence of formerly established strong effects of speech intelligibility on the pupil dilation response. This nullified intelligibility effect demonstrates that the pupil dilation response to a cognitive (memory) task can completely overshadow the effect of perceptual factors on the pupil dilation response. This highlights the importance of taking cognitive task load into account during auditory testing.
Intensive Treatment with Ultrasound Visual Feedback for Speech Sound Errors in Childhood Apraxia
Preston, Jonathan L.; Leece, Megan C.; Maas, Edwin
2016-01-01
Ultrasound imaging is an adjunct to traditional speech therapy that has been shown to be beneficial in the remediation of speech sound errors. Ultrasound biofeedback can be utilized during therapy to provide clients with additional knowledge about their tongue shapes when attempting to produce sounds that are in error. The additional feedback may assist children with childhood apraxia of speech (CAS) in stabilizing motor patterns, thereby facilitating more consistent and accurate productions of sounds and syllables. However, due to its specialized nature, ultrasound visual feedback is a technology that is not widely available to clients. Short-term intensive treatment programs are one option that can be utilized to expand access to ultrasound biofeedback. Schema-based motor learning theory suggests that short-term intensive treatment programs (massed practice) may assist children in acquiring more accurate motor patterns. In this case series, three participants ages 10–14 years diagnosed with CAS attended 16 h of speech therapy over a 2-week period to address residual speech sound errors. Two participants had distortions on rhotic sounds, while the third participant demonstrated lateralization of sibilant sounds. During therapy, cues were provided to assist participants in obtaining a tongue shape that facilitated a correct production of the erred sound. Additional practice without ultrasound was also included. Results suggested that all participants showed signs of acquisition of sounds in error. Generalization and retention results were mixed. One participant showed generalization and retention of sounds that were treated; one showed generalization but limited retention; and the third showed no evidence of generalization or retention. Individual characteristics that may facilitate generalization are discussed. Short-term intensive treatment programs using ultrasound biofeedback may result in the acquisition of more accurate motor patterns and improved articulation of sounds previously in error, with varying levels of generalization and retention. PMID:27625603
Comparison of different speech tasks among adults who stutter and adults who do not stutter
Ritto, Ana Paula; Costa, Julia Biancalana; Juste, Fabiola Staróbole; de Andrade, Claudia Regina Furquim
2016-01-01
OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study is based on the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues. PMID:27074176
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1980-06-01
This report (1 April - 30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The perceptual equivalence of two acoustic cues for a speech contrast is specific to phonetic perception; Duplex perception of acoustic patterns as speech and nonspeech; Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place; Some articulatory correlates of perceptual isochrony; Effects of utterance continuity on phonetic judgments; Laryngeal adjustments in stuttering: A glottographic observation using a modified reaction paradigm; Missing -ing in reading: Letter detection errors on word endings; Speaking rate, syllable stress, and vowel identity; Sonority and syllabicity: Acoustic correlates of perception; Influence of vocalic context on perception of the (S)-(s) distinction.
PQSM-based RR and NR video quality metrics
NASA Astrophysics Data System (ADS)
Lu, Zhongkang; Lin, Weisi; Ong, Eeping; Yang, Xiaokang; Yao, Susu
2003-06-01
This paper presents a new and general concept, PQSM (Perceptual Quality Significance Map), to be used in measuring visual distortion. It makes use of the selectivity characteristic of the HVS (Human Visual System): the HVS pays more attention to certain areas/regions of a visual signal due to one or more of the following factors: salient features in the image/video, cues from domain knowledge, and association with other media (e.g., speech or audio). The PQSM is an array whose elements represent the relative perceptual-quality significance levels for the corresponding areas/regions of images or video. Due to its generality, the PQSM can be incorporated into any visual distortion metric: to improve the effectiveness and/or efficiency of perceptual metrics, or even to enhance a PSNR-based metric. A three-stage PQSM estimation method is also proposed in this paper, with an implementation of motion, texture, luminance, skin-color and face mapping. Experimental results show that the scheme can improve the performance of current image/video distortion metrics.
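As a rough illustration of how a significance map can be incorporated into a distortion metric, the sketch below weights per-pixel squared error by a supplied PQSM before converting to a PSNR-style score. The map here is a hand-made placeholder rather than the paper's three-stage motion/texture/luminance/skin-color/face estimate, and the function name and weighting scheme are assumptions for illustration only.

```python
import numpy as np

def pqsm_weighted_psnr(ref, test, pqsm, peak=255.0):
    """PSNR-style score in which each pixel's squared error is weighted
    by its perceptual-quality significance (PQSM). A real PQSM would come
    from motion/texture/luminance/skin-colour/face analysis; here it is
    simply supplied by the caller."""
    ref, test = ref.astype(float), test.astype(float)
    w = pqsm / pqsm.sum()                      # normalise weights
    wmse = np.sum(w * (ref - test) ** 2)       # weighted mean squared error
    return 10.0 * np.log10(peak ** 2 / max(wmse, 1e-12))

# Toy example: errors in a "significant" region are penalised more
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64)).astype(float)
test = ref + rng.normal(0, 5, ref.shape)
pqsm = np.ones_like(ref)
pqsm[16:48, 16:48] = 4.0                       # e.g., a hypothetical face region
print(round(pqsm_weighted_psnr(ref, test, pqsm), 2))
```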
Language familiarity modulates relative attention to the eyes and mouth of a talker.
Barenholtz, Elan; Mavica, Lauren; Lewkowicz, David J
2016-02-01
We investigated whether the audiovisual speech cues available in a talker's mouth elicit greater attention when adults have to process speech in an unfamiliar language vs. a familiar language. Participants performed a speech-encoding task while watching and listening to videos of a talker in a familiar language (English) or an unfamiliar language (Spanish or Icelandic). Attention to the mouth increased in monolingual subjects in response to the unfamiliar language condition, but not in bilingual subjects, when the task required speech processing. In the absence of an explicit speech-processing task, subjects attended equally to the eyes and mouth in response to both familiar and unfamiliar languages. Overall, these results demonstrate that language familiarity modulates selective attention to the redundant audiovisual speech cues in a talker's mouth in adults. When our findings are considered together with similar findings from infants, they suggest that this attentional strategy emerges very early in life.
Hierarchical acquisition of visual specificity in spatial contextual cueing.
Lie, Kin-Pou
2015-01-01
Spatial contextual cueing refers to the improvement in visual search performance that occurs when invariant associations between target locations and distractor spatial configurations are learned incidentally. Using the instance theory of automatization and the reverse hierarchy theory of visual perceptual learning, this study explores the acquisition of visual specificity in spatial contextual cueing. Two experiments in which detailed visual features were irrelevant for distinguishing between spatial contexts found that spatial contextual cueing was visually generic in difficult trials when the trials were not preceded by easy trials (Experiment 1) but that spatial contextual cueing progressed to visual specificity when difficult trials were preceded by easy trials (Experiment 2). These findings support reverse hierarchy theory, which predicts that even when detailed visual features are irrelevant for distinguishing between spatial contexts, spatial contextual cueing can progress to visual specificity if the stimuli remain constant, the task is difficult, and difficult trials are preceded by easy trials. However, these findings are inconsistent with instance theory, which predicts that when detailed visual features are irrelevant for distinguishing between spatial contexts, spatial contextual cueing will not progress to visual specificity. This study concludes that the acquisition of visual specificity in spatial contextual cueing is more plausibly hierarchical, rather than instance-based.
Fogerty, Daniel; Ahlstrom, Jayne B.; Bologna, William J.; Dubno, Judy R.
2015-01-01
This study investigated how single-talker modulated noise impacts consonant and vowel cues to sentence intelligibility. Younger normal-hearing, older normal-hearing, and older hearing-impaired listeners completed speech recognition tests. All listeners received spectrally shaped speech matched to their individual audiometric thresholds to ensure sufficient audibility, with the exception of a second younger listener group who received spectral shaping that matched the mean audiogram of the hearing-impaired listeners. Results demonstrated minimal declines in intelligibility for older listeners with normal hearing and more evident declines for older hearing-impaired listeners, possibly related to impaired temporal processing. A correlational analysis suggests a common underlying ability to process information during vowels that is predictive of speech-in-modulated-noise abilities, whereas the ability to use consonant cues appears specific to the particular characteristics of the noise and interruption. Performance declines for older listeners were mostly confined to consonant conditions. Spectral shaping accounted for the primary contributions of audibility. However, comparison with the young spectral controls who received identical spectral shaping suggests that this procedure may reduce wideband temporal modulation cues due to frequency-specific amplification that affected high-frequency consonants more than low-frequency vowels. These spectral changes may impact speech intelligibility in certain modulation masking conditions. PMID:26093436
Johnson, Erin Phinney; Pennington, Bruce F.; Lowenstein, Joanna H.; Nittrouer, Susan
2011-01-01
Purpose Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Method Ten- to 11-year-olds with SSD (n = 17), RD (n = 16), SSD+RD (n = 17), and Controls (n = 16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Results Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Conclusion Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. PMID:21329941
Masking release for words in amplitude-modulated noise as a function of modulation rate and task
Buss, Emily; Whittle, Lisa N.; Grose, John H.; Hall, Joseph W.
2009-01-01
For normal-hearing listeners, masked speech recognition can improve with the introduction of masker amplitude modulation. The present experiments tested the hypothesis that this masking release is due in part to an interaction between the temporal distribution of cues necessary to perform the task and the probability of those cues temporally coinciding with masker modulation minima. Stimuli were monosyllabic words masked by speech-shaped noise, and masker modulation was introduced via multiplication with a raised sinusoid of 2.5–40 Hz. Tasks included detection, three-alternative forced-choice identification, and open-set identification. Overall, there was more masking release associated with the closed-set than the open-set tasks. The best rate of modulation also differed as a function of task; whereas low modulation rates were associated with best performance for the detection and three-alternative identification tasks, performance improved with modulation rate in the open-set task. This task-by-rate interaction was also observed when amplitude-modulated speech was presented in a steady masker, and for low- and high-pass filtered speech presented in modulated noise. These results were interpreted as showing that the optimal rate of amplitude modulation depends on the temporal distribution of speech cues and the information required to perform a particular task. PMID:19603883
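As a small illustration of the masker manipulation described above (multiplication with a raised sinusoid), the sketch below imposes raised-sine amplitude modulation on a noise carrier at the modulation rates used in the study. The speech-shaping of the noise and the exact modulator definition are simplified assumptions, not the authors' stimulus-generation code.

```python
import numpy as np

def raised_sine_modulate(masker, rate_hz, fs, depth=1.0):
    """Impose amplitude modulation by multiplying the masker with a
    raised sinusoid, 1 + depth*sin(2*pi*f*t). With depth = 1 the
    envelope dips to zero, creating the modulation minima where
    speech cues may be glimpsed."""
    t = np.arange(masker.size) / fs
    modulator = 1.0 + depth * np.sin(2.0 * np.pi * rate_hz * t)
    return masker * modulator

fs = 16000
rng = np.random.default_rng(2)
noise = rng.standard_normal(fs)          # 1 s of (unshaped) noise;
                                         # the real maskers were speech-shaped
for rate in (2.5, 5, 10, 20, 40):        # modulation rates spanning the study's range
    modulated = raised_sine_modulate(noise, rate, fs)
    print(rate, round(float(np.std(modulated)), 2))
```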
Common cues to emotion in the dynamic facial expressions of speech and song
Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline
2015-01-01
Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements, each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were identified accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388
Stekelenburg, Jeroen J; Keetels, Mirjam; Vroomen, Jean
2018-05-01
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech.
Melodic Contour Identification and Music Perception by Cochlear Implant Users
Galvin, John J.; Fu, Qian-Jie; Shannon, Robert V.
2013-01-01
Research and outcomes with cochlear implants (CIs) have revealed a dichotomy in the cues necessary for speech and music recognition. CI devices typically transmit 16–22 spectral channels, each modulated slowly in time. This coarse representation provides enough information to support speech understanding in quiet and rhythmic perception in music, but not enough to support speech understanding in noise or melody recognition. Melody recognition requires some capacity for complex pitch perception, which in turn depends strongly on access to spectral fine structure cues. Thus, temporal envelope cues are adequate for speech perception under optimal listening conditions, while spectral fine structure cues are needed for music perception. In this paper, we present recent experiments that directly measure CI users’ melodic pitch perception using a melodic contour identification (MCI) task. While normal-hearing (NH) listeners’ performance was consistently high across experiments, MCI performance was highly variable across CI users. CI users’ MCI performance was significantly affected by instrument timbre, as well as by the presence of a competing instrument. In general, CI users had great difficulty extracting melodic pitch from complex stimuli. However, musically-experienced CI users often performed as well as NH listeners, and MCI training in less experienced subjects greatly improved performance. With fixed constraints on spectral resolution, such as occur with hearing loss or an auditory prosthesis, training and experience can provide considerable improvements in music perception and appreciation. PMID:19673835
Experience with a second language affects the use of fundamental frequency in speech segmentation
Broersma, Mirjam; Cho, Taehong; Kim, Sahyang; Martínez-García, Maria Teresa; Connell, Katrina
2017-01-01
This study investigates whether listeners’ experience with a second language learned later in life affects their use of fundamental frequency (F0) as a cue to word boundaries in the segmentation of an artificial language (AL), particularly when the cues to word boundaries conflict between the first language (L1) and second language (L2). F0 signals phrase-final (and thus word-final) boundaries in French but word-initial boundaries in English. Participants were functionally monolingual French listeners, functionally monolingual English listeners, bilingual L1-English L2-French listeners, and bilingual L1-French L2-English listeners. They completed the AL-segmentation task with F0 signaling word-final boundaries or without prosodic cues to word boundaries (monolingual groups only). After listening to the AL, participants completed a forced-choice word-identification task in which the foils were either non-words or part-words. The results show that the monolingual French listeners, but not the monolingual English listeners, performed better in the presence of F0 cues than in the absence of such cues. Moreover, bilingual status modulated listeners’ use of F0 cues to word-final boundaries, with bilingual French listeners performing less accurately than monolingual French listeners on both word types but with bilingual English listeners performing more accurately than monolingual English listeners on non-words. These findings not only confirm that speech segmentation is modulated by the L1, but also newly demonstrate that listeners’ experience with the L2 (French or English) affects their use of F0 cues in speech segmentation. This suggests that listeners’ use of prosodic cues to word boundaries is adaptive and non-selective, and can change as a function of language experience. PMID:28738093
Auditory and Motor Rhythm Awareness in Adults with Dyslexia
ERIC Educational Resources Information Center
Thomson, Jennifer M.; Fryer, Ben; Maltby, James; Goswami, Usha
2006-01-01
Children with developmental dyslexia appear to be insensitive to basic auditory cues to speech rhythm and stress. For example, they experience difficulties in processing duration and amplitude envelope onset cues. Here we explored the sensitivity of adults with developmental dyslexia to the same cues. In addition, relations with expressive and…
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.
Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne
2016-05-01
We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to foreground stimulus features over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
ERIC Educational Resources Information Center
Koehlinger, Keegan M.
2015-01-01
Clinical Question: Would a preschool-aged child with childhood apraxia of speech (CAS) benefit from a singular approach--such as motor planning, sensory cueing, linguistic and rhythmic--or a combined approach in order to increase intelligibility of spoken language? Method: Systematic Review. Study Sources: ASHA Wire, Google Scholar, Speech Bite.…
The Benefit of a Visually Guided Beamformer in a Dynamic Speech Task
Roverud, Elin; Streeter, Timothy; Mason, Christine R.; Kidd, Gerald
2017-01-01
The aim of this study was to evaluate the performance of a visually guided hearing aid (VGHA) under conditions designed to capture some aspects of “real-world” communication settings. The VGHA uses eye gaze to steer the acoustic look direction of a highly directional beamforming microphone array. Although the VGHA has been shown to enhance speech intelligibility for fixed-location, frontal targets, it is currently not known whether these benefits persist in the face of frequent changes in location of the target talker that are typical of conversational turn-taking. Participants were 14 young adults, 7 with normal hearing and 7 with bilateral sensorineural hearing impairment. Target stimuli were sequences of 12 question–answer pairs that were embedded in a mixture of competing conversations. The participant’s task was to respond via a key press after each answer indicating whether it was correct or not. Spatialization of the stimuli and microphone array processing were done offline using recorded impulse responses, before presentation over headphones. The look direction of the array was steered according to the eye movements of the participant as they followed a visual cue presented on a widescreen monitor. Performance was compared for a “dynamic” condition in which the target stimulus moved between three locations, and a “fixed” condition with a single target location. The benefits of the VGHA over natural binaural listening observed in the fixed condition were reduced in the dynamic condition, largely because visual fixation was less accurate. PMID:28758567
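The abstract describes steering the acoustic look direction of a beamforming microphone array from the listener's gaze, with spatialization done offline from recorded impulse responses. The sketch below shows the general idea with a simple frequency-domain delay-and-sum beamformer for a small linear array steered toward a gaze angle; the array geometry, the steering method, and all signals are assumptions for illustration and do not reproduce the VGHA's actual processing.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_x, steer_deg, fs, c=343.0):
    """Steer a linear microphone array toward steer_deg (0 = broadside)
    by delaying each channel and summing. mic_signals: (n_mics, n_samples);
    mic_x: microphone positions along the array axis in metres."""
    n_mics, n_samp = mic_signals.shape
    delays = mic_x * np.sin(np.deg2rad(steer_deg)) / c      # seconds
    delays -= delays.min()                                  # keep delays causal
    freqs = np.fft.rfftfreq(n_samp, 1.0 / fs)
    out = np.zeros(n_samp)
    for ch in range(n_mics):
        spec = np.fft.rfft(mic_signals[ch])
        spec *= np.exp(-2j * np.pi * freqs * delays[ch])    # fractional delay as phase shift
        out += np.fft.irfft(spec, n=n_samp)
    return out / n_mics

# Toy use: 4-mic array, look direction taken from a hypothetical gaze angle
fs = 16000
mic_x = np.array([0.00, 0.02, 0.04, 0.06])
rng = np.random.default_rng(3)
mics = rng.standard_normal((4, fs))        # placeholder multichannel recording
gaze_angle_deg = -30.0                     # e.g., talker located to the left
y = delay_and_sum(mics, mic_x, gaze_angle_deg, fs)
print(y.shape)
```

In a real-time version, the gaze angle would be updated from the eye-tracker on each processing block so the look direction follows conversational turn-taking.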
Smets, Karolien; Moors, Pieter; Reynvoet, Bert
2016-01-01
Performance in a non-symbolic comparison task in which participants are asked to indicate the larger numerosity of two dot arrays, is assumed to be supported by the Approximate Number System (ANS). This system allows participants to judge numerosity independently from other visual cues. Supporting this idea, previous studies indicated that numerosity can be processed when visual cues are controlled for. Consequently, distinct types of visual cue control are assumed to be interchangeable. However, a previous study showed that the type of visual cue control affected performance using a simultaneous presentation of the stimuli in numerosity comparison. In the current study, we explored whether the influence of the type of visual cue control on performance disappeared when sequentially presenting each stimulus in numerosity comparison. While the influence of the applied type of visual cue control was significantly more evident in the simultaneous condition, sequentially presenting the stimuli did not completely exclude the influence of distinct types of visual cue control. Altogether, these results indicate that the implicit assumption that it is possible to compare performances across studies with a differential visual cue control is unwarranted and that the influence of the type of visual cue control partly depends on the presentation format of the stimuli. PMID:26869967
Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis
ERIC Educational Resources Information Center
Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.
2017-01-01
Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…
Differences in Talker Recognition by Preschoolers and Adults
ERIC Educational Resources Information Center
Creel, Sarah C.; Jimenez, Sofia R.
2012-01-01
Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from…
Voice Modulations in German Ironic Speech
ERIC Educational Resources Information Center
Scharrer, Lisa; Christmann, Ursula; Knoll, Monja
2011-01-01
Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic…
The Neural Basis of Speech Parsing in Children and Adults
ERIC Educational Resources Information Center
McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella
2010-01-01
Word segmentation, detecting word boundaries in continuous speech, is a fundamental aspect of language learning that can occur solely by the computation of statistical and speech cues. Fifty-four children underwent functional magnetic resonance imaging (fMRI) while listening to three streams of concatenated syllables that contained either high…
Visual Cues, Verbal Cues and Child Development
ERIC Educational Resources Information Center
Valentini, Nadia
2004-01-01
In this article, the author discusses two strategies--visual cues (modeling) and verbal cues (short, accurate phrases) which are related to teaching motor skills in maximizing learning in physical education classes. Both visual and verbal cues are strong influences in facilitating and promoting day-to-day learning. Both strategies reinforce…
Keshavarz, Behrang; Campos, Jennifer L; DeLucia, Patricia R; Oberfeld, Daniel
2017-04-01
Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object's instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we measured TTC estimates using a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues under more realistic conditions. Younger (18-39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio-visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio-visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
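The visual τ cue described above is the ratio of an object's instantaneous optical size to its instantaneous rate of change in optical size. The sketch below computes visual τ from a sampled optical-size time series for a constant-velocity approach, where τ should track the true time to contact; the scenario and numbers are made up for illustration.

```python
import numpy as np

def visual_tau(theta, dt):
    """tau(t) = theta(t) / (d theta / dt): the optical-expansion estimate
    of time to contact. theta: optical (angular) size samples in radians."""
    dtheta = np.gradient(theta, dt)
    return theta / dtheta

# Object approaching at constant speed: theta(t) ~ width / (d0 - v*t)
dt, width, d0, v = 0.01, 2.0, 50.0, 10.0           # s, m, m, m/s
t = np.arange(0.0, 3.0, dt)
theta = width / (d0 - v * t)                       # small-angle approximation
tau = visual_tau(theta, dt)
true_ttc = (d0 - v * t) / v
print(round(float(tau[100]), 2), round(float(true_ttc[100]), 2))   # both ~4.0 s at t = 1 s
```

A "final optical size" heuristic, by contrast, would simply use theta[-1] at stimulus offset rather than the ratio above.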
Stuart, Samuel; Lord, Sue; Galna, Brook; Rochester, Lynn
2018-04-01
Gait impairment is a core feature of Parkinson's disease (PD) with implications for falls risk. Visual cues improve gait in PD, but the underlying mechanisms are unclear. Evidence suggests that attention and vision play an important role; however, the relative contribution from each is unclear. Measurement of visual exploration (specifically saccade frequency) during gait allows for real-time measurement of attention and vision. Understanding how visual cues influence visual exploration may allow inferences about the underlying mechanisms of response, which could help in developing effective therapeutics. This study aimed to examine saccade frequency during gait in response to a visual cue in PD and older adults and to investigate the roles of attention and vision in visual cue response in PD. A mobile eye-tracker measured saccade frequency during gait in 55 people with PD and 32 age-matched controls. Participants walked in a straight line with and without a visual cue (50 cm transverse lines) presented under single-task and dual-task (concurrent digit span recall) conditions. Saccade frequency was reduced when walking in PD compared to controls; however, visual cues ameliorated this saccadic deficit. Visual cues significantly increased saccade frequency in both PD and controls under both single-task and dual-task conditions. Attention, rather than visual function, was central to saccade frequency and gait response to visual cues in PD. In conclusion, this study highlights the impact of visual cues on visual exploration when walking and the important role of attention in PD. Understanding these complex features will help inform intervention development.
2016-01-01
Speech segmentation is supported by multiple sources of information that may either inform language processing specifically, or serve learning more broadly. The Iambic/Trochaic Law (ITL), where increased duration indicates the end of a group and increased emphasis indicates the beginning of a group, has been proposed as a domain-general mechanism that also applies to language. However, language background has been suggested to modulate use of the ITL, meaning that these perceptual grouping preferences may instead be a consequence of language exposure. To distinguish between these accounts, we exposed native-English and native-Japanese listeners to sequences of speech (Experiment 1) and nonspeech stimuli (Experiment 2), and examined segmentation using a 2AFC task. Duration was manipulated over 3 conditions: sequences contained either an initial-item duration increase, or a final-item duration increase, or items of uniform duration. In Experiment 1, language background did not affect the use of duration as a cue for segmenting speech in a structured artificial language. In Experiment 2, the same results were found for grouping structured sequences of visual shapes. The results are consistent with proposals that duration information draws upon a domain-general mechanism that can apply to the special case of language acquisition. PMID:27893268
Circadian timed episodic-like memory - a bee knows what to do when, and also where.
Pahl, Mario; Zhu, Hong; Pix, Waltraud; Tautz, Juergen; Zhang, Shaowu
2007-10-01
This study investigates how the colour, shape and location of patterns could be memorized within a time frame. Bees were trained to visit two Y-mazes, one of which presented yellow vertical (rewarded) versus horizontal (non-rewarded) gratings at one site in the morning, while another presented blue horizontal (rewarded) versus vertical (non-rewarded) gratings at another site in the afternoon. The bees could perform well in the learning tests and various transfer tests, in which (i) all contextual cues from the learning test were present; (ii) the colour cues of the visual patterns were removed, but the location cue, the orientation of the visual patterns and the temporal cue still existed; (iii) the location cue was removed, but other contextual cues, i.e. the colour and orientation of the visual patterns and the temporal cue still existed; (iv) the location cue and the orientation cue of the visual patterns were removed, but the colour cue and temporal cue still existed; (v) the location cue, and the colour cue of the visual patterns were removed, but the orientation cue and the temporal cue still existed. The results reveal that the honeybee can recall the memory of the correct visual patterns by using spatial and/or temporal information. The relative importance of different contextual cues is compared and discussed. The bees' ability to integrate elements of circadian time, place and visual stimuli is akin to episodic-like memory; we have therefore named this kind of memory circadian timed episodic-like memory.
The perception of sentence stress in cochlear implant recipients.
Meister, Hartmut; Landwehr, Markus; Pyschny, Verena; Wagner, Petra; Walger, Martin
2011-01-01
Sentence stress is a vital attribute of speech since it indicates the importance of specific words within an utterance. Basic acoustic correlates of stress are syllable duration, intensity, and fundamental frequency (F0). Objectives of the study were to determine cochlear implant (CI) users' perception of the acoustic correlates and to uncover which cues are used for stress identification. Several experiments addressed the discrimination of changes in syllable duration, intensity, and F0 as well as stress identification based on these cues. Moreover, the discrimination of combined cues and identification of stress in conversational speech was examined. Both natural utterances and artificial manipulations of the acoustic cues were used as stimuli. Discrimination of syllable duration did not differ significantly between CI recipients and a control group of normal-hearing listeners. In contrast, CI users performed significantly worse on tasks of discrimination and stress identification based on F0 as well as on intensity. Results from these measurements were significantly correlated with the ability to identify stress in conversational speech. Discrimination performance for covarying F0 and intensity changes was more strongly correlated to identification performance than was found for discrimination of either F0 or intensity alone. Syllable duration was not related to stress identification in natural utterances. The outcome emphasizes the importance of both F0 and intensity for CI users' identification of sentence-based stress. Both cues were used separately for stress perception, but combining the cues provided extra benefit for most of the subjects.
Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data
Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory
2015-01-01
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309
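The classification analysis described above relates trial-by-trial variability in the transparency mask to the listener's responses. One common way to build such a (here, purely temporal) classification image is to contrast mean mask visibility on trials yielding one response versus the other, frame by frame; the sketch below does exactly that on simulated data. It is a generic illustration, not the authors' analysis, which additionally resolved spatial location and included statistical mapping.

```python
import numpy as np

def temporal_classification_image(mask_visibility, responses):
    """mask_visibility: (n_trials, n_frames) array, 1 = visual speech visible
    in that frame, 0 = obscured. responses: (n_trials,) array, 1 for one
    response category, 0 for the other. Returns, per frame, the difference
    in mean visibility between the two response classes; frames whose
    visibility pushes perception toward category 1 get positive weights."""
    mask_visibility = np.asarray(mask_visibility, float)
    responses = np.asarray(responses).astype(bool)
    return (mask_visibility[responses].mean(axis=0)
            - mask_visibility[~responses].mean(axis=0))

# Simulated example: visibility of frames 10-15 drives the response
rng = np.random.default_rng(4)
vis = rng.integers(0, 2, size=(500, 30)).astype(float)
resp = (vis[:, 10:16].mean(axis=1) + rng.normal(0, 0.2, 500)) > 0.5
ci = temporal_classification_image(vis, resp)
print(np.argsort(ci)[-6:])      # frames carrying the largest positive weights
```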
Buss, Emily; Leibold, Lori J.; Porter, Heather L.; Grose, John H.
2017-01-01
Children perform more poorly than adults on a wide range of masked speech perception paradigms, but this effect is particularly pronounced when the masker itself is also composed of speech. The present study evaluated two factors that might contribute to this effect: the ability to perceptually isolate the target from masker speech, and the ability to recognize target speech based on sparse cues (glimpsing). Speech reception thresholds (SRTs) were estimated for closed-set, disyllabic word recognition in children (5–16 years) and adults in a one- or two-talker masker. Speech maskers were 60 dB sound pressure level (SPL), and they were either presented alone or in combination with a 50-dB-SPL speech-shaped noise masker. There was an age effect overall, but performance was adult-like at a younger age for the one-talker than the two-talker masker. Noise tended to elevate SRTs, particularly for older children and adults, and when summed with the one-talker masker. Removing time-frequency epochs associated with a poor target-to-masker ratio markedly improved SRTs, with larger effects for younger listeners; the age effect was not eliminated, however. Results were interpreted as indicating that development of speech-in-speech recognition is likely impacted by development of both the ability to perceptually isolate target from masker speech and the ability to recognize speech based on sparse cues. PMID:28464682
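The glimpsing manipulation above removed time-frequency epochs with a poor target-to-masker ratio (TMR). The sketch below shows the basic idea: frame the signals, compare target and masker energy in a few uniform frequency bands, and zero out masker-dominated cells of the mixture before resynthesis. The window, band layout, and 0-dB criterion are assumptions for illustration, not the study's actual processing.

```python
import numpy as np

def glimpse_mix(target, masker, tmr_criterion_db=0.0,
                frame_len=512, n_bands=16):
    """Mix target and masker, then zero out time-frequency cells whose
    target-to-masker ratio falls below the criterion, leaving only the
    'glimpses' where the target dominates. Rectangular frames and uniform
    bands; purely illustrative."""
    n = min(target.size, masker.size) // frame_len * frame_len
    tgt = target[:n].reshape(-1, frame_len)
    msk = masker[:n].reshape(-1, frame_len)
    mix = tgt + msk
    out = np.zeros_like(mix)
    edges = np.linspace(0, frame_len // 2 + 1, n_bands + 1, dtype=int)
    for i in range(mix.shape[0]):
        T, M, X = (np.fft.rfft(x[i]) for x in (tgt, msk, mix))
        keep = np.zeros(frame_len // 2 + 1, dtype=bool)
        for b in range(n_bands):
            sl = slice(edges[b], edges[b + 1])
            t_pow = np.sum(np.abs(T[sl]) ** 2) + 1e-12
            m_pow = np.sum(np.abs(M[sl]) ** 2) + 1e-12
            if 10.0 * np.log10(t_pow / m_pow) >= tmr_criterion_db:
                keep[sl] = True
        out[i] = np.fft.irfft(np.where(keep, X, 0.0), n=frame_len)
    return out.ravel()

rng = np.random.default_rng(5)
target = rng.standard_normal(16000)   # stand-ins for 1 s signals at 16 kHz
masker = rng.standard_normal(16000)
glimpsed = glimpse_mix(target, masker)
print(glimpsed.shape)                 # whole frames only
```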
Improving visual spatial working memory in younger and older adults: effects of cross-modal cues.
Curtis, Ashley F; Turner, Gary R; Park, Norman W; Murtha, Susan J E
2017-11-06
Spatially informative auditory and vibrotactile (cross-modal) cues can facilitate attention but little is known about how similar cues influence visual spatial working memory (WM) across the adult lifespan. We investigated the effects of cues (spatially informative or alerting pre-cues vs. no cues), cue modality (auditory vs. vibrotactile vs. visual), memory array size (four vs. six items), and maintenance delay (900 vs. 1800 ms) on visual spatial location WM recognition accuracy in younger adults (YA) and older adults (OA). We observed a significant interaction between spatially informative pre-cue type, array size, and delay. OA and YA benefitted equally from spatially informative pre-cues, suggesting that attentional orienting prior to WM encoding, regardless of cue modality, is preserved with age. Contrary to predictions, alerting pre-cues generally impaired performance in both age groups, suggesting that maintaining a vigilant state of arousal by facilitating the alerting attention system does not help visual spatial location WM.
Francis, Alexander L; Driscoll, Courtney
2006-09-01
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.
Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin
2016-01-01
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Replacing maladaptive speech with verbal labeling responses: an analysis of generalized responding.
Foxx, R M; Faw, G D; McMorrow, M J; Kyle, M S; Bittle, R G
1988-01-01
We taught three mentally handicapped students to answer questions with verbal labels and evaluated the generalized effects of this training on their maladaptive speech (e.g., echolalia) and correct responding to untrained questions. The students received cues-pause-point training on an initial question set followed by generalization assessments on a different set in another setting. Probes were conducted on novel questions in three other settings to determine the strength and spread of the generalization effect. A multiple baseline across subjects design revealed that maladaptive speech was replaced with correct labels (answers) to questions in the training and all generalization settings. These results replicate and extend previous research that suggested that cues-pause-point procedures may be useful in replacing maladaptive speech patterns by teaching students to use their verbal labeling repertoires. PMID:3225258
Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age
Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik
2015-01-01
Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259
Ma, Joan K-Y; Whitehill, Tara L; So, Susanne Y-S
2010-08-01
Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Speech materials with a question-statement contrast were collected from 14 Cantonese speakers with PD. Twenty listeners then classified the productions as either questions or statements. Acoustic analyses of F0, duration, and intensity were conducted to determine which acoustic cues distinguished the production of questions from statements, and which cues appeared to be exploited by listeners in identifying intonational contrasts. The results show that listeners identified statements with a high degree of accuracy, but the accuracy of question identification ranged from 0.56% to 96% across the 14 speakers. The speakers with PD used similar acoustic cues as nondysarthric Cantonese speakers to mark the question-statement contrast, although the contrasts were not observed in all speakers. Listeners mainly used F0 cues at the final syllable for intonation identification. These data contribute to the researchers' understanding of intonation marking in speakers with PD, with specific application to the production and perception of intonation in a lexical tone language.
Effects of Loudness Cues on Respiration in Individuals with Parkinson’s disease
Sadagopan, Neeraja; Huber, Jessica E.
2012-01-01
Individuals with Parkinson’s disease (PD) demonstrate low vocal intensity (hypophonia) which results in reduced speech intelligibility. We examined the effects of three cues to increase loudness on respiratory support in individuals with PD. Kinematic data from the rib cage and abdomen were collected using respiratory plethysmography while participants read a short passage. Individuals with PD and normal age- and sex-matched controls (OC) increased sound pressure level (SPL) to a similar extent. As compared to OC, individuals with PD used larger rib cage volume excursions in all conditions. Further, they did not slow their rate of speech in noise as OC speakers did. Respiratory strategies used to support increased loudness varied with the cue, but the two groups did not differ in the strategies used. When asked to target a specific loudness, both groups used more abdominal effort than at comfortable loudness. Speaking in background noise resulted in the largest increase in SPL with the most efficient respiratory patterns, suggesting natural or implicit cues may be best when treating hypophonia in individuals with PD. Data demonstrate the possibility that both vocal loudness and speech rate are impacted by cognitive mechanisms (attention or self-perception) in individuals with PD. PMID:17266087
Transition Probabilities and Different Levels of Prominence in Segmentation
ERIC Educational Resources Information Center
Ordin, Mikhail; Nespor, Marina
2013-01-01
A large body of empirical research demonstrates that people exploit a wide variety of cues for the segmentation of continuous speech in artificial languages, including rhythmic properties, phrase boundary cues, and statistical regularities. However, less is known regarding how the different cues interact. In this study we addressed the question of…
Development in Children's Interpretation of Pitch Cues to Emotions
ERIC Educational Resources Information Center
Quam, Carolyn; Swingley, Daniel
2012-01-01
Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to…
Speaker recognition with temporal cues in acoustic and electric hearing
NASA Astrophysics Data System (ADS)
Vongphoe, Michael; Zeng, Fan-Gang
2005-08-01
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
Zabierek, Kristina C; Gabor, Caitlin R
2016-09-01
Prey may use multiple sensory channels to detect predators, whose cues may differ in altered sensory environments, such as turbid conditions. Depending on the environment, prey may use cues in an additive/complementary manner or in a compensatory manner. First, to determine whether the purely aquatic Barton Springs salamander, Eurycea sosorum, shows an antipredator response to visual cues, we examined the animals' activity when exposed to visual cues of either a predatory fish (Lepomis cyanellus) or a non-predatory fish (Etheostoma lepidum). Salamanders decreased activity in response to predator visual cues only. Then, we examined the antipredator response of these salamanders to all matched and mismatched combinations of chemical and visual cues of the same predatory and non-predatory fish in clear and low turbidity conditions. Salamanders decreased activity in response to predator chemical cues matched with predator visual cues or mismatched with non-predator visual cues. Salamanders also increased latency to first move to predator chemical cues mismatched with non-predator visual cues. Salamanders decreased activity and increased latency to first move more in clear as opposed to turbid conditions in all treatment combinations. Our results indicate that salamanders under all conditions and treatments preferentially rely on chemical cues to determine antipredator behavior, although visual cues are potentially utilized in conjunction for latency to first move. Our results also have potential conservation implications, as decreased antipredator behavior was seen in turbid conditions. These results reveal the complexity of antipredator behavior in response to multiple cues under different environmental conditions, which is especially important when considering endangered species. Copyright © 2016 Elsevier B.V. All rights reserved.
Video-assisted segmentation of speech and audio track
NASA Astrophysics Data System (ADS)
Pandit, Medha; Yusoff, Yusseri; Kittler, Josef; Christmas, William J.; Chilton, E. H. S.
1999-08-01
Video database research is commonly concerned with the storage and retrieval of visual information involving sequence segmentation, shot representation, and video clip retrieval. In multimedia applications, video sequences are usually accompanied by a sound track. The sound track contains potential cues to aid shot segmentation, such as different speakers, background music, singing, and distinctive sounds. These different acoustic categories can be modeled to allow for effective database retrieval. In this paper, we address the problem of automatic segmentation of the audio track of multimedia material. This audio-based segmentation can be combined with video scene shot detection in order to partition the multimedia material into semantically significant segments.
Sutojo, Sarinah; van de Par, Steven; Schoenmaker, Esther
2018-06-01
In situations with competing talkers or in the presence of masking noise, speech intelligibility can be improved by spatially separating the target speaker from the interferers. This advantage is generally referred to as spatial release from masking (SRM), and different mechanisms have been suggested to explain it. One proposed mechanism to benefit from spatial cues is the binaural masking release, which is purely stimulus driven. According to this mechanism, the spatial benefit results from differences in the binaural cues of target and masker, which need to appear simultaneously in time and frequency to improve the signal detection. In an alternative proposed mechanism, the differences in the interaural cues improve the segregation of auditory streams, a process that involves top-down processing rather than being purely stimulus driven. Unlike the cues that produce binaural masking release, the interaural cue differences between target and interferer required to improve stream segregation do not have to appear simultaneously in time and frequency. This study is concerned with the contribution of binaural masking release to SRM for three masker types that differ with respect to the amount of energetic masking they exert. Speech intelligibility was measured, employing a stimulus manipulation that inhibits binaural masking release, and analyzed with a metric to account for the number of better-ear glimpses. Results indicate that the contribution of the stimulus-driven binaural masking release played a minor role, while binaural stream segregation and the availability of glimpses in the better ear had a stronger influence on improving speech intelligibility. This article is protected by copyright. All rights reserved.
The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.
ERIC Educational Resources Information Center
Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.
2003-01-01
"Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…
Application of the Envelope Difference Index to Spectrally Sparse Speech
ERIC Educational Resources Information Center
Souza, Pamela; Hoover, Eric; Gallun, Frederick
2012-01-01
Purpose: Amplitude compression is a common hearing aid processing strategy that can improve speech audibility and loudness comfort but also has the potential to alter important cues carried by the speech envelope. In previous work, a measure of envelope change, the Envelope Difference Index (EDI; Fortune, Woodruff, & Preves, 1994), was moderately…
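The Envelope Difference Index compares the temporal envelope of processed speech against that of the original signal. The abstract does not give the exact normalization used by Fortune, Woodruff, and Preves (1994), so the sketch below is only one plausible formulation, offered for illustration: envelopes are extracted, scaled to equal mean level, and a normalized mean absolute difference is returned (0 = identical envelopes); the published definition may differ in detail.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope(x, fs, cutoff=50.0):
    """Temporal envelope: Hilbert magnitude smoothed by a low-pass filter."""
    env = np.abs(hilbert(x))
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    return np.maximum(sosfiltfilt(sos, env), 0.0)

def envelope_difference(x, y, fs):
    """Illustrative envelope-difference score in [0, 1]; not necessarily the
    exact EDI normalization of Fortune et al. (1994)."""
    n = min(len(x), len(y))
    e1, e2 = envelope(x[:n], fs), envelope(y[:n], fs)
    e1 /= e1.mean() + 1e-12                      # equalize average envelope level
    e2 /= e2.mean() + 1e-12
    return float(np.sum(np.abs(e1 - e2)) / np.sum(e1 + e2))
```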
Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.
ERIC Educational Resources Information Center
Foxx, R. M.; And Others
1988-01-01
Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…
Probabilistic Phonotactics as a Cue for Recognizing Spoken Cantonese Words in Speech
ERIC Educational Resources Information Center
Yip, Michael C. W.
2017-01-01
Previous experimental psycholinguistic studies suggested that probabilistic phonotactic information is likely to hint at the locations of word boundaries in continuous speech, and hence offers a potential solution to the empirical question of how we recognize/segment individual spoken words in speech. We investigated this issue by using…
Improving Understanding of Emotional Speech Acoustic Content
NASA Astrophysics Data System (ADS)
Tinnemore, Anna
Children with cochlear implants show deficits in identifying the emotional intent of utterances without facial or body language cues. A known limitation of cochlear implants is their inability to accurately convey the fundamental frequency contour of speech, which carries the majority of the information needed to identify emotional intent. Without reliable access to the fundamental frequency, other acoustic cues to vocal emotion, if they can be identified, could be used to guide therapies that train children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and an adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification, may be enhanced in child-directed speech, and are also present in adult-directed speech. Results of this study showed significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
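The cues named above (fundamental frequency variation, intensity variation, and speech rate) can each be summarized with simple per-utterance statistics. The sketch below is only one way to operationalize them, not necessarily the measures used in the study: it assumes an F0 track and a syllable count are supplied by external tools (pitch tracking and syllabification are not shown), and only the intensity measure is computed directly from the waveform.

```python
import numpy as np

def emotion_cue_summary(x, fs, f0_hz, n_syllables):
    """Rough summaries of F0 variability, intensity variability, and speech rate.

    x: waveform; fs: sample rate; f0_hz: per-frame F0 estimates (Hz, NaN when
    unvoiced) from an external pitch tracker; n_syllables: externally counted.
    """
    frame, hop = int(0.025 * fs), int(0.010 * fs)   # 25 ms frames, 10 ms hop
    rms = np.array([np.sqrt(np.mean(x[i:i + frame] ** 2) + 1e-12)
                    for i in range(0, len(x) - frame, hop)])
    voiced = f0_hz[~np.isnan(f0_hz)]
    f0_sd = (np.std(12 * np.log2(voiced / voiced.mean()))   # semitone deviation
             if voiced.size else np.nan)
    return {
        "f0_sd_semitones": float(f0_sd),
        "intensity_sd_db": float(np.std(20 * np.log10(rms))),
        "speech_rate_syll_per_s": n_syllables / (len(x) / fs),
    }
```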
Getzmann, Stephan; Jasny, Julian; Falkenstein, Michael
2017-02-01
Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in poorer speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task, in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a smaller decline in performance than uncued changes in both age groups. The ERP analysis revealed shorter latencies in the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration in context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare for changes in the speaker setting. Copyright © 2016 Elsevier Inc. All rights reserved.
Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap
Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin’Ya
2013-01-01
It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap. PMID:23658549
Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J
2004-09-01
The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
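The acoustic simulation described here is a noise-band vocoder: the signal is split into a number of analysis bands, the envelope of each band is extracted with a low-pass filter whose cutoff (20-320 Hz in the study) sets how much temporal detail survives, and each envelope modulates band-limited noise. The sketch below is a generic illustration of that idea rather than the authors' exact processing; the analysis range, band spacing, and filter orders are placeholder assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(x, fs, n_channels=4, env_cutoff=160.0, lo=200.0, hi=7000.0):
    """Generic noise-band vocoder of the kind used to simulate CI processing."""
    # Log-spaced band edges between lo and hi Hz (a common simplifying choice).
    edges = np.geomspace(lo, hi, n_channels + 1)
    out = np.zeros(len(x), dtype=float)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        # Half-wave rectify and low-pass filter to get the temporal envelope;
        # env_cutoff controls how much temporal (e.g., F0-related) detail remains.
        env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(env_sos, np.maximum(band, 0.0)), 0.0)
        carrier = sosfiltfilt(band_sos, np.random.randn(len(x)))  # band-limited noise
        out += env * carrier
    return out / max(np.max(np.abs(out)), 1e-9)
```

Varying n_channels between 4 and 32 and env_cutoff between 20 and 320 Hz reproduces the kind of spectral and temporal manipulations the abstract describes, under the stated assumptions.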
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
Research reports on the nature of speech, instrumentation for the investigation of speech, and practical applications of research are included in this status report for the April 1-June 30, 1980, period. The reports deal with the following topics: (1) the perceptual equivalent of two acoustic cues for a speech contrast is specific to phonetic…
Ansari, M S; Rangasayee, R; Ansari, M A H
2017-03-01
Poor auditory speech perception in geriatrics is attributable to neural de-synchronisation due to structural and degenerative changes of ageing auditory pathways. The speech-evoked auditory brainstem response may be useful for detecting alterations that cause loss of speech discrimination. Therefore, this study aimed to compare the speech-evoked auditory brainstem response in adult and geriatric populations with normal hearing. The auditory brainstem responses to click sounds and to a 40 ms speech sound (the Hindi phoneme |da|) were compared in 25 young adults and 25 geriatric people with normal hearing. The latencies and amplitudes of transient peaks representing neural responses to the onset, offset and sustained portions of the speech stimulus in quiet and noisy conditions were recorded. The older group had significantly smaller amplitudes and longer latencies for the onset and offset responses to |da| in noisy conditions. Stimulus-to-response times were longer and the spectral amplitude of the sustained portion of the stimulus was reduced. The overall stimulus level caused significant shifts in latency across the entire speech-evoked auditory brainstem response in the older group. The reduction in neural speech processing in older adults suggests diminished subcortical responsiveness to acoustically dynamic spectral cues. However, further investigation is needed into how temporal cues are encoded at the brainstem level, and how they relate to speech perception, before a routine tool for clinical decision-making can be developed.
Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia
2013-06-01
Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.
Children perceive speech onsets by ear and eye*
JERGER, SUSAN; DAMIAN, MARKUS F.; TYE-MURRAY, NANCY; ABDI, HERVÉ
2016-01-01
Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: −b/ag) vs. auditorily (see still face; hear exactly same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children – like adults – perceive speech onsets multimodally. Findings are critical for incorporating visual speech into developmental theories of speech perception. PMID:26752548
Visual cue-specific craving is diminished in stressed smokers.
Cochran, Justinn R; Consedine, Nathan S; Lee, John M J; Pandit, Chinmay; Sollers, John J; Kydd, Robert R
2017-09-01
Craving among smokers is increased by stress and exposure to smoking-related visual cues. However, few experimental studies have tested both elicitors concurrently and considered how exposures may interact to influence craving. The current study examined craving in response to stress and visual cue exposure, separately and in succession, in order to better understand the relationship between craving elicitation and the elicitor. Thirty-nine smokers (21 males) who forwent smoking for 30 minutes were randomized to complete a stress task and a visual cue task in counterbalanced orders (creating the experimental groups); for the cue task, counterbalanced blocks of neutral, motivational control, and smoking images were presented. Self-reported craving was assessed after each block of visual stimuli and stress task, and after a recovery period following each task. As expected, the stress and smoking images generated greater craving than neutral or motivational control images (p < .001). Interactions indicated craving in those who completed the stress task first differed from those who completed the visual cues task first (p < .05), such that stress task craving was greater than all image type craving (all p's < .05) only if the visual cue task was completed first. Conversely, craving was stable across image types when the stress task was completed first. Findings indicate when smokers are stressed, visual cues have little additive effect on craving, and different types of visual cues elicit comparable craving. These findings may imply that once stressed, smokers will crave cigarettes comparably notwithstanding whether they are exposed to smoking image cues.
Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition.
Díaz, Begoña; Blank, Helen; von Kriegstein, Katharina
2018-05-14
The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition. Copyright © 2018. Published by Elsevier Inc.
Souza, Alessandra S; Rerko, Laura; Oberauer, Klaus
2016-06-01
Visual working memory (VWM) has a limited capacity. This limitation can be mitigated by the use of focused attention: if attention is drawn to the relevant working memory content before test, performance improves (the so-called retro-cue benefit). This study tests 2 explanations of the retro-cue benefit: (a) Focused attention protects memory representations from interference by visual input at test, and (b) focusing attention enhances retrieval. Across 6 experiments using color recognition and color reproduction tasks, we varied the amount of color interference at test, and the delay between a retrieval cue (i.e., the retro-cue) and the memory test. Retro-cue benefits were larger when the memory test introduced interfering visual stimuli, showing that the retro-cue effect is in part because of protection from visual interference. However, when visual interference was held constant, retro-cue benefits were still obtained whenever the retro-cue enabled retrieval of an object from VWM but delayed response selection. Our results show that accessible information in VWM might be lost in the processes of testing memory because of visual interference and incomplete retrieval. This is not an inevitable state of affairs, though: Focused attention can be used to get the most out of VWM. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
ERIC Educational Resources Information Center
Snyder, Gregory J.; Hough, Monica Strauss; Blanchet, Paul; Ivy, Lennette J.; Waddell, Dwight
2009-01-01
Purpose: Relatively recent research documents that visual choral speech, which represents an externally generated form of synchronous visual speech feedback, significantly enhanced fluency in those who stutter. As a consequence, it was hypothesized that self-generated synchronous and asynchronous visual speech feedback would likewise enhance…
Draht, Fabian; Zhang, Sijie; Rayan, Abdelrahman; Schönfeld, Fabian; Wiskott, Laurenz; Manahan-Vaughan, Denise
2017-01-01
Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information. PMID:28634444
Jenkin, Michael R; Dyde, Richard T; Jenkin, Heather L; Zacher, James E; Harris, Laurence R
2011-01-01
The perceived direction of up depends on both gravity and visual cues to orientation. Static visual cues to orientation have been shown to be less effective in influencing the perception of upright (PU) under microgravity conditions than they are on earth (Dyde et al., 2009). Here we introduce dynamic orientation cues into the visual background to ascertain whether they might increase the effectiveness of visual cues in defining the PU under different gravity conditions. Brief periods of microgravity and hypergravity were created using parabolic flight. Observers viewed a polarized, natural scene presented at various orientations on a laptop viewed through a hood which occluded all other visual cues. The visual background was either an animated video clip in which actors moved along the visual ground plane or an individual static frame taken from the same clip. We measured the perceptual upright using the oriented character recognition test (OCHART). Dynamic visual cues significantly enhance the effectiveness of vision in determining the perceptual upright under normal gravity conditions. Strong trends were found for dynamic visual cues to produce an increase in the visual effect under both microgravity and hypergravity conditions.
ERIC Educational Resources Information Center
Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.
2016-01-01
Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Awareness of Rhythm Patterns in Speech and Music in Children with Specific Language Impairments
Cumming, Ruth; Wilson, Angela; Leong, Victoria; Colling, Lincoln J.; Goswami, Usha
2015-01-01
Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm [amplitude rise time (ART) and sound duration] and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks, a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard “behind the door”). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, “Pure SLI,” had intact phonology and reading (N = 16), the other, “SLI PPR” (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a “prosodic phrasing” hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children. PMID:26733848
The effect of contextual sound cues on visual fidelity perception.
Rojas, David; Cowan, Brent; Kapralos, Bill; Collins, Karen; Dubrowski, Adam
2014-01-01
Previous work has shown that sound can affect the perception of visual fidelity. Here we build upon this previous work by examining the effect of contextual sound cues (i.e., sounds that are related to the visuals) on visual fidelity perception. Results suggest that contextual sound cues do influence visual fidelity perception and, more specifically, our perception of visual fidelity increases with contextual sound cues. These results have implications for designers of multimodal virtual worlds and serious games that, with the appropriate use of contextual sounds, can reduce visual rendering requirements without a corresponding decrease in the perception of visual fidelity.
ERIC Educational Resources Information Center
Zekveld, Adriana A.; Rudner, Mary; Johnsrude, Ingrid S.; Heslenfeld, Dirk J.; Ronnberg, Jerker
2012-01-01
Text cues facilitate the perception of spoken sentences to which they are semantically related (Zekveld, Rudner, et al., 2011). In this study, semantically related and unrelated cues preceding sentences evoked more activation in middle temporal gyrus (MTG) and inferior frontal gyrus (IFG) than nonword cues, regardless of acoustic quality (speech…
Gallagher, Rosemary; Damodaran, Harish; Werner, William G; Powell, Wendy; Deutsch, Judith E
2016-08-19
Evidence-based virtual environments (VEs) that incorporate compensatory strategies such as cueing may change motor behavior and increase exercise intensity while also being engaging and motivating. The purpose of this study was to determine if persons with Parkinson's disease and age-matched healthy adults responded to auditory and visual cueing embedded in a bicycling VE as a method to increase exercise intensity. We tested two groups of participants, persons with Parkinson's disease (PD) (n = 15) and age-matched healthy adults (n = 13), as they cycled on a stationary bicycle while interacting with a VE. Participants cycled under two conditions: auditory cueing (provided by a metronome) and visual cueing (represented as central road markers in the VE). The auditory condition had four trials in which auditory cues or the VE were presented alone or in combination. The visual condition had five trials in which the VE and visual cue rate presentation were manipulated. Data were analyzed by condition using factorial RMANOVAs with planned t-tests corrected for multiple comparisons. There were no differences in pedaling rates between groups for both the auditory and visual cueing conditions. Persons with PD increased their pedaling rate in the auditory (F = 4.78, p = 0.029) and visual cueing (F = 26.48, p < 0.000) conditions. Age-matched healthy adults also increased their pedaling rate in the auditory (F = 24.72, p < 0.000) and visual cueing (F = 40.69, p < 0.000) conditions. Trial-to-trial comparisons in the visual condition in age-matched healthy adults showed a step-wise increase in pedaling rate (p = 0.003 to p < 0.000). In contrast, persons with PD increased their pedaling rate only when explicitly instructed to attend to the visual cues (p < 0.000). An evidence-based cycling VE can modify pedaling rate in persons with PD and age-matched healthy adults. Persons with PD required attention directed to the visual cues in order to obtain an increase in cycling intensity. The combination of the VE and auditory cues was neither additive nor interfering. These data serve as preliminary evidence that embedding auditory and visual cues in a VE to alter cycling speed is a method of increasing exercise intensity that may promote fitness.
Listeners' expectation of room acoustical parameters based on visual cues
NASA Astrophysics Data System (ADS)
Valente, Daniel L.
Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues caused different perceived events of the acoustic environment. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer. The addressed visual makeup of the performer included: (1) an actual video of the performance, (2) a surrogate image of the performance, for example a loudspeaker's image reproducing the performance, (3) no visual image of the performance (empty room), or (4) a multi-source visual stimulus (actual video of the performance coupled with two images of loudspeakers positioned to the left and right of the performer). For this experiment, perceived auditory events of sound were measured in terms of two subjective spatial metrics: Listener Envelopment (LEV) and Apparent Source Width (ASW). These metrics were hypothesized to be dependent on the visual imagery of the presented performance. Data were also collected by having participants match direct and reverberant sound levels for the presented audio-visual scenes. In the final experiment, participants judged spatial expectations of an ensemble of musicians presented in the five physical spaces from Experiment 1. Supporting data were accumulated in two stages. First, participants were given an audio-visual matching test, in which they were instructed to align the auditory width of a performing ensemble to a varying set of audio and visual cues. In the second stage, a conjoint analysis design paradigm was explored to extrapolate the relative magnitude of explored audio-visual factors in affecting three assessed response criteria: Congruency (the perceived match-up of the auditory and visual cues in the assessed performance), ASW, and LEV.
Results show that both auditory and visual factors affect the collected responses, and that the two sensory modalities coincide in distinct interactions. This study reveals participant resiliency in the presence of forced auditory-visual mismatch: Participants are able to adjust the acoustic component of the cross-modal environment in a statistically similar way despite randomized starting values for the monitored parameters. Subjective results of the experiments are presented along with objective measurements for verification.
Li, Feipeng; Trevino, Andrea; Menon, Anjali; Allen, Jont B
2012-10-01
In a previous study on plosives, the 3-Dimensional Deep Search (3DDS) method for the exploration of the necessary and sufficient cues for speech perception was introduced (Li et al., 2010, J. Acoust. Soc. Am. 127(4), 2599-2610). Here, this method is used to isolate the spectral cue regions for perception of the American English fricatives /ʃ, ʒ, s, z, f, v, θ, ð/ in time, frequency, and intensity. The fricatives are analyzed in the context of consonant-vowel utterances, using the vowel /ɑ/. The necessary cues were found to be contained in the frication noise for /ʃ, ʒ, s, z, f, v/. 3DDS analysis isolated the cue regions of /s, z/ between 3.6 and 8 [kHz] and /ʃ, ʒ/ between 1.4 and 4.2 [kHz]. Some utterances were found to contain acoustic components that were unnecessary for correct perception, but caused listeners to hear non-target consonants when the primary cue region was removed; such acoustic components are labeled "conflicting cue regions." The amplitude modulation of the high-frequency frication region by the fundamental F0 was found to be a sufficient cue for voicing. Overall, the 3DDS method allows one to analyze the effects of natural speech components without initial assumptions about where perceptual cues lie in time-frequency space or which elements of production they correspond to.
Sound frequency affects speech emotion perception: results from congenital amusia
Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
2015-01-01
Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
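The filtering manipulations in these experiments can be reproduced with ordinary Butterworth filters: a low-pass filter that preserves F0 and low-frequency prosody while removing most segmental detail, and a high-pass filter intended to isolate non-pitch cues. The sketch below illustrates the general technique only; the cutoff frequencies and filter orders are placeholders, since the abstract does not state the values used in the study.

```python
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(x, fs, cutoff=500.0, order=6):
    """Keep low-frequency content (F0 and prosodic contour); cutoff is illustrative."""
    sos = butter(order, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def highpass_speech(x, fs, cutoff=500.0, order=6):
    """Remove the low-frequency region to isolate non-pitch cues; cutoff is illustrative."""
    sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```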
ERIC Educational Resources Information Center
Rasanen, Okko
2011-01-01
Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
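Transitional probability between adjacent speech units, one of the cues mentioned above, is typically estimated as the conditional probability P(B|A) = count(AB)/count(A), with low values tending to mark word boundaries in artificial-language segmentation studies. A minimal sketch follows; the syllable stream and the "words" it is built from are invented purely for illustration.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in a sequence."""
    pair_counts = Counter(zip(syllables[:-1], syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Toy stream built from the hypothetical 'words' tu-pi-ro and go-la-bu.
stream = "tu pi ro go la bu go la bu tu pi ro tu pi ro go la bu".split()
tps = transitional_probabilities(stream)
# Within-word transitions (tu->pi, pi->ro, go->la, la->bu) come out at 1.0,
# while transitions spanning a word boundary (ro->go, ro->tu, bu->go, bu->tu)
# are lower -- the dip in transitional probability is the proposed boundary cue.
```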
Narayanan, Shrikanth; Georgiou, Panayiotis G
2013-02-07
The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.
ERIC Educational Resources Information Center
Megnin-Viggars, Odette; Goswami, Usha
2013-01-01
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…
Directed Forgetting and Directed Remembering in Visual Working Memory
Williams, Melonie; Woodman, Geoffrey F.
2013-01-01
A defining characteristic of visual working memory is its limited capacity. This means that it is crucial to maintain only the most relevant information in visual working memory. However, empirical research is mixed as to whether it is possible to selectively maintain a subset of the information previously encoded into visual working memory. Here we examined the ability of subjects to use cues to either forget or remember a subset of the information already stored in visual working memory. In Experiment 1, participants were cued to either forget or remember one of two groups of colored squares during a change-detection task. We found that both types of cues aided performance in the visual working memory task, but that observers benefited more from a cue to remember than a cue to forget a subset of the objects. In Experiment 2, we show that the previous findings, which indicated that directed-forgetting cues are ineffective, were likely due to the presence of invalid cues that appear to cause observers to disregard such cues as unreliable. In Experiment 3, we recorded event-related potentials (ERPs) and show that an electrophysiological index of focused maintenance is elicited by cues that indicate which subset of information in visual working memory needs to be remembered, ruling out alternative explanations of the behavioral effects of retention-interval cues. The present findings demonstrate that observers can focus maintenance mechanisms on specific objects in visual working memory based on cues indicating future task relevance. PMID:22409182
Bock, Otmar; Bury, Nils
2018-03-01
Our perception of the vertical corresponds to the weighted sum of gravicentric, egocentric, and visual cues. Here we evaluate the interplay of those cues not for the perceived but rather for the motor vertical. Participants were asked to flip an omnidirectional switch down while their egocentric vertical was dissociated from their visual-gravicentric vertical. Responses were directed mid-between the two verticals; specifically, the data suggest that the relative weight of congruent visual-gravicentric cues averages 0.62, and correspondingly, the relative weight of egocentric cues averages 0.38. We conclude that the interplay of visual-gravicentric cues with egocentric cues is similar for the motor and for the perceived vertical. Unexpectedly, we observed a consistent dependence of the motor vertical on hand position, possibly mediated by hand orientation or by spatial selective attention.
Effects of sentence-structure complexity on speech initiation time and disfluency.
Tsiamtsiouris, Jim; Cairns, Helen Smith
2013-03-01
There is general agreement that stuttering is caused by a variety of factors, and language formulation and speech motor control are two important factors that have been implicated in previous research, yet the exact nature of their effects is still not well understood. Our goal was to test the hypothesis that sentences of high structural complexity would incur greater processing costs than sentences of low structural complexity and these costs would be higher for adults who stutter than for adults who do not stutter. Fluent adults and adults who stutter participated in an experiment that required memorization of a sentence classified as low or high structural complexity followed by production of that sentence upon a visual cue. Both groups of speakers initiated most sentences significantly faster in the low structural complexity condition than in the high structural complexity condition. Adults who stutter were over-all slower in speech initiation than were fluent speakers, but there were no significant interactions between complexity and group. However, adults who stutter produced significantly more disfluencies in sentences of high structural complexity than in those of low complexity. After reading this article, the learner will be able to: (a) identify integral parts of all well-known models of adult sentence production; (b) summarize the way that sentence structure might negatively influence the speech production processes; (c) discuss whether sentence structure influences speech initiation time and disfluencies. Copyright © 2012 Elsevier Inc. All rights reserved.
Paladini, Rebecca E.; Diana, Lorenzo; Zito, Giuseppe A.; Nyffeler, Thomas; Wyss, Patric; Mosimann, Urs P.; Müri, René M.; Nef, Tobias
2018-01-01
Cross-modal spatial cueing can affect performance in a visual search task. For example, search performance improves if a visual target and an auditory cue originate from the same spatial location, and it deteriorates if they originate from different locations. Moreover, it has recently been postulated that multisensory settings, i.e., experimental settings, in which critical stimuli are concurrently presented in different sensory modalities (e.g., visual and auditory), may trigger asymmetries in visuospatial attention. Thereby, a facilitation has been observed for visual stimuli presented in the right compared to the left visual space. However, it remains unclear whether auditory cueing of attention differentially affects search performance in the left and the right hemifields in audio-visual search tasks. The present study investigated whether spatial asymmetries would occur in a search task with cross-modal spatial cueing. Participants completed a visual search task that contained no auditory cues (i.e., unimodal visual condition), spatially congruent, spatially incongruent, and spatially non-informative auditory cues. To further assess participants’ accuracy in localising the auditory cues, a unimodal auditory spatial localisation task was also administered. The results demonstrated no left/right asymmetries in the unimodal visual search condition. Both an additional incongruent, as well as a spatially non-informative, auditory cue resulted in lateral asymmetries. Thereby, search times were increased for targets presented in the left compared to the right hemifield. No such spatial asymmetry was observed in the congruent condition. However, participants’ performance in the congruent condition was modulated by their tone localisation accuracy. The findings of the present study demonstrate that spatial asymmetries in multisensory processing depend on the validity of the cross-modal cues, and occur under specific attentional conditions, i.e., when visual attention has to be reoriented towards the left hemifield. PMID:29293637
How visual cues for when to listen aid selective auditory attention.
Varghese, Lenny A; Ozmeral, Erol J; Best, Virginia; Shinn-Cunningham, Barbara G
2012-06-01
Visual cues are known to aid auditory processing when they provide direct information about signal content, as in lip reading. However, some studies hint that visual cues also aid auditory perception by guiding attention to the target in a mixture of similar sounds. The current study directly tests this idea for complex, nonspeech auditory signals, using a visual cue providing only timing information about the target. Listeners were asked to identify a target zebra finch bird song played at a random time within a longer, competing masker. Two different maskers were used: noise and a chorus of competing bird songs. On half of all trials, a visual cue indicated the timing of the target within the masker. For the noise masker, the visual cue did not affect performance when target and masker were from the same location, but improved performance when target and masker were in different locations. In contrast, for the chorus masker, visual cues improved performance only when target and masker were perceived as coming from the same direction. These results suggest that simple visual cues for when to listen improve target identification by enhancing sounds near the threshold of audibility when the target is energetically masked and by enhancing segregation when it is difficult to direct selective attention to the target. Visual cues help little when target and masker already differ in attributes that enable listeners to engage selective auditory attention effectively, including differences in spectrotemporal structure and in perceived location.
Role of Visual Speech in Phonological Processing by Children With Hearing Loss
Jerger, Susan; Tye-Murray, Nancy; Abdi, Hervé
2011-01-01
Purpose This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in place of articulation, or conflicting in voicing—for example, the picture “pizza” coupled with the distractors “peach,” “teacher,” or “beast,” respectively. Speed of picture naming was measured. Results The conflicting conditions slowed naming, and phonological processing by children with HL displayed the age-related shift in sensitivity to visual speech seen in children with NH, although with developmental delay. Younger children with HL exhibited a disproportionately large influence of visual speech and a negligible influence of auditory speech, whereas older children with HL showed a robust influence of auditory speech with no benefit to performance from adding visual speech. The congruent conditions did not speed naming in children with HL, nor did the addition of visual speech influence performance. Unexpectedly, the /∧/-vowel congruent distractors slowed naming in children with HL and decreased articulatory proficiency. Conclusions Results for the conflicting conditions are consistent with the hypothesis that speech representations in children with HL (a) are initially disproportionally structured in terms of visual speech and (b) become better specified with age in terms of auditorily encoded information. PMID:19339701
Temporal and peripheral extraction of contextual cues from scenes during visual search.
Koehler, Kathryn; Eckstein, Miguel P
2017-02-01
Scene context is known to facilitate object recognition and guide visual search, but little work has focused on isolating image-based cues and evaluating their contributions to eye movement guidance and search performance. Here, we explore three types of contextual cues (a co-occurring object, the configuration of other objects, and the superordinate category of background elements) and assess their joint contributions to search performance in the framework of cue-combination and the temporal unfolding of their extraction. We also assess whether observers' ability to extract each contextual cue in the visual periphery is a bottleneck that determines the utilization and contribution of each cue to search guidance and decision accuracy. We find that during the first four fixations of a visual search task observers first utilize the configuration of objects for coarse eye movement guidance and later use co-occurring object information for finer guidance. In the absence of contextual cues, observers were suboptimally biased to report the target object as being absent. The presence of the co-occurring object was the only contextual cue that had a significant effect in reducing decision bias. The early influence of object-based cues on eye movements is corroborated by a clear demonstration of observers' ability to extract object cues up to 16° into the visual periphery. The joint contributions of the cues to decision search accuracy approximates that expected from the combination of statistically independent cues and optimal cue combination. Finally, the lack of utilization and contribution of the background-based contextual cue to search guidance cannot be explained by the availability of the contextual cue in the visual periphery; instead it is related to background cues providing the least inherent information about the precise location of the target in the scene.
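One common formalization of the "combination of statistically independent cues" mentioned at the end of this abstract is the ideal-observer rule in which sensitivities add in quadrature. The sketch below assumes that reading and uses invented d-prime values; it is not the authors' model or their data.

```python
import math

def combined_dprime(single_cue_dprimes):
    """Ideal-observer prediction for statistically independent cues:
    the combined d' is the square root of the sum of squared single-cue d' values."""
    return math.sqrt(sum(d ** 2 for d in single_cue_dprimes))

# Hypothetical sensitivities for the three contextual cues (illustrative values only).
cues = {"co-occurring object": 1.2, "object configuration": 0.9, "background category": 0.4}
print(round(combined_dprime(cues.values()), 2))  # ~1.55
```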
NASA Technical Reports Server (NTRS)
Parris, B. L.; Cook, A. M.
1978-01-01
Data are presented that show the effects of visual and motion cueing on pilot performance during takeoffs with engine failures. Four groups of USAF pilots flew a simulated KC-135 using four different cueing systems. The most basic of these systems was of the instrument-only type. Visual scene simulation and/or motion simulation was added to produce the other systems. Learning curves, mean performance, and subjective data are examined. The results show that the addition of visual cueing results in significant improvement in pilot performance, and that the combined use of visual and motion cueing results in far better performance.
Manual control of yaw motion with combined visual and vestibular cues
NASA Technical Reports Server (NTRS)
Zacharias, G. L.; Young, L. R.
1977-01-01
Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation was modelled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A correction to the frequency responses is provided by a separate measurement of manual control performance in an analogous visual pursuit nulling task. The resulting dual-input describing function for motion perception dependence on combined cue presentation supports the complementary model, in which vestibular cues dominate sensation at frequencies above 0.05 Hz. The describing function model is extended by the proposal of a non-linear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.
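The complementary parallel-channel idea can be illustrated with a first-order filter pair whose transfer functions sum to one: the visual channel passes low frequencies and the vestibular channel the remainder. This is only a sketch consistent with the abstract's description and its 0.05 Hz crossover; the filter order and any gains are assumptions, not the authors' fitted describing-function model.

```python
def complementary_weights(freq_hz, crossover_hz=0.05):
    """Magnitudes of a first-order complementary filter pair at one frequency.
    Visual channel: 1 / (1 + j f/fc); vestibular channel: (j f/fc) / (1 + j f/fc).
    The two complex responses sum to exactly 1 (the complementary property)."""
    ratio = freq_hz / crossover_hz
    denom = complex(1.0, ratio)
    visual = 1.0 / denom
    vestibular = complex(0.0, ratio) / denom
    return abs(visual), abs(vestibular)

for f in (0.01, 0.05, 0.2):
    vis, vest = complementary_weights(f)
    print(f"{f:4.2f} Hz: |visual| = {vis:.2f}, |vestibular| = {vest:.2f}")
# Below the crossover the visual channel dominates; above it the vestibular channel does,
# matching the description that vestibular cues dominate above about 0.05 Hz.
```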
Gesture helps learners learn, but not merely by guiding their visual attention.
Wakefield, Elizabeth; Novack, Miriam A; Congdon, Eliza L; Franconeri, Steven; Goldin-Meadow, Susan
2018-04-16
Teaching a new concept through gestures (hand movements that accompany speech) facilitates learning above and beyond instruction through speech alone (e.g., Singer & Goldin-Meadow). However, the mechanisms underlying this phenomenon are still under investigation. Here, we use eye tracking to explore one often proposed mechanism: gesture's ability to direct visual attention. Behaviorally, we replicate previous findings: Children perform significantly better on a posttest after learning through Speech+Gesture instruction than through Speech Alone instruction. Using eye tracking measures, we show that children who watch a math lesson with gesture do allocate their visual attention differently from children who watch a math lesson without gesture: they look more to the problem being explained, less to the instructor, and are more likely to synchronize their visual attention with information presented in the instructor's speech (i.e., follow along with speech) than children who watch the no-gesture lesson. The striking finding is that, even though these looking patterns positively predict learning outcomes, the patterns do not mediate the effects of training condition (Speech Alone vs. Speech+Gesture) on posttest success. We find instead a complex relation between gesture and visual attention in which gesture moderates the impact of visual looking patterns on learning: following along with speech predicts learning for children in the Speech+Gesture condition, but not for children in the Speech Alone condition. Gesture's beneficial effects on learning thus come not merely from its ability to guide visual attention, but also from its ability to synchronize with speech and affect what learners glean from that speech. © 2018 John Wiley & Sons Ltd.
The interaction of acoustic and linguistic grouping cues in auditory object formation
NASA Astrophysics Data System (ADS)
Shapley, Kathy; Carrell, Thomas
2005-09-01
One of the earliest explanations for good speech intelligibility in poor listening situations was context [Miller et al., J. Exp. Psychol. 41 (1951)]. Context presumably allows listeners to group and predict speech appropriately and is known as a top-down listening strategy. Amplitude comodulation is another mechanism that has been shown to improve sentence intelligibility. Amplitude comodulation provides acoustic grouping information without changing the linguistic content of the desired signal [Carrell and Opie, Percept. Psychophys. 52 (1992); Hu and Wang, Proceedings of ICASSP-02 (2002)] and is considered a bottom-up process. The present experiment investigated how amplitude comodulation and semantic information combined to improve speech intelligibility. Sentences with high- and low-predictability word sequences [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84 (1988)] were constructed in two different formats: time-varying sinusoidal sentences (TVS) and reduced-channel sentences (RC). The stimuli were chosen because they minimally represent the traditionally defined speech cues and therefore emphasize the importance of high-level context effects and low-level acoustic grouping cues. Results indicated that semantic information did not influence intelligibility levels of TVS and RC sentences. In addition, amplitude modulation aided listeners' intelligibility scores in the TVS condition but hindered listeners' intelligibility scores in the RC condition.
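The amplitude-comodulation manipulation lends itself to a small signal-processing illustration. The sketch below imposes one shared amplitude envelope on a set of fixed-frequency sine carriers; the carrier frequencies, the 100 Hz raised-cosine envelope, and the modulation depth are assumed parameters, and fixed carriers only stand in for the time-varying formant tracks of real time-varying sinusoidal sentences. It is not the stimulus-generation procedure used in this experiment.

```python
import numpy as np

def comodulated_carriers(freqs_hz, dur_s=1.0, fs=16000, am_rate_hz=100.0, depth=0.8):
    """Impose one shared amplitude envelope on several sine carriers,
    mimicking the acoustic grouping cue provided by amplitude comodulation."""
    t = np.arange(int(dur_s * fs)) / fs
    # Raised-cosine envelope shared by every carrier; depth sets the modulation depth.
    envelope = 1.0 - depth * 0.5 * (1.0 + np.cos(2.0 * np.pi * am_rate_hz * t))
    carriers = sum(np.sin(2.0 * np.pi * f * t) for f in freqs_hz)
    return envelope * carriers

# Three carriers standing in for the formant tracks of a sine-wave sentence.
signal = comodulated_carriers([500.0, 1500.0, 2500.0])
print(signal.shape)  # (16000,)
```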
Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.
Mahmoudzadeh, Mahdi; Dehaene-Lambertz, Ghislaine; Wallois, Fabrice
2017-01-01
Speech is a complex auditory stimulus which is processed according to several time-scales. Whereas consonant discrimination is required to resolve rapid acoustic events, voice perception relies on slower cues. Humans, right from preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterms to those observed in other mammals, we tested anesthetized adult rats by using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural (using ECoG) and hemodynamic responses (using fNIRS) to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats more effectively processed the speech envelope than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and species-specific bias that may help human infants to learn their native language.
NASA Technical Reports Server (NTRS)
Parrish, R. V.; Bowles, R. L.
1983-01-01
This paper addresses the issues of motion/visual cueing fidelity requirements for vortex encounters during simulated transport visual approaches and landings. Four simulator configurations were utilized to provide objective performance measures during simulated vortex penetrations, and subjective comments from pilots were collected. The configurations used were as follows: fixed base with visual degradation (delay), fixed base with no visual degradation, moving base with visual degradation (delay), and moving base with no visual degradation. The statistical comparisons of the objective measures and the subjective pilot opinions indicated that although both minimum visual delay and motion cueing are recommended for the vortex penetration task, the visual-scene delay characteristics were not as significant a fidelity factor as was the presence of motion cues. However, this indication was applicable to a restricted task, and to transport aircraft. Although they were statistically significant, the effects of visual delay and motion cueing on the touchdown-related measures were considered to be of no practical consequence.
Measuring effectiveness of semantic cues in degraded English sentences in non-native listeners.
Shi, Lu-Feng
2014-01-01
This study employed Boothroyd and Nittrouer's (1988) k factor to directly quantify how effectively native versus non-native listeners use semantic cues. Listeners were presented with speech-perception-in-noise sentences processed at three levels of concurrent multi-talker babble and reverberation. For each condition, 50 sentences with multiple semantic cues and 50 with minimal semantic cues were randomly presented. Listeners verbally reported and wrote down the target words. The metric, k, was derived from percent-correct scores for sentences with and without semantics. Ten native and 33 non-native listeners participated. The presence of semantics increased recognition benefit by over 250% for natives, but access to semantics remained limited for non-native listeners (90-135%). The k factor was comparable across conditions for native listeners, but level-dependent for non-natives. The k for non-natives was significantly different from 1 in all conditions, suggesting that semantic cues, though reduced in importance in difficult conditions, were helpful for non-natives. Non-natives as a group were not as effective in using semantics to facilitate English sentence recognition as natives. Poor listening conditions were particularly adverse to the use of semantics in non-natives, who may rely on clear acoustic-phonetic cues before benefitting from semantic cues when recognizing connected speech.
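The k metric has a simple closed form in Boothroyd and Nittrouer's (1988) treatment as it is usually stated: recognition with context, p_c, relates to recognition without context, p_i, via p_c = 1 - (1 - p_i)^k, so k can be estimated from two percent-correct scores. The sketch below assumes that formulation; the scores are invented for illustration and are not the study's data.

```python
import math

def k_factor(p_with_context, p_without_context):
    """Estimate Boothroyd & Nittrouer's k from two proportion-correct scores,
    assuming p_with = 1 - (1 - p_without) ** k. k = 1 means no context benefit;
    larger k means more effective use of contextual (here, semantic) cues."""
    return math.log(1.0 - p_with_context) / math.log(1.0 - p_without_context)

# Illustrative (invented) scores for sentences with vs. without semantic cues.
print(round(k_factor(0.80, 0.50), 2))  # strong context benefit, k ~ 2.32
print(round(k_factor(0.60, 0.50), 2))  # weaker context benefit, k ~ 1.32
```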
Media/Device Configurations for Platoon Leader Tactical Training
1985-02-01
Excerpt from Table 4 (Functional Capability Categories), inputs to the platoon leader: the device should simulate the real-time receipt of all tactical voice communication, audio and visual battlefield cues, and visual communication signals; a lower capability level (0.8) covers receipt of limited tactical voice communication plus audio and visual battlefield cues and visual communication signals.
Unsupervised real-time speaker identification for daily movies
NASA Astrophysics Data System (ADS)
Li, Ying; Kuo, C.-C. Jay
2002-07-01
The problem of identifying speakers for movie content analysis is addressed in this paper. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real-time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues will be employed and subsequently combined in a probabilistic framework to identify speakers. Particularly, audio information is used to identify speakers with a maximum likelihood (ML)-based approach while visual information is adopted to distinguish speakers by detecting and recognizing their talking faces based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate for speakers' acoustic variations along time, we update their models on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved through extensive experiments, which shows a promising future of the proposed audiovisual-based unsupervised speaker identification system.
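A hedged sketch of the kind of probabilistic audio-visual fusion described here: per-speaker scores from an audio model and a talking-face model are combined, and the best-scoring speaker is selected. The fixed weighting, the log-likelihood form, and the toy scores are illustrative assumptions, not the authors' implementation.

```python
def fuse_and_identify(audio_loglik, visual_loglik, audio_weight=0.6):
    """Combine per-speaker log-likelihoods from the audio and visual modules
    with a fixed weight and return the best-scoring speaker plus all fused scores."""
    speakers = audio_loglik.keys() & visual_loglik.keys()
    fused = {
        s: audio_weight * audio_loglik[s] + (1.0 - audio_weight) * visual_loglik[s]
        for s in speakers
    }
    return max(fused, key=fused.get), fused

# Toy example: the audio model slightly favours speaker A, the face model favours B.
audio = {"A": -10.2, "B": -11.0}
video = {"A": -9.5, "B": -7.1}
best, scores = fuse_and_identify(audio, video)
print(best)  # "B" (fused scores are roughly A: -9.92, B: -9.44)
```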
Visual Speech Primes Open-Set Recognition of Spoken Words
ERIC Educational Resources Information Center
Buchwald, Adam B.; Winters, Stephen J.; Pisoni, David B.
2009-01-01
Visual speech perception has become a topic of considerable interest to speech researchers. Previous research has demonstrated that perceivers neurally encode and use speech information from the visual modality, and this information has been found to facilitate spoken word recognition in tasks such as lexical decision (Kim, Davis, & Krins,…
The effect of changing the secondary task in dual-task paradigms for measuring listening effort.
Picou, Erin M; Ricketts, Todd A
2014-01-01
The purpose of this study was to evaluate the effect of changing the secondary task in dual-task paradigms that measure listening effort. Specifically, the effects of increasing the secondary task complexity or the depth of processing on a paradigm's sensitivity to changes in listening effort were quantified in a series of two experiments. Specific factors investigated within each experiment were background noise and visual cues. Participants in Experiment 1 were adults with normal hearing (mean age 23 years) and participants in Experiment 2 were adults with mild sloping to moderately severe sensorineural hearing loss (mean age 60.1 years). In both experiments, participants were tested using three dual-task paradigms. These paradigms had identical primary tasks, which were always monosyllable word recognition. The secondary tasks were all physical reaction time measures. The stimulus for the secondary task varied by paradigm and was a (1) simple visual probe, (2) a complex visual probe, or (3) the category of word presented. In this way, the secondary tasks mainly varied from the simple paradigm by either complexity or depth of speech processing. Using all three paradigms, participants were tested in four conditions, (1) auditory-only stimuli in quiet, (2) auditory-only stimuli in noise, (3) auditory-visual stimuli in quiet, and (4) auditory-visual stimuli in noise. During auditory-visual conditions, the talker's face was visible. Signal-to-noise ratios used during conditions with background noise were set individually so word recognition performance was matched in auditory-only and auditory-visual conditions. In noise, word recognition performance was approximately 80% and 65% for Experiments 1 and 2, respectively. For both experiments, word recognition performance was stable across the three paradigms, confirming that none of the secondary tasks interfered with the primary task. In Experiment 1 (listeners with normal hearing), analysis of median reaction times revealed a significant main effect of background noise on listening effort only with the paradigm that required deep processing. Visual cues did not change listening effort as measured with any of the three dual-task paradigms. In Experiment 2 (listeners with hearing loss), analysis of median reaction times revealed expected significant effects of background noise using all three paradigms, but no significant effects of visual cues. None of the dual-task paradigms were sensitive to the effects of visual cues. Furthermore, changing the complexity of the secondary task did not change dual-task paradigm sensitivity to the effects of background noise on listening effort for either group of listeners. However, the paradigm whose secondary task involved deeper processing was more sensitive to the effects of background noise for both groups of listeners. While this paradigm differed from the others in several respects, depth of processing may be partially responsible for the increased sensitivity. Therefore, this paradigm may be a valuable tool for evaluating other factors that affect listening effort.
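Dual-task listening effort in paradigms like these is typically indexed by secondary-task reaction times summarized per condition (this study analysed medians). The sketch below computes one such index, the increase in median reaction time from quiet to noise; the reaction-time values are invented, and the single contrast shown is an illustration rather than the authors' full analysis.

```python
from statistics import median

def noise_effect_ms(rts_quiet_ms, rts_noise_ms):
    """Listening-effort index: increase in median secondary-task reaction time
    in background noise relative to quiet (larger = more effort)."""
    return median(rts_noise_ms) - median(rts_quiet_ms)

# Invented reaction times (ms) for one listener in one dual-task paradigm.
quiet = [412, 398, 430, 405, 441, 387, 420]
noise = [455, 462, 490, 448, 501, 470, 466]
print(noise_effect_ms(quiet, noise))  # 54 ms slower in noise
```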
Multimodal Infant-Directed Communication: How Caregivers Combine Tactile and Linguistic Cues
ERIC Educational Resources Information Center
Abu-Zhaya, Rana; Seidl, Amanda; Cristia, Alejandrina
2017-01-01
Both touch and speech independently have been shown to play an important role in infant development. However, little is known about how they may be combined in the input to the child. We examined the use of touch and speech together by having mothers read their 5-month-olds books about body parts and animals. Results suggest that speech+touch…
ERIC Educational Resources Information Center
Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.
2005-01-01
One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…
ERIC Educational Resources Information Center
McMurray, Bob; Jongman, Allard
2011-01-01
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…
Taitelbaum-Swead, Riki; Fostick, Leah
2016-01-01
Everyday life includes fluctuating noise levels, resulting in continuously changing speech intelligibility. The study aims were: (1) to quantify the age-related decrease in speech perception as a result of increasing noise level, and (2) to test the effect of age on context usage at the word level (where fewer contextual cues are available). A total of 24 young adults (age 20-30 years) and 20 older adults (age 60-75 years) were tested. Meaningful and nonsense one-syllable consonant-vowel-consonant words were presented with the background noise types of speech noise (SpN), babble noise (BN), and white noise (WN), at signal-to-noise ratios (SNR) of 0 and -5 dB. Older adults had lower accuracy at SNR = 0 dB, with WN being the most difficult condition for all participants. Measuring the change in speech perception when SNR decreased showed a reduction of 18.6-61.5% in intelligibility, with an age effect only for BN. Both young and older adults used less phonemic context with WN, as compared to other conditions. Older adults are more affected by increasing levels of fluctuating informational noise as compared to steady-state noise. They also use fewer contextual cues when perceiving monosyllabic words. Further studies should take into consideration that when the stimulus is presented differently (change in noise level, fewer contextual cues), other perceptual and cognitive processes are involved. © 2016 S. Karger AG, Basel.
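The reported 18.6-61.5% reduction in intelligibility can be read as a relative drop from the easier to the harder signal-to-noise ratio. The sketch below computes that quantity under that assumption, with invented proportion-correct scores; the abstract does not state exactly how the percentage was derived, so treat this as one plausible reading.

```python
def relative_reduction_percent(score_snr0, score_snr_minus5):
    """Percentage drop in intelligibility when SNR falls from 0 to -5 dB,
    expressed relative to the easier (0 dB SNR) condition."""
    return 100.0 * (score_snr0 - score_snr_minus5) / score_snr0

# Invented proportion-correct scores for one noise type and one age group.
print(round(relative_reduction_percent(0.72, 0.45), 1))  # 37.5 (% reduction)
```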
NASA Technical Reports Server (NTRS)
Zacharias, G. L.; Young, L. R.
1981-01-01
Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation is modeled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A dual-input describing function analysis supports the complementary model; vestibular cues dominate sensation at higher frequencies. The describing function model is extended by the proposal of a nonlinear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.
Visual Attention in Flies-Dopamine in the Mushroom Bodies Mediates the After-Effect of Cueing.
Koenig, Sebastian; Wolf, Reinhard; Heisenberg, Martin
2016-01-01
Visual environments may simultaneously comprise stimuli of different significance. Often such stimuli require incompatible responses. Selective visual attention allows an animal to respond exclusively to the stimuli at a certain location in the visual field. In the process of establishing its focus of attention the animal can be influenced by external cues. Here we characterize the behavioral properties and neural mechanism of cueing in the fly Drosophila melanogaster. A cue can be attractive, repulsive or ineffective depending upon (e.g.) its visual properties and location in the visual field. Dopamine signaling in the brain is required to maintain the effect of cueing once the cue has disappeared. Raising or lowering dopamine at the synapse abolishes this after-effect. Specifically, dopamine is necessary and sufficient in the αβ-lobes of the mushroom bodies. Evidence is provided for an involvement of the αβposterior Kenyon cells.
Little, Anthony C; DeBruine, Lisa M; Jones, Benedict C
2011-07-07
Evolutionary approaches to human attractiveness have documented several traits that are proposed to be attractive across individuals and cultures, although both cross-individual and cross-cultural variations are also often found. Previous studies show that parasite prevalence and mortality/health are related to cultural variation in preferences for attractive traits. Visual experience of pathogen cues may mediate such variable preferences. Here we showed individuals slideshows of images with cues to low and high pathogen prevalence and measured their visual preferences for face traits. We found that both men and women moderated their preferences for facial masculinity and symmetry according to recent experience of visual cues to environmental pathogens. Change in preferences was seen mainly for opposite-sex faces, with women preferring more masculine and more symmetric male faces and men preferring more feminine and more symmetric female faces after exposure to pathogen cues than when not exposed to such cues. Cues to environmental pathogens had no significant effects on preferences for same-sex faces. These data complement studies of cross-cultural differences in preferences by suggesting a mechanism for variation in mate preferences. Similar visual experience could lead to within-cultural agreement and differing visual experience could lead to cross-cultural variation. Overall, our data demonstrate that preferences can be strategically flexible according to recent visual experience with pathogen cues. Given that cues to pathogens may signal an increase in contagion/mortality risk, it may be adaptive to shift visual preferences in favour of proposed good-gene markers in environments where such cues are more evident.
Buchan, Julie N; Munhall, Kevin G
2011-01-01
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.
Effects of intelligibility on working memory demand for speech perception.
Francis, Alexander L; Nusbaum, Howard C
2009-08-01
Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.
The effect of visual context on manual localization of remembered targets
NASA Technical Reports Server (NTRS)
Barry, S. R.; Bloomberg, J. J.; Huebner, W. P.
1997-01-01
This paper examines the contribution of egocentric cues and visual context to manual localization of remembered targets. Subjects pointed in the dark to the remembered position of a target previously viewed without or within a structured visual scene. Without a remembered visual context, subjects pointed to within 2 degrees of the target. The presence of a visual context with cues of straight ahead enhanced pointing performance to the remembered location of central but not off-center targets. Thus, visual context provides strong visual cues of target position and the relationship of body position to target location. Without a visual context, egocentric cues provide sufficient input for accurate pointing to remembered targets.
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
2011-01-01
During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259-274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual-phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables--disambiguated to /pa/ or /ta/ by the visual channel (speaking face)--served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what" path may give rise to direct activation of "auditory objects." On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.
Cortical Integration of Audio-Visual Information
Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.
2013-01-01
We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Simultaneous acquisition of multiple auditory-motor transformations in speech
Rochet-Capellan, Amelie; Ostry, David J.
2011-01-01
The brain easily generates the movement that is needed in a given situation. Yet surprisingly, the results of experimental studies suggest that it is difficult to acquire more than one skill at a time. To do so, it has generally been necessary to link the required movement to arbitrary cues. In the present study, we show that speech motor learning provides an informative model for the acquisition of multiple sensorimotor skills. During training, subjects are required to repeat aloud individual words in random order while auditory feedback is altered in real-time in different ways for the different words. We find that subjects can quite readily and simultaneously modify their speech movements to correct for these different auditory transformations. This multiple learning occurs effortlessly without explicit cues and without any apparent awareness of the perturbation. The ability to simultaneously learn several different auditory-motor transformations is consistent with the idea that in speech motor learning, the brain acquires instance specific memories. The results support the hypothesis that speech motor learning is fundamentally local. PMID:21325534
The fMRI BOLD response to unisensory and multisensory smoking cues in nicotine-dependent adults
Cortese, Bernadette M.; Uhde, Thomas W.; Brady, Kathleen T.; McClernon, F. Joseph; Yang, Qing X.; Collins, Heather R.; LeMatty, Todd; Hartwell, Karen J.
2015-01-01
Given that the vast majority of functional magnetic resonance imaging (fMRI) studies of drug cue reactivity use unisensory visual cues, but that multisensory cues may elicit greater craving-related brain responses, the current study sought to compare the fMRI BOLD response to unisensory visual and multisensory, visual plus odor, smoking cues in 17 nicotine-dependent adult cigarette smokers. Brain activation to smoking-related, compared to neutral, pictures was assessed under cigarette smoke and odorless odor conditions. While smoking pictures elicited a pattern of activation consistent with the addiction literature, the multisensory (odor + picture) smoking cues elicited significantly greater and more widespread activation in mainly frontal and temporal regions. BOLD signal elicited by the multi-sensory, but not unisensory cues, was significantly related to participants’ level of control over craving as well. Results demonstrated that the co-presentation of cigarette smoke odor with smoking-related visual cues, compared to the visual cues alone, elicited greater levels of craving-related brain activation in key regions implicated in reward. These preliminary findings support future research aimed at a better understanding of multisensory integration of drug cues and craving. PMID:26475784
Clark, Gavin I; Rock, Adam J; McKeith, Charles F A; Coventry, William L
2017-09-01
Poker-machine gamblers have been demonstrated to report increases in the urge to gamble following exposure to salient gambling cues. However, the processes which contribute to this urge to gamble remain to be understood. The present study aimed to investigate whether changes in the conscious experience of visual imagery, rationality and volitional control (over one's thoughts, images and attention) predicted changes in the urge to gamble following exposure to a gambling cue. Thirty-one regular poker-machine gamblers who reported at least low levels of problem gambling on the Problem Gambling Severity Index (PGSI), were recruited to complete an online cue-reactivity experiment. Participants completed the PGSI, the visual imagery, rationality and volitional control subscales of the Phenomenology of Consciousness Inventory (PCI), and a visual analogue scale (VAS) assessing urge to gamble. Participants completed the PCI subscales and VAS at baseline, following a neutral video cue and following a gambling video cue. Urge to gamble was found to significantly increase from neutral cue to gambling cue (while controlling for baseline urge) and this increase was predicted by PGSI score. After accounting for the effects of problem-gambling severity, cue-reactive visual imagery, rationality and volitional control significantly improved the prediction of cue-reactive urge to gamble. The small sample size and limited participant characteristic data restricts the generalizability of the findings. Nevertheless, this is the first study to demonstrate that changes in the subjective experience of visual imagery, volitional control and rationality predict changes in the urge to gamble from neutral to gambling cue. The results suggest that visual imagery, rationality and volitional control may play an important role in the experience of the urge to gamble in poker-machine gamblers.
ERIC Educational Resources Information Center
Krahmer, Emiel; Swerts, Marc
2007-01-01
Speakers employ acoustic cues (pitch accents) to indicate that a word is important, but may also use visual cues (beat gestures, head nods, eyebrow movements) for this purpose. Even though these acoustic and visual cues are related, the exact nature of this relationship is far from well understood. We investigate whether producing a visual beat…
Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret
2016-02-01
During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information. Copyright © 2016. Published by Elsevier Inc.
Visual Features Involving Motion Seen from Airport Control Towers
NASA Technical Reports Server (NTRS)
Ellis, Stephen R.; Liston, Dorion
2010-01-01
Visual motion cues are used by tower controllers to support both visual and anticipated separation. Some of these cues are tabulated as part of the overall set of visual features used in towers to separate aircraft. An initial analyses of one motion cue, landing deceleration, is provided as a basis for evaluating how controllers detect and use it for spacing aircraft on or near the surface. Understanding cues like it will help determine if they can be safely used in a remote/virtual tower in which their presentation may be visually degraded.
Jordan, Timothy R; Sheen, Mercedes; Abedipour, Lily; Paterson, Kevin B
2014-01-01
When observing a talking face, it has often been argued that visual speech to the left and right of fixation may produce differences in performance due to divided projections to the two cerebral hemispheres. However, while it seems likely that such a division in hemispheric projections exists for areas away from fixation, the nature and existence of a functional division in visual speech perception at the foveal midline remains to be determined. We investigated this issue by presenting visual speech in matched hemiface displays to the left and right of a central fixation point, either exactly abutting the foveal midline or else located away from the midline in extrafoveal vision. The location of displays relative to the foveal midline was controlled precisely using an automated, gaze-contingent eye-tracking procedure. Visual speech perception showed a clear right hemifield advantage when presented in extrafoveal locations but no hemifield advantage (left or right) when presented abutting the foveal midline. Thus, while visual speech observed in extrafoveal vision appears to benefit from unilateral projections to left-hemisphere processes, no evidence was obtained to indicate that a functional division exists when visual speech is observed around the point of fixation. Implications of these findings for understanding visual speech perception and the nature of functional divisions in hemispheric projection are discussed.
Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J
2013-06-01
Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.
Rosemann, Stephanie; Thiel, Christiane M
2018-07-15
Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input effects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that already mild to moderate hearing loss impacts audio-visual speech processing accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss. Copyright © 2018 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Lowenstein, Joanna H.; Nittrouer, Susan
2015-01-01
Purpose: One task of childhood involves learning to optimally weight acoustic cues in the speech signal in order to recover phonemic categories. This study examined the extent to which spectral degradation, as associated with cochlear implants, might interfere. The 3 goals were to measure, for adults and children, (a) cue weighting with spectrally…
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.
Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina
2015-07-01
It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as the active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Second, they show that the visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models.
ERIC Educational Resources Information Center
Katsioloudis, Petros; Jovanovic, Vukica; Jones, Mildred
2016-01-01
Several theorists believe that different types of visual cues influence cognition and behavior through learned associations; however, research provides inconsistent results. Considering this, a quasi-experimental study was done to determine if there are significant positive effects of visual cues (color blue) and to identify if a positive increase…
The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech
ERIC Educational Resources Information Center
Wayne, Rachel V.; Johnsrude, Ingrid S.
2012-01-01
Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…
Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects.
Skoog Waller, Sara; Eriksson, Mårten
2016-01-01
The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility of affecting age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only, or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age-related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age-related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20-25, 40-45, and 60-65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended.
Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet
2017-01-01
One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption faces an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. To explore this, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.
Role of somatosensory and vestibular cues in attenuating visually induced human postural sway
NASA Technical Reports Server (NTRS)
Peterka, Robert J.; Benolken, Martha S.
1993-01-01
The purpose was to determine the contribution of visual, vestibular, and somatosensory cues to the maintenance of stance in humans. Postural sway was induced by full field, sinusoidal visual surround rotations about an axis at the level of the ankle joints. The influences of vestibular and somatosensory cues were characterized by comparing postural sway in normal and bilateral vestibular absent subjects in conditions that provided either accurate or inaccurate somatosensory orientation information. In normal subjects, the amplitude of visually induced sway reached a saturation level as stimulus amplitude increased. The saturation amplitude decreased with increasing stimulus frequency. No saturation phenomenon was observed in subjects with vestibular loss, implying that vestibular cues were responsible for the saturation phenomenon. For visually induced sways below the saturation level, the stimulus-response curves for both normal and vestibular loss subjects were nearly identical, implying that (1) normal subjects were not using vestibular information to attenuate their visually induced sway, possibly because sway was below a vestibular-related threshold level, and (2) vestibular loss subjects did not utilize visual cues to a greater extent than normal subjects; that is, a fundamental change in visual system 'gain' was not used to compensate for a vestibular deficit. An unexpected finding was that the amplitude of body sway induced by visual surround motion could be almost three times greater than the amplitude of the visual stimulus in normal and vestibular loss subjects. This occurred in conditions where somatosensory cues were inaccurate and at low stimulus amplitudes. A control system model of visually induced postural sway was developed to explain this finding. For both subject groups, the amplitude of visually induced sway was smaller by a factor of about four in tests where somatosensory cues provided accurate versus inaccurate orientation information. This implied that (1) the vestibular loss subjects did not utilize somatosensory cues to a greater extent than normal subjects; that is, changes in somatosensory system 'gain' were not used to compensate for a vestibular deficit, and (2) the threshold for the use of vestibular cues in normal subjects was apparently lower in test conditions where somatosensory cues were providing accurate orientation information.
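The saturation behaviour reported above lends itself to a very simple descriptive summary. The following Python sketch is only an illustration of that idea, not the authors' control-system model; the visual gain and the assumed fall-off of the saturation level with frequency are hypothetical placeholder values.

    import numpy as np

    def predicted_sway_amplitude(stim_amp_deg, stim_freq_hz,
                                 visual_gain=3.0, sat_amp_at_0_1hz=2.0):
        """Toy description of visually induced sway amplitude (degrees).

        Below saturation, sway follows the visual stimulus with a fixed gain
        (so sway can exceed the stimulus amplitude, as reported above); the
        saturation level is assumed to fall off with stimulus frequency.
        """
        saturation = sat_amp_at_0_1hz * (0.1 / stim_freq_hz)  # assumed fall-off
        return np.minimum(visual_gain * np.asarray(stim_amp_deg), saturation)

    # Sway grows with stimulus amplitude at first, then saturates.
    print(predicted_sway_amplitude(np.array([0.25, 0.5, 1.0, 2.0, 4.0]), 0.2))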
The effect of contextual cues on the encoding of motor memories.
Howard, Ian S; Wolpert, Daniel M; Franklin, David W
2013-05-01
Several studies have shown that sensory contextual cues can reduce the interference observed during learning of opposing force fields. However, because each study examined a small set of cues, often in a unique paradigm, the relative efficacy of different sensory contextual cues is unclear. In the present study we quantify how seven contextual cues, some investigated previously and some novel, affect the formation and recall of motor memories. Subjects made movements in a velocity-dependent curl field, with direction varying randomly from trial to trial but always associated with a unique contextual cue. Linking field direction to the cursor or background color, or to peripheral visual motion cues, did not reduce interference. In contrast, the orientation of a visual object attached to the hand cursor significantly reduced interference, albeit by a small amount. When the fields were associated with movement in different locations in the workspace, a substantial reduction in interference was observed. We tested whether this reduction in interference was due to the different locations of the visual feedback (targets and cursor) or the movements (proprioceptive). When the fields were associated only with changes in visual display location (movements always made centrally) or only with changes in the movement location (visual feedback always displayed centrally), a substantial reduction in interference was observed. These results show that although some visual cues can lead to the formation and recall of distinct representations in motor memory, changes in spatial visual and proprioceptive states of the movement are far more effective than changes in simple visual contextual cues.
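For readers unfamiliar with the paradigm, a velocity-dependent curl field pushes the hand perpendicular to its current velocity, and the two opposing fields differ only in the sign of the rotation. A minimal Python sketch of such a force law follows; the field constant is an assumed, typical value rather than one taken from this study.

    import numpy as np

    def curl_field_force(hand_velocity_xy, b=13.0, clockwise=True):
        """Force (N) applied by a velocity-dependent curl field.

        The force is proportional and perpendicular to hand velocity (m/s);
        flipping `clockwise` gives the opposing field. b = 13 N*s/m is used
        here only as an example magnitude.
        """
        rot = np.array([[0.0, 1.0], [-1.0, 0.0]])  # 90-degree rotation
        if not clockwise:
            rot = -rot
        return b * rot @ np.asarray(hand_velocity_xy, dtype=float)

    print(curl_field_force([0.0, 0.3]))                    # forward movement, rightward force
    print(curl_field_force([0.0, 0.3], clockwise=False))   # the opposing field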
Domain general learning: Infants use social and non-social cues when learning object statistics
Barry, Ryan A.; Graf Estes, Katharine; Rivera, Susan M.
2015-01-01
Previous research has shown that infants can learn from social cues. But is a social cue more effective at directing learning than a non-social cue? This study investigated whether 9-month-old infants (N = 55) could learn a visual statistical regularity in the presence of a distracting visual sequence when attention was directed by either a social cue (a person) or a non-social cue (a rectangle). The results show that both social and non-social cues can guide infants’ attention to a visual shape sequence (and away from a distracting sequence). The social cue more effectively directed attention than the non-social cue during the familiarization phase, but the social cue did not result in significantly stronger learning than the non-social cue. The findings suggest that domain general attention mechanisms allow for the comparable learning seen in both conditions. PMID:25999879
Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles
2012-01-01
We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
Subthalamic Nucleus Neurons Differentially Encode Early and Late Aspects of Speech Production.
Lipski, W J; Alhourani, A; Pirnia, T; Jones, P W; Dastolfo-Hromack, C; Helou, L B; Crammond, D J; Shaiman, S; Dickey, M W; Holt, L L; Turner, R S; Fiez, J A; Richardson, R M
2018-05-22
Basal ganglia-thalamocortical loops mediate all motor behavior, yet little detail is known about the role of basal ganglia nuclei in speech production. Using intracranial recording during deep brain stimulation surgery in humans with Parkinson's disease, we tested the hypothesis that the firing rate of subthalamic nucleus neurons is modulated in sync with motor execution aspects of speech. Nearly half of seventy-nine unit recordings exhibited firing rate modulation during a syllable reading task across twelve subjects (male and female). Trial-to-trial timing of changes in subthalamic neuronal activity, relative to cue onset versus production onset, revealed that locking to cue presentation was associated more with units that decreased firing rate, while locking to speech onset was associated more with units that increased firing rate. These unique data indicate that subthalamic activity is dynamic during the production of speech, reflecting temporally-dependent inhibition and excitation of separate populations of subthalamic neurons. SIGNIFICANCE STATEMENT: The basal ganglia are widely assumed to participate in speech production, yet no prior studies have reported detailed examination of speech-related activity in basal ganglia nuclei. Using microelectrode recordings from the subthalamic nucleus during a single syllable reading task, in awake humans undergoing deep brain stimulation implantation surgery, we show that the firing rate of subthalamic nucleus neurons is modulated in response to motor execution aspects of speech. These results are the first to establish a role for subthalamic nucleus neurons in encoding of aspects of speech production, and they lay the groundwork for launching a modern subfield to explore basal ganglia function in human speech.
Cross-Sensory Transfer of Reference Frames in Spatial Memory
ERIC Educational Resources Information Center
Kelly, Jonathan W.; Avraamides, Marios N.
2011-01-01
Two experiments investigated whether visual cues influence spatial reference frame selection for locations learned through touch. Participants experienced visual cues emphasizing specific environmental axes and later learned objects through touch. Visual cues were manipulated and haptic learning conditions were held constant. Imagined perspective…
ERIC Educational Resources Information Center
Drijvers, Linda; Ozyurek, Asli
2017-01-01
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
Attentional bias to food-related visual cues: is there a role in obesity?
Doolan, K J; Breslin, G; Hanna, D; Gallagher, A M
2015-02-01
The incentive sensitisation model of obesity suggests that modification of the dopaminergic associated reward systems in the brain may result in increased awareness of food-related visual cues present in the current food environment. Having a heightened awareness of these visual food cues may impact on food choices and eating behaviours with those being most aware of or demonstrating greater attention to food-related stimuli potentially being at greater risk of overeating and subsequent weight gain. To date, research related to attentional responses to visual food cues has been both limited and conflicting. Such inconsistent findings may in part be explained by the use of different methodological approaches to measure attentional bias and the impact of other factors such as hunger levels, energy density of visual food cues and individual eating style traits that may influence visual attention to food-related cues outside of weight status alone. This review examines the various methodologies employed to measure attentional bias with a particular focus on the role that attentional processing of food-related visual cues may have in obesity. Based on the findings of this review, it appears that it may be too early to clarify the role visual attention to food-related cues may have in obesity. Results however highlight the importance of considering the most appropriate methodology to use when measuring attentional bias and the characteristics of the study populations targeted while interpreting results to date and in designing future studies.
Subconscious Visual Cues during Movement Execution Allow Correct Online Choice Reactions
Leukel, Christian; Lundbye-Jensen, Jesper; Christensen, Mark Schram; Gollhofer, Albert; Nielsen, Jens Bo; Taube, Wolfgang
2012-01-01
Part of the sensory information is processed by our central nervous system without conscious perception. Subconscious processing has been shown to be capable of triggering motor reactions. In the present study, we asked the question whether visual information, which is not consciously perceived, could influence decision-making in a choice reaction task. Ten healthy subjects (28±5 years) executed two different experimental protocols. In the Motor reaction protocol, a visual target cue was shown on a computer screen. Depending on the displayed cue, subjects had to either complete a reaching movement (go-condition) or had to abort the movement (stop-condition). The cue was presented with different display durations (20–160 ms). In the second Verbalization protocol, subjects verbalized what they experienced on the screen. Again, the cue was presented with different display durations. This second protocol tested for conscious perception of the visual cue. The results of this study show that subjects achieved significantly more correct responses in the Motor reaction protocol than in the Verbalization protocol. This difference was only observed at the very short display durations of the visual cue. Since correct responses in the Verbalization protocol required conscious perception of the visual information, our findings imply that the subjects performed correct motor responses to visual cues, which they were not conscious about. It is therefore concluded that humans may reach decisions based on subconscious visual information in a choice reaction task. PMID:23049749
Children and Adults Integrate Talker and Verb Information in Online Processing
ERIC Educational Resources Information Center
Borovsky, Arielle; Creel, Sarah C.
2014-01-01
Children seem able to efficiently interpret a variety of linguistic cues during speech comprehension, yet have difficulty interpreting sources of nonlinguistic and paralinguistic information that accompany speech. The current study asked whether (paralinguistic) voice-activated role knowledge is rapidly interpreted in coordination with a…
Heimbauer, Lisa A; Antworth, Rebecca L; Owren, Michael J
2012-01-01
Nonhuman primates appear to capitalize more effectively on visual cues than corresponding auditory versions. For example, studies of inferential reasoning have shown that monkeys and apes readily respond to seeing that food is present ("positive" cuing) or absent ("negative" cuing). Performance is markedly less effective with auditory cues, with many subjects failing to use this input. Extending recent work, we tested eight captive tufted capuchins (Cebus apella) in locating food using positive and negative cues in visual and auditory domains. The monkeys chose between two opaque cups to receive food contained in one of them. Cup contents were either shown or shaken, providing location cues from both cups, positive cues only from the baited cup, or negative cues from the empty cup. As in previous work, subjects readily used both positive and negative visual cues to secure reward. However, auditory outcomes were both similar to and different from those of earlier studies. Specifically, all subjects came to exploit positive auditory cues, but none responded to negative versions. The animals were also clearly different in visual versus auditory performance. Results indicate that a significant proportion of capuchins may be able to use positive auditory cues, with experience and learning likely playing a critical role. These findings raise the possibility that experience may be significant in visually based performance in this task as well, and highlight that coming to grips with evident differences between visual versus auditory processing may be important for understanding primate cognition more generally.
Alpha Oscillatory Dynamics Index Temporal Expectation Benefits in Working Memory.
Wilsch, Anna; Henry, Molly J; Herrmann, Björn; Maess, Burkhard; Obleser, Jonas
2015-07-01
Enhanced alpha power compared with a baseline can reflect states of increased cognitive load, for example, when listening to speech in noise. Can knowledge about "when" to listen (temporal expectations) potentially counteract cognitive load and concomitantly reduce alpha? The current magnetoencephalography (MEG) experiment induced cognitive load using an auditory delayed-matching-to-sample task with 2 syllables S1 and S2 presented in speech-shaped noise. Temporal expectation about the occurrence of S1 was manipulated in 3 different cue conditions: "Neutral" (uninformative about foreperiod), "early-cued" (short foreperiod), and "late-cued" (long foreperiod). Alpha power throughout the trial was highest when the cue was uninformative about the onset time of S1 (neutral) and lowest for the late-cued condition. This alpha-reducing effect of late compared with neutral cues was most evident during memory retention in noise and originated primarily in the right insula. Moreover, individual alpha effects during retention accounted best for observed individual performance differences between late-cued and neutral conditions, indicating a tradeoff between allocation of neural resources and the benefits drawn from temporal cues. Overall, the results indicate that temporal expectations can facilitate the encoding of speech in noise, and concomitantly reduce neural markers of cognitive load.
Speech Presentation Cues Moderate Frontal EEG Asymmetry in Socially Withdrawn Young Adults
Cole, Claire; Zapp, Daniel J.; Nelson, S. Katherine; Pérez-Edgar, Koraly
2011-01-01
Socially withdrawn individuals display solitary behavior across wide contexts with both unfamiliar and familiar peers. This tendency to withdraw may be driven by either past or anticipated negative social encounters. In addition, socially withdrawn individuals often exhibit right frontal electroencephalogram (EEG) asymmetry at baseline and when under stress. In the current study we examined shifts in frontal EEG activity in young adults (N=41) at baseline, as they viewed either an anxiety-provoking or a benign speech video, and as they subsequently prepared for their own speech. Results indicated that right frontal EEG activity increased, relative to the left, only for socially withdrawn participants exposed to the anxious video. These results suggest that contextual affective cues may prime an individual’s response to stress, particularly if they illustrate or substantiate an anticipated negative event. PMID:22169714
Differential processing of binocular and monocular gloss cues in human visual cortex
Di Luca, Massimiliano; Ban, Hiroshi; Muryy, Alexander; Fleming, Roland W.
2016-01-01
The visual impression of an object's surface reflectance (“gloss”) relies on a range of visual cues, both monocular and binocular. Whereas previous imaging work has identified processing within ventral visual areas as important for monocular cues, little is known about cortical areas involved in processing binocular cues. Here, we used human functional MRI (fMRI) to test for brain areas selectively involved in the processing of binocular cues. We manipulated stereoscopic information to create four conditions that differed in their disparity structure and in the impression of surface gloss that they evoked. We performed multivoxel pattern analysis to find areas whose fMRI responses allow classes of stimuli to be distinguished based on their depth structure vs. material appearance. We show that higher dorsal areas play a role in processing binocular gloss information, in addition to known ventral areas involved in material processing, with ventral area lateral occipital responding to both object shape and surface material properties. Moreover, we tested for similarities between the representation of gloss from binocular cues and monocular cues. Specifically, we tested for transfer in the decoding performance of an algorithm trained on glossy vs. matte objects defined by either binocular or by monocular cues. We found transfer effects from monocular to binocular cues in dorsal visual area V3B/kinetic occipital (KO), suggesting a shared representation of the two cues in this area. These results indicate the involvement of mid- to high-level visual circuitry in the estimation of surface material properties, with V3B/KO potentially playing a role in integrating monocular and binocular cues. PMID:26912596
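The transfer analysis described above amounts to training a pattern classifier on responses evoked by one cue type and testing it on responses evoked by the other. The following scikit-learn sketch illustrates that logic on synthetic voxel patterns; it is not the authors' analysis pipeline, and all data and parameters are invented for illustration.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_trials, n_voxels = 80, 50

    # Synthetic voxel patterns for glossy (+1) vs. matte (-1) objects defined by
    # monocular cues (training set) and binocular cues (test set). A shared
    # component mimics a cue-invariant gloss representation.
    shared = rng.normal(size=n_voxels)

    def make_patterns(label):
        return label * shared + rng.normal(scale=2.0, size=(n_trials, n_voxels))

    X_mono = np.vstack([make_patterns(+1), make_patterns(-1)])
    X_bino = np.vstack([make_patterns(+1), make_patterns(-1)])
    y = np.concatenate([np.ones(n_trials), -np.ones(n_trials)])

    clf = LinearSVC(C=1.0).fit(X_mono, y)              # train on monocular-cue trials
    print("transfer accuracy:", clf.score(X_bino, y))  # test on binocular-cue trials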
Auditory Emotional Cues Enhance Visual Perception
ERIC Educational Resources Information Center
Zeelenberg, Rene; Bocanegra, Bruno R.
2010-01-01
Recent studies show that emotional stimuli impair performance to subsequently presented neutral stimuli. Here we show a cross-modal perceptual enhancement caused by emotional cues. Auditory cue words were followed by a visually presented neutral target word. Two-alternative forced-choice identification of the visual target was improved by…
Loiselle, Louise H; Dorman, Michael F; Yost, William A; Cook, Sarah J; Gifford, Rene H
2016-08-01
To assess the role of interaural time differences and interaural level differences in (a) sound-source localization, and (b) speech understanding in a cocktail party listening environment for listeners with bilateral cochlear implants (CIs) and for listeners with hearing-preservation CIs. Eleven bilateral listeners with MED-EL (Durham, NC) CIs and 8 listeners with hearing-preservation CIs with symmetrical low frequency, acoustic hearing using the MED-EL or Cochlear device were evaluated using 2 tests designed to task binaural hearing, localization, and a simulated cocktail party. Access to interaural cues for localization was constrained by the use of low-pass, high-pass, and wideband noise stimuli. Sound-source localization accuracy for listeners with bilateral CIs in response to the high-pass noise stimulus and sound-source localization accuracy for the listeners with hearing-preservation CIs in response to the low-pass noise stimulus did not differ significantly. Speech understanding in a cocktail party listening environment improved for all listeners when interaural cues, either interaural time difference or interaural level difference, were available. The findings of the current study indicate that similar degrees of benefit to sound-source localization and speech understanding in complex listening environments are possible with 2 very different rehabilitation strategies: the provision of bilateral CIs and the preservation of hearing.
2012-06-01
... a listener uses to interpret the auditory environment is interaural difference cues. Interaural difference cues are perceived binaurally, and they ... signal in noise is not enough for accurate localization performance. Instead, it appears that both audibility and binaural signal processing of both ... be interpreted differently among researchers. ... Conclusions: Accurately processed and interpreted binaural and monaural spatial cues enable a ...
Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech
Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz
2015-01-01
In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners’ and 20 CI users’ emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor’s emotions worse than the other actors’. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker’s deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues. PMID:27648210
Taylor, Adrian; Katomeri, Magdalena
2006-01-01
A review and meta-analysis by Hamer et al. (2006) showed that a single session of exercise can attenuate post-exercise blood pressure (BP) responses to stress, but no studies examined the effects among smokers or with brisk walking. Healthy volunteers (n=60), averaging 28 years of age and smoking 15 cigarettes daily, abstained from smoking for 2 h before being randomly assigned to a 15-min brisk semi-self-paced walk or a passive control condition. Subject characteristics, typical smoking cue-elicited cravings and BP were assessed at baseline. After each condition, BP was assessed before and after three psycho-social stressors were carried out: (1) a computerised Stroop word-colour interference task, (2) a speech task and (3) only handling a lit cigarette. A two-way mixed ANCOVA (controlling for baseline) revealed a significant overall interaction effect of time by condition for both systolic blood pressure (SBP) and diastolic blood pressure (DBP). Univariate ANCOVAs (comparing between-group post-stressor BP, controlling for pre-stressor BP) revealed that exercise attenuated SBP and DBP responses to the Stroop and speech tasks, and SBP responses to the lit cigarette, with attenuations of up to 3.8 mmHg. Post-exercise attenuation effects were moderated by resting blood pressure and self-reported smoking cue-elicited craving. Effects were strongest among those with higher blood pressure and smokers who reported typically stronger cravings when faced with smoking cues. Blood pressure responses to the lit cigarette were not associated with responses to the Stroop and speech tasks. A self-paced 15-min walk can reduce smokers' SBP and DBP responses to stress, of a magnitude similar, on average, to that of non-smokers.
Segmenting words from natural speech: subsegmental variation in segmental cues.
Rytting, C Anton; Brew, Chris; Fosler-Lussier, Eric
2010-06-01
Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
Visual-auditory integration during speech imitation in autism.
Williams, Justin H G; Massaro, Dominic W; Peel, Natalie J; Bosseler, Alexis; Suddendorf, Thomas
2004-01-01
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training.
Visual Feedback of Tongue Movement for Novel Speech Sound Learning
Katz, William F.; Mehta, Sonya
2015-01-01
Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Auditory emotional cues enhance visual perception.
Zeelenberg, René; Bocanegra, Bruno R
2010-04-01
Recent studies show that emotional stimuli impair performance to subsequently presented neutral stimuli. Here we show a cross-modal perceptual enhancement caused by emotional cues. Auditory cue words were followed by a visually presented neutral target word. Two-alternative forced-choice identification of the visual target was improved by emotional cues as compared to neutral cues. When the cue was presented visually we replicated the emotion-induced impairment found in other studies. Our results suggest emotional stimuli have a twofold effect on perception. They impair perception by reflexively attracting attention at the expense of competing stimuli. However, emotional stimuli also induce a nonspecific perceptual enhancement that carries over onto other stimuli when competition is reduced, for example, by presenting stimuli in different modalities.
Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J
2011-07-26
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human.
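Noise-vocoded (NV) speech of the kind used here, and in the cochlear-implant simulations cited elsewhere in this collection, is typically generated by splitting the signal into frequency bands, extracting each band's amplitude envelope, and using the envelope to modulate band-limited noise. The SciPy sketch below illustrates that general procedure; the band edges, filter order, and envelope method are assumptions rather than the parameters of this study.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(signal, fs, band_edges_hz=(100, 300, 800, 1800, 4000)):
        """Return an n-band noise-vocoded version of `signal` (1-D float array)."""
        out = np.zeros_like(signal, dtype=float)
        noise = np.random.default_rng(0).standard_normal(len(signal))
        for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, signal)
            envelope = np.abs(hilbert(band))       # amplitude envelope of the band
            carrier = sosfiltfilt(sos, noise)      # noise limited to the same band
            out += envelope * carrier
        return out / (np.max(np.abs(out)) + 1e-12)

    # Example with a synthetic 1-s amplitude-modulated tone at 16 kHz.
    fs = 16000
    t = np.arange(fs) / fs
    toy = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
    print(noise_vocode(toy, fs).shape)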
Training to Improve Hearing Speech in Noise: Biological Mechanisms
Song, Judy H.; Skoe, Erika; Banai, Karen
2012-01-01
We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
Phase-Locked Responses to Speech in Human Auditory Cortex are Enhanced During Comprehension
Peelle, Jonathan E.; Gross, Joachim; Davis, Matthew H.
2013-01-01
A growing body of evidence shows that ongoing oscillations in auditory cortex modulate their phase to match the rhythm of temporally regular acoustic stimuli, increasing sensitivity to relevant environmental cues and improving detection accuracy. In the current study, we test the hypothesis that nonsensory information provided by linguistic content enhances phase-locked responses to intelligible speech in the human brain. Sixteen adults listened to meaningful sentences while we recorded neural activity using magnetoencephalography. Stimuli were processed using a noise-vocoding technique to vary intelligibility while keeping the temporal acoustic envelope consistent. We show that the acoustic envelopes of sentences contain most power between 4 and 7 Hz and that it is in this frequency band that phase locking between neural activity and envelopes is strongest. Bilateral oscillatory neural activity phase-locked to unintelligible speech, but this cerebro-acoustic phase locking was enhanced when speech was intelligible. This enhanced phase locking was left lateralized and localized to left temporal cortex. Together, our results demonstrate that entrainment to connected speech does not only depend on acoustic characteristics, but is also affected by listeners’ ability to extract linguistic information. This suggests a biological framework for speech comprehension in which acoustic and linguistic cues reciprocally aid in stimulus prediction. PMID:22610394
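Cerebro-acoustic phase locking of this kind is commonly quantified as coherence between the speech envelope and the neural signal, restricted to the 4-7 Hz band. The sketch below shows that computation on synthetic single-channel signals; it is an illustration of the measure, not the MEG analysis used in the study.

    import numpy as np
    from scipy.signal import coherence

    fs = 200.0                    # common sampling rate for envelope and neural signal
    t = np.arange(0, 60, 1 / fs)
    rng = np.random.default_rng(1)

    # Synthetic stand-ins: a speech envelope with energy near 5 Hz and a neural
    # signal that partially tracks it.
    envelope = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)
    neural = 0.5 * envelope + rng.standard_normal(t.size)

    f, cxy = coherence(envelope, neural, fs=fs, nperseg=int(4 * fs))
    theta_band = (f >= 4) & (f <= 7)
    print("mean 4-7 Hz cerebro-acoustic coherence:", round(float(cxy[theta_band].mean()), 3))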
The Effects of Explicit Visual Cues in Reading Biological Diagrams
ERIC Educational Resources Information Center
Ge, Yun-Ping; Unsworth, Len; Wang, Kuo-Hua
2017-01-01
Drawing on cognitive theories, this study intends to investigate the effects of explicit visual cues which have been proposed as a critical factor in facilitating understanding of biological images. Three diagrams from Taiwanese textbooks with implicit visual cues, involving the concepts of biological classification systems, fish taxonomy, and…
Visual Navigation during Colony Emigration by the Ant Temnothorax rugatulus
Bowens, Sean R.; Glatt, Daniel P.; Pratt, Stephen C.
2013-01-01
Many ants rely on both visual cues and self-generated chemical signals for navigation, but their relative importance varies across species and context. We evaluated the roles of both modalities during colony emigration by Temnothorax rugatulus. Colonies were induced to move from an old nest in the center of an arena to a new nest at the arena edge. In the midst of the emigration the arena floor was rotated 60° around the old nest entrance, thus displacing any substrate-bound odor cues while leaving visual cues unchanged. This manipulation had no effect on orientation, suggesting little influence of substrate cues on navigation. When this rotation was accompanied by the blocking of most visual cues, the ants became highly disoriented, suggesting that they did not fall back on substrate cues even when deprived of visual information. Finally, when the substrate was left in place but the visual surround was rotated, the ants' subsequent headings were strongly rotated in the same direction, showing a clear role for visual navigation. Combined with earlier studies, these results suggest that chemical signals deposited by Temnothorax ants serve more for marking of familiar territory than for orientation. The ants instead navigate visually, showing the importance of this modality even for species with small eyes and coarse visual acuity. PMID:23671713
Charboneau, Evonne J.; Dietrich, Mary S.; Park, Sohee; Cao, Aize; Watkins, Tristan J; Blackford, Jennifer U; Benningfield, Margaret M.; Martin, Peter R.; Buchowski, Maciej S.; Cowan, Ronald L.
2013-01-01
Craving is a major motivator underlying drug use and relapse but the neural correlates of cannabis craving are not well understood. This study sought to determine whether visual cannabis cues increase cannabis craving and whether cue-induced craving is associated with regional brain activation in cannabis-dependent individuals. Cannabis craving was assessed in 16 cannabis-dependent adult volunteers while they viewed cannabis cues during a functional MRI (fMRI) scan. The Marijuana Craving Questionnaire was administered immediately before and after each of three cannabis cue-exposure fMRI runs. FMRI blood-oxygenation-level-dependent (BOLD) signal intensity was determined in regions activated by cannabis cues to examine the relationship of regional brain activation to cannabis craving. Craving scores increased significantly following exposure to visual cannabis cues. Visual cues activated multiple brain regions, including inferior orbital frontal cortex, posterior cingulate gyrus, parahippocampal gyrus, hippocampus, amygdala, superior temporal pole, and occipital cortex. Craving scores at baseline and at the end of all three runs were significantly correlated with brain activation during the first fMRI run only, in the limbic system (including amygdala and hippocampus) and paralimbic system (superior temporal pole), and visual regions (occipital cortex). Cannabis cues increased craving in cannabis-dependent individuals and this increase was associated with activation in the limbic, paralimbic, and visual systems during the first fMRI run, but not subsequent fMRI runs. These results suggest that these regions may mediate visually cued aspects of drug craving. This study provides preliminary evidence for the neural basis of cue-induced cannabis craving and suggests possible neural targets for interventions targeted at treating cannabis dependence. PMID:24035535
The Impact of Early Bilingualism on Face Recognition Processes.
Kandel, Sonia; Burfin, Sabine; Méary, David; Ruiz-Tada, Elisa; Costa, Albert; Pascalis, Olivier
2016-01-01
Early linguistic experience has an impact on the way we decode audiovisual speech in face-to-face communication. The present study examined whether differences in visual speech decoding could be linked to a broader difference in face processing. To identify a phoneme we have to do an analysis of the speaker's face to focus on the relevant cues for speech decoding (e.g., locating the mouth with respect to the eyes). Face recognition processes were investigated through two classic effects in face recognition studies: the Other-Race Effect (ORE) and the Inversion Effect. Bilingual and monolingual participants did a face recognition task with Caucasian faces (own race), Chinese faces (other race), and cars that were presented in an Upright or Inverted position. The results revealed that monolinguals exhibited the classic ORE. Bilinguals did not. Overall, bilinguals were slower than monolinguals. These results suggest that bilinguals' face processing abilities differ from monolinguals'. Early exposure to more than one language may lead to a perceptual organization that goes beyond language processing and could extend to face analysis. We hypothesize that these differences could be due to the fact that bilinguals focus on different parts of the face than monolinguals, making them more efficient in other race face processing but slower. However, more studies using eye-tracking techniques are necessary to confirm this explanation.
Toward semantic-based retrieval of visual information: a model-based approach
NASA Astrophysics Data System (ADS)
Park, Youngchoon; Golshani, Forouzan; Panchanathan, Sethuraman
2002-07-01
This paper centers on the problem of automated visual content classification. To enable classification-based image or visual object retrieval, we propose a new image representation scheme called the visual context descriptor (VCD), a multidimensional vector in which each element represents the frequency of a unique visual property of an image or a region. VCD utilizes predetermined quality dimensions (i.e., types of features and quantization level) and semantic model templates mined a priori. Not only observed visual cues but also contextually relevant visual features are proportionally incorporated in the VCD. Contextual relevance of a visual cue to a semantic class is determined by correlation analysis of ground truth samples. Such co-occurrence analysis of visual cues requires transformation of a real-valued visual feature vector (e.g., color histogram, Gabor texture, etc.) into a discrete event (e.g., terms in text). Good-features-to-track, rule of thirds, iterative k-means clustering, and TSVQ are involved in the transformation of feature vectors into unified symbolic representations called visual terms. Similarity-based visual cue frequency estimation is also proposed and used to ensure the correctness of model learning and matching, since the sparseness of sample data otherwise makes frequency estimates of visual cues unstable. The proposed method naturally allows integration of heterogeneous visual, temporal, or spatial cues in a single classification or matching framework, and can be easily integrated into a semantic knowledge base such as a thesaurus or ontology. Robust semantic visual model template creation and object-based image retrieval are demonstrated based on the proposed content description scheme.
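The quantization of continuous feature vectors into discrete visual terms, and the per-image term-frequency vector built from them, follow the general bag-of-visual-words recipe. The scikit-learn sketch below illustrates that recipe only; it is not the authors' VCD implementation, the descriptors and codebook size are placeholders, and the contextual-relevance weighting of the VCD is omitted.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Placeholder local descriptors (e.g., color/texture features) for a set of
    # training regions, and for the regions of one query image.
    train_descriptors = rng.normal(size=(2000, 16))
    image_descriptors = rng.normal(size=(40, 16))

    # 1. Learn a codebook of "visual terms" by k-means quantization.
    codebook = KMeans(n_clusters=64, n_init=5, random_state=0).fit(train_descriptors)

    # 2. Describe the image by the frequency of each visual term it contains.
    terms = codebook.predict(image_descriptors)
    descriptor = np.bincount(terms, minlength=64).astype(float)
    descriptor /= descriptor.sum()
    print(descriptor.round(3))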
On Older Listeners' Ability to Perceive Dynamic Pitch
ERIC Educational Resources Information Center
Shen, Jing; Wright, Richard; Souza, Pamela E.
2016-01-01
Purpose: Natural speech comes with variation in pitch, which serves as an important cue for speech recognition. The present study investigated older listeners' dynamic pitch perception with a focus on interindividual variability. In particular, we asked whether some of the older listeners' inability to perceive dynamic pitch stems from the higher…
Flexibility in Statistical Word Segmentation: Finding Words in Foreign Speech
ERIC Educational Resources Information Center
Graf Estes, Katharine; Gluck, Stephanie Chen-Wu; Bastos, Carolina
2015-01-01
The present experiments investigated the flexibility of statistical word segmentation. There is ample evidence that infants can use statistical cues (e.g., syllable transitional probabilities) to segment fluent speech. However, it is unclear how effectively infants track these patterns in unfamiliar phonological systems. We examined whether…
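The statistical cue mentioned above, syllable transitional probability, is the conditional probability of one syllable given the one before it; learners are thought to posit word boundaries where it dips. A minimal sketch on a toy familiarization stream follows (the syllables, words, and stream length are arbitrary):

    import random
    from collections import Counter

    random.seed(0)

    # Toy familiarization stream built from three trisyllabic "words", as in
    # typical artificial-language segmentation studies.
    words = [("pa", "bi", "ku"), ("ti", "bu", "do"), ("go", "la", "tu")]
    stream = [syll for _ in range(300) for syll in random.choice(words)]

    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])

    def transitional_probability(a, b):
        """P(next syllable is b | current syllable is a)."""
        return pair_counts[(a, b)] / first_counts[a]

    print(transitional_probability("pa", "bi"))  # within a word: 1.0
    print(transitional_probability("ku", "ti"))  # across a word boundary: about 1/3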
Multichannel Compression, Temporal Cues, and Audibility.
ERIC Educational Resources Information Center
Souza, Pamela E.; Turner, Christopher W.
1998-01-01
The effect of the reduction of the temporal envelope produced by multichannel compression on recognition was examined in 16 listeners with hearing loss, with particular focus on audibility of the speech signal. Multichannel compression improved speech recognition when superior audibility was provided by a two-channel compression system over linear…
Audio-Visual Speech Perception Is Special
ERIC Educational Resources Information Center
Tuomainen, J.; Andersen, T.S.; Tiippana, K.; Sams, M.
2005-01-01
In face-to-face conversation speech is perceived by ear and eye. We studied the prerequisites of audio-visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and…
Visual and Auditory Input in Second-Language Speech Processing
ERIC Educational Resources Information Center
Hardison, Debra M.
2010-01-01
The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
Park, Hyojin; Kayser, Christoph; Thut, Gregor; Gross, Joachim
2016-01-01
During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and speaker’s lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending visual speech enhances the coherence between activity in visual cortex and the speaker’s lips. Further, we identified a significant partial coherence between left motor cortex and lip movements and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing. DOI: http://dx.doi.org/10.7554/eLife.14521.001 PMID:27146891
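Partial coherence of the kind used here measures the lip-brain coupling that remains after the shared auditory speech signal has been regressed out in the spectral domain. The sketch below applies the standard partial-coherence formula to synthetic signals; it is an illustration of the measure, not the authors' MEG pipeline.

    import numpy as np
    from scipy.signal import csd

    fs = 200.0
    t = np.arange(0, 120, 1 / fs)
    rng = np.random.default_rng(2)

    # Synthetic stand-ins: auditory envelope z, lip aperture x (correlated with z),
    # and a "brain" signal y driven by both plus noise.
    z = np.sin(2 * np.pi * 4 * t) + 0.5 * rng.standard_normal(t.size)
    x = 0.8 * z + 0.5 * rng.standard_normal(t.size)
    y = 0.5 * x + 0.5 * z + rng.standard_normal(t.size)

    nper = int(4 * fs)

    def S(a, b):
        """Welch cross-spectral density estimate of signals a and b."""
        return csd(a, b, fs=fs, nperseg=nper)[1]

    f = csd(x, y, fs=fs, nperseg=nper)[0]
    sxx, syy, szz = np.real(S(x, x)), np.real(S(y, y)), np.real(S(z, z))
    sxy, sxz, szy = S(x, y), S(x, z), S(z, y)

    # Ordinary lip-brain coherence, and partial coherence with the auditory
    # envelope z removed: S_xy|z = S_xy - S_xz * S_zy / S_zz.
    coh = np.abs(sxy) ** 2 / (sxx * syy)
    sxy_z = sxy - sxz * szy / szz
    pcoh = np.abs(sxy_z) ** 2 / ((sxx - np.abs(sxz) ** 2 / szz) * (syy - np.abs(szy) ** 2 / szz))

    band = (f >= 3) & (f <= 5)
    print("coherence:", coh[band].mean(), "partial coherence:", pcoh[band].mean())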
Seeing is believing: information content and behavioural response to visual and chemical cues
Gonzálvez, Francisco G.; Rodríguez-Gironés, Miguel A.
2013-01-01
Predator avoidance and foraging often pose conflicting demands. Animals can decrease mortality risk by searching for predators, but searching decreases foraging time and hence intake. We used this principle to investigate how prey should use information to detect, assess and respond to predation risk from an optimal foraging perspective. A mathematical model showed that solitary bees should increase flower examination time in response to predator cues and that the rate of false alarms should be negatively correlated with the relative value of the flower explored. The predatory ant, Oecophylla smaragdina, and the harmless ant, Polyrhachis dives, differ in the profile of volatiles they emit and in their visual appearance. As predicted, the solitary bee Nomia strigata spent more time examining virgin flowers in the presence of predator cues than in their absence. Furthermore, the proportion of flowers rejected decreased from morning to noon, as the relative value of virgin flowers increased. In addition, bees responded differently to visual and chemical cues. While chemical cues induced bees to search around flowers, bees detecting visual cues hovered in front of them. These strategies may allow prey to identify the nature of visual cues and to locate the source of chemical cues. PMID:23698013
NASA Technical Reports Server (NTRS)
Foyle, David C.; Kaiser, Mary K.; Johnson, Walter W.
1992-01-01
This paper reviews some of the sources of visual information that are available in the out-the-window scene and describes how these visual cues are important for routine pilotage and training, as well as the development of simulator visual systems and enhanced or synthetic vision systems for aircraft cockpits. It is shown how these visual cues may change or disappear under environmental or sensor conditions, and how the visual scene can be augmented by advanced displays to capitalize on the pilot's excellent ability to extract visual information from the visual scene.
Suprasegmental information affects processing of talking faces at birth.
Guellai, Bahia; Mersad, Karima; Streri, Arlette
2015-02-01
From birth, newborns show a preference for faces talking a native language compared to silent faces. The present study addresses two questions that remained unanswered by previous research: (a) Does the familiarity with the language play a role in this process and (b) Are all the linguistic and paralinguistic cues necessary in this case? Experiment 1 extended newborns' preference for native speakers to non-native ones. Given that fetuses and newborns are sensitive to the prosodic characteristics of speech, Experiments 2 and 3 presented faces talking native and nonnative languages with the speech stream being low-pass filtered. Results showed that newborns preferred looking at a person who talked to them even when only the prosodic cues were provided for both languages. Nonetheless, a familiarity preference for the previously talking face is observed in the "normal speech" condition (i.e., Experiment 1) and a novelty preference in the "filtered speech" condition (Experiments 2 and 3). This asymmetry reveals that newborns process these two types of stimuli differently and that they may already be sensitive to a mismatch between the articulatory movements of the face and the corresponding speech sounds. Copyright © 2014 Elsevier Inc. All rights reserved.
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech.
Borrie, Stephanie A; Lansford, Kaitlin L; Barrett, Tyson S
2017-03-01
The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception (initial intelligibility) and learning (intelligibility improvement) of naturally dysrhythmic speech, dysarthria. Fifty young adults with typical hearing participated in 3 key tests, including a rhythm perception test, a receptive vocabulary test, and a speech perception and learning test, with standard pretest, familiarization, and posttest phases. Initial intelligibility scores were calculated as the proportion of correct pretest words, while intelligibility improvement scores were calculated by subtracting this proportion from the proportion of correct posttest words. Rhythm perception scores predicted intelligibility improvement scores but not initial intelligibility. On the other hand, receptive vocabulary scores predicted initial intelligibility scores but not intelligibility improvement. Expertise in rhythm perception appears to provide an advantage for processing dysrhythmic speech, but a familiarization experience is required for the advantage to be realized. Findings are discussed in relation to the role of rhythm in speech processing and shed light on processing models that consider the consequence of rhythm abnormalities in dysarthria.
Audio visual speech source separation via improved context dependent association model
NASA Astrophysics Data System (ADS)
Kazemi, Alireza; Boostani, Reza; Sobhanmanesh, Fariborz
2014-12-01
In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objective function based on the mean square error (MSE) between estimated and target visual parameters. This function is minimized to estimate the de-mixing vector/filters to separate the relevant source from linear instantaneous or time-domain convolutive mixtures. We have also proposed a hybrid criterion which uses AV coherency together with kurtosis as a non-Gaussianity measure. Experimental results are presented and compared in terms of visually relevant speech detection accuracy and output signal-to-interference ratio (SIR) of source separation. The suggested audio-visual model significantly improves relevant speech classification accuracy compared to the existing GMM-based model, and the proposed AVSS algorithm improves the speech separation quality compared to reference ICA- and AVSS-based methods.
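A minimal sketch of the kind of hybrid criterion described above is given below: a candidate de-mixing vector is scored by the mean square error between observed lip parameters and those predicted by an audio-visual associator, combined with the separated source's excess kurtosis. The associator, the framing, and the weighting term `lam` are placeholders, not the paper's implementation.

```python
# Minimal sketch of a hybrid AVSS separation criterion: audio-visual coherency
# (MSE between predicted and observed lip parameters) plus non-Gaussianity
# (excess kurtosis) of the separated source.
import numpy as np

def excess_kurtosis(s):
    s = (s - s.mean()) / (s.std() + 1e-12)
    return np.mean(s ** 4) - 3.0

def hybrid_score(w, X, lip_obs, associator, lam=1.0):
    """X: (n_mics, n_samples) instantaneous mixture; lip_obs: (n_frames, n_lip)."""
    s = w @ X                                    # candidate separated source
    frames = s.reshape(lip_obs.shape[0], -1)     # crude framing to match lip rate
    lip_pred = associator(frames)                # (n_frames, n_lip) predictions
    av_mse = np.mean((lip_pred - lip_obs) ** 2)  # audio-visual coherency term
    return -av_mse + lam * abs(excess_kurtosis(s))   # higher is better

# Toy usage: random mixture and a placeholder "associator" for shape checking.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 8000))
lip_obs = rng.standard_normal((100, 4))
dummy_assoc = lambda fr: fr[:, :4]               # placeholder for the trained net
w = rng.standard_normal(2)
print(hybrid_score(w / np.linalg.norm(w), X, lip_obs, dummy_assoc))
```

In practice the score would be maximized over w (or over de-mixing filters for convolutive mixtures) with a numerical optimizer.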
Neurophysiology underlying influence of stimulus reliability on audiovisual integration.
Shatzer, Hannah; Shen, Stanley; Kerlin, Jess R; Pitt, Mark A; Shahin, Antoine J
2018-01-24
We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than blurred visual speech and for less degraded than more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Role of somatosensory and vestibular cues in attenuating visually induced human postural sway
NASA Technical Reports Server (NTRS)
Peterka, R. J.; Benolken, M. S.
1995-01-01
The purpose of this study was to determine the contribution of visual, vestibular, and somatosensory cues to the maintenance of stance in humans. Postural sway was induced by full-field, sinusoidal visual surround rotations about an axis at the level of the ankle joints. The influences of vestibular and somatosensory cues were characterized by comparing postural sway in normal and bilateral vestibular absent subjects in conditions that provided either accurate or inaccurate somatosensory orientation information. In normal subjects, the amplitude of visually induced sway reached a saturation level as stimulus amplitude increased. The saturation amplitude decreased with increasing stimulus frequency. No saturation phenomena were observed in subjects with vestibular loss, implying that vestibular cues were responsible for the saturation phenomenon. For visually induced sways below the saturation level, the stimulus-response curves for both normal subjects and subjects experiencing vestibular loss were nearly identical, implying (1) that normal subjects were not using vestibular information to attenuate their visually induced sway, possibly because sway was below a vestibular-related threshold level, and (2) that subjects with vestibular loss did not utilize visual cues to a greater extent than normal subjects; that is, a fundamental change in visual system "gain" was not used to compensate for a vestibular deficit. An unexpected finding was that the amplitude of body sway induced by visual surround motion could be almost 3 times greater than the amplitude of the visual stimulus in normal subjects and subjects with vestibular loss. This occurred in conditions where somatosensory cues were inaccurate and at low stimulus amplitudes. A control system model of visually induced postural sway was developed to explain this finding. For both subject groups, the amplitude of visually induced sway was smaller by a factor of about 4 in tests where somatosensory cues provided accurate versus inaccurate orientation information. This implied (1) that the subjects experiencing vestibular loss did not utilize somatosensory cues to a greater extent than normal subjects; that is, changes in somatosensory system "gain" were not used to compensate for a vestibular deficit, and (2) that the threshold for the use of vestibular cues in normal subjects was apparently lower in test conditions where somatosensory cues were providing accurate orientation information.
Janssen, Simone; Schmidt, Sabine
2009-07-01
The perception of prosodic cues in human speech may be rooted in mechanisms common to mammals. The present study explores to what extent bats use rhythm and frequency, typically carrying prosodic information in human speech, for the classification of communication call series. Using a two-alternative, forced choice procedure, we trained Megaderma lyra to discriminate between synthetic contact call series differing in frequency, rhythm on level of calls and rhythm on level of call series, and measured the classification performance for stimuli differing in only one, or two, of the above parameters. A comparison with predictions from models based on one, combinations of two, or all, parameters revealed that the bats based their decision predominantly on frequency and in addition on rhythm on the level of call series, whereas rhythm on level of calls was not taken into account in this paradigm. Moreover, frequency and rhythm on the level of call series were evaluated independently. Our results show that parameters corresponding to prosodic cues in human languages are perceived and evaluated by bats. Thus, these necessary prerequisites for a communication via prosodic structures in mammals have evolved far before human speech.
Understanding speaker attitudes from prosody by adults with Parkinson's disease.
Monetta, Laura; Cheang, Henry S; Pell, Marc D
2008-09-01
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
Rate and onset cues can improve cochlear implant synthetic vowel recognition in noise
Mc Laughlin, Myles; Reilly, Richard B.; Zeng, Fan-Gang
2013-01-01
Understanding speech-in-noise is difficult for most cochlear implant (CI) users. Speech-in-noise segregation cues are well understood for acoustic hearing but not for electric hearing. This study investigated the effects of stimulation rate and onset delay on synthetic vowel-in-noise recognition in CI subjects. In Experiment I, synthetic vowels were presented at 50, 145, or 795 pulse/s and noise at the same three rates, yielding nine combinations. Recognition improved significantly if the noise had a lower rate than the vowel, suggesting that listeners can use temporal gaps in the noise to detect a synthetic vowel. This hypothesis is supported by accurate prediction of synthetic vowel recognition using a temporal integration window model. Using lower rates, a similar trend was observed in normal-hearing subjects. Experiment II found that for CI subjects, a vowel onset delay improved performance if the noise had a lower or higher rate than the synthetic vowel. These results show that differing rates or onset times can improve synthetic vowel-in-noise recognition, indicating a need to develop speech processing strategies that encode or emphasize these cues. PMID:23464025
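The gap-listening argument above can be illustrated with a toy simulation, shown below. It is not the authors' temporal integration window model: the pulse rates mirror those of Experiment I, but the 2 ms window and the unmasked-pulse criterion are arbitrary assumptions.

```python
# Illustrative toy: with pulse trains, a lower-rate masker leaves temporal gaps
# in which target pulses fall unmasked. A short integration window counts how
# much masker energy falls near each target pulse.
import numpy as np

fs = 10000                                   # time resolution (samples/s)
dur = 1.0

def pulse_train(rate, fs, dur):
    t = np.zeros(int(fs * dur))
    t[(np.arange(0, dur, 1.0 / rate) * fs).astype(int)] = 1.0
    return t

vowel = pulse_train(795, fs, dur)
for noise_rate in (50, 145, 795):
    noise = pulse_train(noise_rate, fs, dur)
    win = np.ones(int(0.002 * fs))           # 2 ms integration window (assumed)
    masker_near = np.convolve(noise, win, mode="same")
    unmasked = np.mean(masker_near[vowel > 0] == 0)
    print(f"noise {noise_rate} pulse/s: fraction of unmasked vowel pulses = {unmasked:.2f}")
```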
Differential processing of binocular and monocular gloss cues in human visual cortex.
Sun, Hua-Chun; Di Luca, Massimiliano; Ban, Hiroshi; Muryy, Alexander; Fleming, Roland W; Welchman, Andrew E
2016-06-01
The visual impression of an object's surface reflectance ("gloss") relies on a range of visual cues, both monocular and binocular. Whereas previous imaging work has identified processing within ventral visual areas as important for monocular cues, little is known about cortical areas involved in processing binocular cues. Here, we used human functional MRI (fMRI) to test for brain areas selectively involved in the processing of binocular cues. We manipulated stereoscopic information to create four conditions that differed in their disparity structure and in the impression of surface gloss that they evoked. We performed multivoxel pattern analysis to find areas whose fMRI responses allow classes of stimuli to be distinguished based on their depth structure vs. material appearance. We show that higher dorsal areas play a role in processing binocular gloss information, in addition to known ventral areas involved in material processing, with ventral area lateral occipital responding to both object shape and surface material properties. Moreover, we tested for similarities between the representation of gloss from binocular cues and monocular cues. Specifically, we tested for transfer in the decoding performance of an algorithm trained on glossy vs. matte objects defined by either binocular or by monocular cues. We found transfer effects from monocular to binocular cues in dorsal visual area V3B/kinetic occipital (KO), suggesting a shared representation of the two cues in this area. These results indicate the involvement of mid- to high-level visual circuitry in the estimation of surface material properties, with V3B/KO potentially playing a role in integrating monocular and binocular cues. Copyright © 2016 the American Physiological Society.
Social Engagement in Public Places: A Tale of One Robot
2014-03-01
study we examined a prediction of Computers Are Social Actors (CASA) framework: the more machines present human-like characteristics in a consistent...social cues to increasing levels of social cues during story-telling to human-like game-playing interaction. We found several strong aspects of...support for CASA: the robot that provides even minimal social cues (speech) is more engaging than a robot that does nothing, and the more human-like the
Salter, Phia S; Kelley, Nicholas J; Molina, Ludwin E; Thai, Luyen T
2017-09-01
Photographs provide critical retrieval cues for personal remembering, but few studies have considered this phenomenon at the collective level. In this research, we examined the psychological consequences of visual attention to the presence (or absence) of racially charged retrieval cues within American racial segregation photographs. We hypothesised that attention to racial retrieval cues embedded in historical photographs would increase social justice concept accessibility. In Study 1, we recorded gaze patterns with an eye-tracker among participants viewing images that contained racial retrieval cues or were digitally manipulated to remove them. In Study 2, we manipulated participants' gaze behaviour by either directing visual attention toward racial retrieval cues, away from racial retrieval cues, or directing attention within photographs where racial retrieval cues were missing. Across Studies 1 and 2, visual attention to racial retrieval cues in photographs documenting historical segregation predicted social justice concept accessibility.
USDA-ARS?s Scientific Manuscript database
In June and July 2011, traps were deployed in Tuskegee National Forest, Macon County, Alabama, to test the influence of chemical and visual cues on the capture of bark and ambrosia beetles (Coleoptera: Curculionidae: Scolytinae). The first experiment investigated t...
Impact of Visual, Vocal, and Lexical Cues on Judgments of Counselor Qualities
ERIC Educational Resources Information Center
Strahan, Carole; Zytowski, Donald G.
1976-01-01
Undergraduate students (N=130) rated Carl Rogers via visual, lexical, vocal, or vocal-lexical communication channels. Lexical cues were more important in creating favorable impressions among females. Subsequent exposure to combined visual-vocal-lexical cues resulted in warmer and less distant ratings, but not on a consistent basis. (Author)
Robot Command Interface Using an Audio-Visual Speech Recognition System
NASA Astrophysics Data System (ADS)
Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy
In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is processed using Mel-frequency cepstral coefficient (MFCC) parametrization. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.
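As a rough illustration of such a front end, the sketch below extracts MFCCs with librosa and stacks them with per-frame lip-contour parameters into a joint audio-visual feature matrix. The waveform, the lip-parameter array, and the frame alignment are placeholders rather than the system described in the paper.

```python
# Minimal sketch of an audio-visual feature front end: MFCCs plus per-frame
# lip-contour parameters fused into one feature matrix.
import numpy as np
import librosa

sr = 16000
# Placeholder waveform; in practice this would be the recorded spoken command.
y = np.random.default_rng(4).standard_normal(sr).astype(np.float32)

hop = 160  # 10 ms hop at 16 kHz
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)   # (13, T)

# Placeholder visual stream: MPEG-4-style outer-lip points reduced to a few
# per-frame parameters (e.g., mouth width and height), resampled to the MFCC rate.
n_frames = mfcc.shape[1]
lip_params = np.zeros((4, n_frames))          # stand-in for real lip tracking

av_features = np.vstack([mfcc, lip_params])   # (17, T) fused audio-visual features
print(av_features.shape)
```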
Schiller, Peter H; Kwak, Michelle C; Slocum, Warren M
2012-08-01
This study examined how effectively visual and auditory cues can be integrated in the brain for the generation of motor responses. The latencies with which saccadic eye movements are produced in humans and monkeys form, under certain conditions, a bimodal distribution, the first mode of which has been termed express saccades. In humans, a much higher percentage of express saccades is generated when both visual and auditory cues are provided compared with the single presentation of these cues [H. C. Hughes et al. (1994) J. Exp. Psychol. Hum. Percept. Perform., 20, 131-153]. In this study, we addressed two questions: first, do monkeys also integrate visual and auditory cues for express saccade generation as do humans and second, does such integration take place in humans when, instead of eye movements, the task is to press levers with fingers? Our results show that (i) in monkeys, as in humans, the combined visual and auditory cues generate a much higher percentage of express saccades than do singly presented cues and (ii) the latencies with which levers are pressed by humans are shorter when both visual and auditory cues are provided compared with the presentation of single cues, but the distribution in all cases is unimodal; response latencies in the express range seen in the execution of saccadic eye movements are not obtained with lever pressing. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia
2018-01-17
During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.
Perceived gender in clear and conversational speech
NASA Astrophysics Data System (ADS)
Booz, Jaime A.
Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004), were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated with making adjustments to improve speech clarity have a larger impact on perceived femininity than others. Using a clear speech strategy alone may not be sufficient for a male speaker to be perceived as female, but could be used as one of many tools to help speakers achieve more "feminine" speech, in conjunction with more specific strategies targeting the acoustic parameters outlined in this study.
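The sketch below shows one way a mixed-effects analysis of this kind can be set up with statsmodels, using a random intercept per talker and a fixed effect of speaking style. The synthetic data, column names, and effect sizes are assumptions for illustration only, not the study's dataset or model specification.

```python
# Minimal sketch of a mixed-effects model: rating ~ style, random intercept per talker.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in: 41 talkers x 2 styles x 17 listeners (sizes are assumptions).
rng = np.random.default_rng(5)
talkers = np.repeat(np.arange(41), 2 * 17)
style = np.tile(np.repeat(["conversational", "clear"], 17), 41)
rating = rng.normal(50, 10, size=talkers.size) + (style == "clear") * 2.0
df = pd.DataFrame({"talker": talkers, "style": style, "rating": rating})

model = smf.mixedlm("rating ~ style", df, groups=df["talker"])
result = model.fit()
print(result.params)   # fixed effect of style plus variance components
```

Acoustic correlates (e.g., per-talker mean fo versus mean rating) could then be examined with ordinary correlations on talker-level aggregates.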
Comprehensive evaluation of a child with an auditory brainstem implant.
Eisenberg, Laurie S; Johnson, Karen C; Martinez, Amy S; DesJardin, Jean L; Stika, Carren J; Dzubak, Danielle; Mahalak, Mandy Lutz; Rector, Emily P
2008-02-01
We had an opportunity to evaluate an American child whose family traveled to Italy to receive an auditory brainstem implant (ABI). The goal of this evaluation was to obtain insight into possible benefits derived from the ABI and to begin developing assessment protocols for pediatric clinical trials. Case study. Tertiary referral center. Pediatric ABI Patient 1 was born with auditory nerve agenesis. Auditory brainstem implant surgery was performed in December, 2005, in Verona, Italy. The child was assessed at the House Ear Institute, Los Angeles, in July 2006 at the age of 3 years 11 months. Follow-up assessment has continued at the HEAR Center in Birmingham, Alabama. Auditory brainstem implant. Performance was assessed for the domains of audition, speech and language, intelligence and behavior, quality of life, and parental factors. Patient 1 demonstrated detection of sound, speech pattern perception with visual cues, and inconsistent auditory-only vowel discrimination. Language age with signs was approximately 2 years, and vocalizations were increasing. Of normal intelligence, he exhibited attention deficits with difficulty completing structured tasks. Twelve months later, this child was able to identify speech patterns consistently; closed-set word identification was emerging. These results were within the range of performance for a small sample of similarly aged pediatric cochlear implant users. Pediatric ABI assessment with a group of well-selected children is needed to examine risk versus benefit in this population and to analyze whether open-set speech recognition is achievable.
Audio–visual interactions for motion perception in depth modulate activity in visual area V3A
Ogawa, Akitoshi; Macaluso, Emiliano
2013-01-01
Multisensory signals can enhance the spatial perception of objects and events in the environment. Changes of visual size and auditory intensity provide us with the main cues about motion direction in depth. However, frequency changes in audition and binocular disparity in vision also contribute to the perception of motion in depth. Here, we presented subjects with several combinations of auditory and visual depth-cues to investigate multisensory interactions during processing of motion in depth. The task was to discriminate the direction of auditory motion in depth according to increasing or decreasing intensity. Rising or falling auditory frequency provided an additional within-audition cue that matched or did not match the intensity change (i.e. intensity-frequency (IF) “matched vs. unmatched” conditions). In two-thirds of the trials, a task-irrelevant visual stimulus moved either in the same or opposite direction of the auditory target, leading to audio–visual “congruent vs. incongruent” between-modalities depth-cues. Furthermore, these conditions were presented either with or without binocular disparity. Behavioral data showed that the best performance was observed in the audio–visual congruent condition with IF matched. Brain imaging results revealed maximal response in visual area V3A when all cues provided congruent and reliable depth information (i.e. audio–visual congruent, IF-matched condition including disparity cues). Analyses of effective connectivity revealed increased coupling from auditory cortex to V3A specifically in audio–visual congruent trials. We conclude that within- and between-modalities cues jointly contribute to the processing of motion direction in depth, and that they do so via dynamic changes of connectivity between visual and auditory cortices. PMID:23333414
Lönnstedt, Oona M; Munday, Philip L; McCormick, Mark I; Ferrari, Maud C O; Chivers, Douglas P
2013-09-01
Carbon dioxide (CO2) levels in the atmosphere and surface ocean are rising at an unprecedented rate due to sustained and accelerating anthropogenic CO2 emissions. Previous studies have documented that exposure to elevated CO2 causes impaired antipredator behavior by coral reef fish in response to chemical cues associated with predation. However, whether ocean acidification will impair visual recognition of common predators is currently unknown. This study examined whether sensory compensation in the presence of multiple sensory cues could reduce the impacts of ocean acidification on antipredator responses. When exposed to seawater enriched with levels of CO2 predicted for the end of this century (880 μatm CO2), prey fish completely lost their response to conspecific alarm cues. While the visual response to a predator was also affected by high CO2, it was not entirely lost. Fish exposed to elevated CO2, spent less time in shelter than current-day controls and did not exhibit antipredator signaling behavior (bobbing) when multiple predator cues were present. They did, however, reduce feeding rate and activity levels to the same level as controls. The results suggest that the response of fish to visual cues may partially compensate for the lack of response to chemical cues. Fish subjected to elevated CO2 levels, and exposed to chemical and visual predation cues simultaneously, responded with the same intensity as controls exposed to visual cues alone. However, these responses were still less than control fish simultaneously exposed to chemical and visual predation cues. Consequently, visual cues improve antipredator behavior of CO2 exposed fish, but do not fully compensate for the loss of response to chemical cues. The reduced ability to correctly respond to a predator will have ramifications for survival in encounters with predators in the field, which could have repercussions for population replenishment in acidified oceans.
Contextual Cueing Effect in Spatial Layout Defined by Binocular Disparity
Zhao, Guang; Zhuang, Qian; Ma, Jie; Tu, Shen; Liu, Qiang; Sun, Hong-jin
2017-01-01
Repeated visual context induces higher search efficiency, revealing a contextual cueing effect, which depends on the association between the target and its visual context. In this study, participants performed a visual search task where search items were presented with depth information defined by binocular disparity. When the 3-dimensional (3D) configurations were repeated over blocks, the contextual cueing effect was obtained (Experiment 1). When depth information was in chaos over repeated configurations, visual search was not facilitated and the contextual cueing effect was largely diminished (Experiment 2). However, when the search items were given a tiny random displacement in the 2-dimensional (2D) plane but the depth information was kept constant, the contextual cueing was preserved (Experiment 3). We concluded that the contextual cueing effect was robust in the context provided by 3D space with stereoscopic information, and more importantly, the visual system prioritized stereoscopic information in learning of spatial information when depth information was available. PMID:28912739
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This report is one of a regular series about the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The 11 papers discuss the dissociation of spectral and temporal cues to the voicing distinction in initial stopped consonants; perceptual integration and selective attention in…
Revisiting place and temporal theories of pitch
2014-01-01
The nature of pitch and its neural coding have been studied for over a century. A popular debate has revolved around the question of whether pitch is coded via “place” cues in the cochlea, or via timing cues in the auditory nerve. In the most recent incarnation of this debate, the role of temporal fine structure has been emphasized in conveying important pitch and speech information, particularly because the lack of temporal fine structure coding in cochlear implants might explain some of the difficulties faced by cochlear implant users in perceiving music and pitch contours in speech. In addition, some studies have postulated that hearing-impaired listeners may have a specific deficit related to processing temporal fine structure. This article reviews some of the recent literature surrounding the debate, and argues that much of the recent evidence suggesting the importance of temporal fine structure processing can also be accounted for using spectral (place) or temporal-envelope cues. PMID:25364292
A magnetoencephalography study of visual processing of pain anticipation.
Machado, Andre G; Gopalakrishnan, Raghavan; Plow, Ela B; Burgess, Richard C; Mosher, John C
2014-07-15
Anticipating pain is important for avoiding injury; however, in chronic pain patients, anticipatory behavior can become maladaptive, leading to sensitization and limiting function. Knowledge of networks involved in pain anticipation and conditioning over time could help devise novel, better-targeted therapies. Using magnetoencephalography, we evaluated the neural processing of pain anticipation in 10 healthy subjects. Anticipatory cortical activity elicited by consecutive visual cues that signified an imminent painful stimulus was compared with cues signifying a nonpainful stimulus or no stimulus. We found that the neural processing of visually evoked pain anticipation involves the primary visual cortex along with cingulate and frontal regions. Visual cortex could quickly and independently encode and discriminate between visual cues associated with pain anticipation and no pain during preconscious phases following object presentation. When evaluating the effect of task repetition on participating cortical areas, we found that activity of prefrontal and cingulate regions was mostly prominent early on when subjects were still naive to a cue's contextual meaning. Visual cortical activity was significant throughout later phases. Although the visual cortex may decode cues anticipating pain or no pain precisely and efficiently, prefrontal areas establish the context associated with each cue. These findings have important implications for processes involved in pain anticipation and maladaptive pain conditioning. Copyright © 2014 the American Physiological Society.
Visual activity predicts auditory recovery from deafness after adult cochlear implantation.
Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal
2013-12-01
Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input and patients must use long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the high progress of speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area, which positively correlated with auditory speech recovery, was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the visual modality's functional level is related to the proficiency level of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that visual modality may facilitate the perception of the word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.
Drijvers, Linda; Özyürek, Asli; Jensen, Ole
2018-06-19
Previous work revealed that visual semantic information conveyed by gestures can enhance degraded speech comprehension, but the mechanisms underlying these integration processes under adverse listening conditions remain poorly understood. We used MEG to investigate how oscillatory dynamics support speech-gesture integration when integration load is manipulated by auditory (e.g., speech degradation) and visual semantic (e.g., gesture congruency) factors. Participants were presented with videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching (mixing gesture + "mixing") or mismatching (drinking gesture + "walking") gesture. In clear speech, alpha/beta power was more suppressed in the left inferior frontal gyrus and motor and visual cortices when integration load increased in response to mismatching versus matching gestures. In degraded speech, beta power was less suppressed over posterior STS and medial temporal lobe for mismatching compared with matching gestures, showing that integration load was lowest when speech was degraded and mismatching gestures could not be integrated and disambiguate the degraded signal. Our results thus provide novel insights on how low-frequency oscillatory modulations in different parts of the cortex support the semantic audiovisual integration of gestures in clear and degraded speech: When speech is clear, the left inferior frontal gyrus and motor and visual cortices engage because higher-level semantic information increases semantic integration load. When speech is degraded, posterior STS/middle temporal gyrus and medial temporal lobe are less engaged because integration load is lowest when visual semantic information does not aid lexical retrieval and speech and gestures cannot be integrated.
Unconscious cues bias first saccades in a free-saccade task.
Huang, Yu-Feng; Tan, Edlyn Gui Fang; Soon, Chun Siong; Hsieh, Po-Jang
2014-10-01
Visual-spatial attention can be biased towards salient visual information without visual awareness. It is unclear, however, whether such bias can further influence free-choices such as saccades in a free viewing task. In our experiment, we presented visual cues below awareness threshold immediately before people made free saccades. Our results showed that masked cues could influence the direction and latency of the first free saccade, suggesting that salient visual information can unconsciously influence free actions. Copyright © 2014 Elsevier Inc. All rights reserved.
Suggested Interactivity: Seeking Perceived Affordances for Information Visualization.
Boy, Jeremy; Eveillard, Louis; Detienne, Françoise; Fekete, Jean-Daniel
2016-01-01
In this article, we investigate methods for suggesting the interactivity of online visualizations embedded with text. We first assess the need for such methods by conducting three initial experiments on Amazon's Mechanical Turk. We then present a design space for Suggested Interactivity (i.e., visual cues used as perceived affordances; SI), based on a survey of 382 HTML5 and visualization websites. Finally, we assess the effectiveness of three SI cues we designed for suggesting the interactivity of bar charts embedded with text. Our results show that only one cue (SI3) was successful in inciting participants to interact with the visualizations, and we hypothesize this is because this particular cue provided feedforward.
Xie, Zilong; Reetzke, Rachel; Chandrasekaran, Bharath
2018-05-24
Increasing visual perceptual load can reduce pre-attentive auditory cortical activity to sounds, a reflection of the limited and shared attentional resources for sensory processing across modalities. Here, we demonstrate that modulating visual perceptual load can impact the early sensory encoding of speech sounds, and that the impact of visual load is highly dependent on the predictability of the incoming speech stream. Participants (n = 20, 9 females) performed a visual search task of high (target similar to distractors) and low (target dissimilar to distractors) perceptual load, while early auditory electrophysiological responses were recorded to native speech sounds. Speech sounds were presented either in a 'repetitive context', or a less predictable 'variable context'. Independent of auditory stimulus context, pre-attentive auditory cortical activity was reduced during high visual load, relative to low visual load. We applied a data-driven machine learning approach to decode speech sounds from the early auditory electrophysiological responses. Decoding performance was found to be poorer under conditions of high (relative to low) visual load, when the incoming acoustic stream was predictable. When the auditory stimulus context was less predictable, decoding performance was substantially greater for the high (relative to low) visual load conditions. Our results provide support for shared attentional resources between visual and auditory modalities that substantially influence the early sensory encoding of speech signals in a context-dependent manner. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.
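A minimal sketch of such a decoding analysis is given below: a cross-validated linear classifier predicts speech-sound identity from single-trial response features. The random feature and label arrays stand in for real EEG epochs, and the classifier choice is an assumption rather than the authors' method.

```python
# Minimal sketch: decode speech-sound identity from single-trial EEG features
# with a cross-validated linear classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64 * 50))   # 200 trials, 64 channels x 50 time samples
y = rng.integers(0, 4, size=200)          # 4 speech-sound categories (placeholder labels)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print("mean decoding accuracy:", scores.mean())
```

Decoding accuracy would then be compared across the high- and low-load conditions and the repetitive versus variable contexts.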
Multi-microphone adaptive array augmented with visual cueing.
Gibson, Paul L; Hedin, Dan S; Davies-Venn, Evelyn E; Nelson, Peggy; Kramer, Kevin
2012-01-01
We present the development of an audiovisual array that enables hearing aid users to converse with multiple speakers in reverberant environments with significant speech babble noise where their hearing aids do not function well. The system concept consists of a smartphone, a smartphone accessory, and a smartphone software application. The smartphone accessory concept is a multi-microphone audiovisual array in a form factor that allows attachment to the back of the smartphone. The accessory will also contain a low-power radio by which it can transmit audio signals to compatible hearing aids. The smartphone software application concept will use the smartphone's built-in camera to acquire images and perform real-time face detection using the built-in face detection support of the smartphone. The audiovisual beamforming algorithm uses the location of talking targets to improve the signal-to-noise ratio and consequently improve the user's speech intelligibility. Since the proposed array system leverages a handheld consumer electronic device, it will be portable and low cost. A PC-based experimental system was developed to demonstrate the feasibility of an audiovisual multi-microphone array, and these results are presented.
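The sketch below illustrates the general audiovisual idea with a delay-and-sum beamformer for a small linear array, steered toward a bearing derived from a detected face's horizontal position. The array geometry, field of view, and pixel-to-angle mapping are illustrative assumptions, not the described system.

```python
# Minimal sketch: delay-and-sum beamformer steered by a face-derived bearing.
import numpy as np

def steer_angle_from_face(face_x, image_width, fov_deg=60.0):
    """Map the face's horizontal pixel position to a bearing in radians (assumed camera FOV)."""
    return np.deg2rad((face_x / image_width - 0.5) * fov_deg)

def delay_and_sum(x, fs, mic_positions, theta, c=343.0):
    """x: (n_mics, n_samples); mic_positions: along-array coordinates in meters."""
    delays = mic_positions * np.sin(theta) / c          # per-mic delay (s)
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    # Apply fractional delays as phase shifts, then average across microphones.
    phased = X * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(phased.mean(axis=0), n=n)

# Toy usage: random signals, a 4-mic array spaced 2 cm apart, face near image edge.
rng = np.random.default_rng(2)
fs = 16000
x = rng.standard_normal((4, fs))                        # 1 s of 4-channel audio
mics = np.arange(4) * 0.02
theta = steer_angle_from_face(face_x=500, image_width=640)
enhanced = delay_and_sum(x, fs, mics, theta)
print(enhanced.shape)
```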
Kirk, Karen Iler; Prusick, Lindsay; French, Brian; Gotch, Chad; Eisenberg, Laurie S; Young, Nancy
2012-06-01
Under natural conditions, listeners use both auditory and visual speech cues to extract meaning from speech signals containing many sources of variability. However, traditional clinical tests of spoken word recognition routinely employ isolated words or sentences produced by a single talker in an auditory-only presentation format. The more central cognitive processes used during multimodal integration, perceptual normalization, and lexical discrimination that may contribute to individual variation in spoken word recognition performance are not assessed in conventional tests of this kind. In this article, we review our past and current research activities aimed at developing a series of new assessment tools designed to evaluate spoken word recognition in children who are deaf or hard of hearing. These measures are theoretically motivated by a current model of spoken word recognition and also incorporate "real-world" stimulus variability in the form of multiple talkers and presentation formats. The goal of this research is to enhance our ability to estimate real-world listening skills and to predict benefit from sensory aid use in children with varying degrees of hearing loss. American Academy of Audiology.
Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony.
Bhat, Jyoti; Miller, Lee M; Pitt, Mark A; Shahin, Antoine J
2015-03-01
Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words that were asynchronous in onsets of voice and mouth movement and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs when mouth movement preceded the speech (V-A) stimuli than when the speech preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14-30 Hz) following the AEPs was larger during the in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8-14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that code for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance. Copyright © 2015 the American Physiological Society.
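The sketch below shows a common way to quantify post-stimulus alpha (8-14 Hz) and beta (14-30 Hz) power with Welch's method; the epoch array, sampling rate, and window length are placeholders rather than the study's EEG pipeline.

```python
# Minimal sketch: average alpha and beta band power in a post-stimulus window
# from single-trial EEG epochs, using Welch's method.
import numpy as np
from scipy.signal import welch

fs = 500                                             # assumed sampling rate (Hz)
rng = np.random.default_rng(3)
epochs = rng.standard_normal((120, int(0.6 * fs)))   # 120 trials x 600 ms window

f, psd = welch(epochs, fs=fs, nperseg=128, axis=-1)  # psd: (n_trials, n_freqs)

def band_power(psd, f, lo, hi):
    mask = (f >= lo) & (f < hi)
    return psd[:, mask].mean(axis=1)

alpha = band_power(psd, f, 8, 14)
beta = band_power(psd, f, 14, 30)
print("mean alpha power:", alpha.mean(), "mean beta power:", beta.mean())
```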
Gilchrist, Amanda L; Duarte, Audrey; Verhaeghen, Paul
2016-01-01
Research with younger adults has shown that retrospective cues can be used to orient top-down attention toward relevant items in working memory. We examined whether older adults could take advantage of these cues to improve memory performance. Younger and older adults were presented with visual arrays of five colored shapes; during maintenance, participants were presented either with an informative cue based on an object feature (here, object shape or color) that would be probed, or with an uninformative, neutral cue. Although older adults were less accurate overall, both age groups benefited from the presentation of an informative, feature-based cue relative to a neutral cue. Surprisingly, we also observed differences in the effectiveness of shape versus color cues and their effects upon post-cue memory load. These results suggest that older adults can use top-down attention to remove irrelevant items from visual working memory, provided that task-relevant features function as cues.
Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism
ERIC Educational Resources Information Center
Singh, Leher; Harrow, MariLouise S.
2014-01-01
Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech
ERIC Educational Resources Information Center
Borrie, Stephanie A.; Lansford, Kaitlin L.; Barrett, Tyson S.
2017-01-01
Purpose: The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception…
Chemical and visual communication during mate searching in rock shrimp.
Díaz, Eliecer R; Thiel, Martin
2004-06-01
Mate searching in crustaceans depends on different communicational cues, of which chemical and visual cues are most important. Herein we examined the role of chemical and visual communication during mate searching and assessment in the rock shrimp Rhynchocinetes typus. Adult male rock shrimp experience major ontogenetic changes. The terminal molt stages (named "robustus") are dominant and capable of monopolizing females during the mating process. Previous studies had shown that most females preferentially mate with robustus males, but how these dominant males and receptive females find each other is uncertain, and is the question we examined herein. In a Y-maze designed to test for the importance of waterborne chemical cues, we observed that females approached the robustus male significantly more often than the typus male. Robustus males, however, were unable to locate receptive females via chemical signals. Using an experimental set-up that allowed testing for the importance of visual cues, we demonstrated that receptive females do not use visual cues to select robustus males, but robustus males use visual cues to find receptive females. Visual cues used by the robustus males were the tumults created by agitated aggregations of subordinate typus males around the receptive females. These results indicate a strong link between sexual communication and the mating system of rock shrimp in which dominant males monopolize receptive females. We found that females and males use different (sex-specific) communicational cues during mate searching and assessment, and that the sexual communication of rock shrimp is similar to that of the American lobster, where females are first attracted to the dominant males by chemical cues emitted by these males. A brief comparison between these two species shows that female behaviors during sexual communication contribute strongly to the outcome of mate searching and assessment.
Influences of selective adaptation on perception of audiovisual speech
Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.
2016-01-01
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781
Can Short Duration Visual Cues Influence Students' Reasoning and Eye Movements in Physics Problems?
ERIC Educational Resources Information Center
Madsen, Adrian; Rouinfar, Amy; Larson, Adam M.; Loschky, Lester C.; Rebello, N. Sanjay
2013-01-01
We investigate the effects of visual cueing on students' eye movements and reasoning on introductory physics problems with diagrams. Participants in our study were randomly assigned to either the cued or noncued conditions, which differed by whether the participants saw conceptual physics problems overlaid with dynamic visual cues. Students in the…
Sensitivity to Visual Prosodic Cues in Signers and Nonsigners
ERIC Educational Resources Information Center
Brentari, Diane; Gonzalez, Carolina; Seidl, Amanda; Wilbur, Ronnie
2011-01-01
Three studies are presented in this paper that address how nonsigners perceive the visual prosodic cues in a sign language. In Study 1, adult American nonsigners and users of American Sign Language (ASL) were compared on their sensitivity to the visual cues in ASL Intonational Phrases. In Study 2, hearing, nonsigning American infants were tested…
Enhancing Learning from Dynamic and Static Visualizations by Means of Cueing
ERIC Educational Resources Information Center
Kuhl, Tim; Scheiter, Katharina; Gerjets, Peter
2012-01-01
The current study investigated whether learning from dynamic visualizations and from two presentation formats of static visualizations can be enhanced by means of cueing. One hundred and fifty university students were randomly assigned to six conditions, resulting from a 2x3-design, with cueing (with/without) and type of visualization (dynamic, static-sequential,…
The development of visual speech perception in Mandarin Chinese-speaking children.
Chen, Liang; Lei, Jianghua
2017-01-01
The present study aimed to investigate the development of visual speech perception in Chinese-speaking children. Children aged 7, 13, and 16 were asked to visually identify both consonant and vowel sounds in Chinese as quickly and accurately as possible. Results revealed (1) an increase in accuracy of visual speech perception between ages 7 and 13, after which the accuracy rate either plateaus or drops; and (2) a U-shaped developmental pattern in speed of perception, with peak performance in 13-year-olds. Results also showed that across all age groups, the overall levels of accuracy rose, whereas the response times fell, for simplex finals, complex finals, and initials. These findings suggest that (1) visual speech perception in Chinese is a developmental process that is acquired over time and is still fine-tuned well into late adolescence; and (2) factors other than cross-linguistic differences in phonological complexity and degrees of reliance on visual information are involved in the development of visual speech perception.
Speech Segmentation by Statistical Learning Depends on Attention
ERIC Educational Resources Information Center
Toro, Juan M.; Sinnett, Scott; Soto-Faraco, Salvador
2005-01-01
We addressed the hypothesis that word segmentation based on statistical regularities occurs without the need of attention. Participants were presented with a stream of artificial speech in which the only cue to extract the words was the presence of statistical regularities between syllables. Half of the participants were asked to passively listen…
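The segmentation cue described here, statistical regularities between adjacent syllables, is usually operationalized as forward transitional probabilities, with word boundaries posited where the probability dips. The sketch below is a minimal illustration of that computation on a hypothetical syllable stream; the syllable "words" and the local-minimum boundary rule are illustrative assumptions, not the study's materials.

```python
# Minimal sketch (hypothetical stream): segmenting an artificial syllable
# stream by forward transitional probabilities (TPs).
# TP(B | A) = count(A followed by B) / count(A); word boundaries are posited
# where the TP dips relative to its neighbours.
from collections import Counter

def transitional_probabilities(syllables):
    """Return a dict mapping (A, B) -> P(B | A) for adjacent syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

def boundary_candidates(syllables, tps):
    """Flag positions where the TP is a local minimum (a likely word boundary)."""
    seq = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    return [i + 1 for i in range(1, len(seq) - 1)
            if seq[i] < seq[i - 1] and seq[i] < seq[i + 1]]

# Hypothetical stream built from three "words": tupiro, golabu, bidaku
stream = "tu pi ro go la bu bi da ku tu pi ro bi da ku go la bu".split()
tps = transitional_probabilities(stream)
print(boundary_candidates(stream, tps))  # indices of likely word onsets
```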
Learning across Languages: Bilingual Experience Supports Dual Language Statistical Word Segmentation
ERIC Educational Resources Information Center
Antovich, Dylan M.; Graf Estes, Katharine
2018-01-01
Bilingual acquisition presents learning challenges beyond those found in monolingual environments, including the need to segment speech in two languages. Infants may use statistical cues, such as syllable-level transitional probabilities, to segment words from fluent speech. In the present study we assessed monolingual and bilingual 14-month-olds'…
The Exploitation of Subphonemic Acoustic Detail in L2 Speech Segmentation
ERIC Educational Resources Information Center
Shoemaker, Ellenor
2014-01-01
The current study addresses an aspect of second language (L2) phonological acquisition that has received little attention to date--namely, the acquisition of allophonic variation as a word boundary cue. The role of subphonemic variation in the segmentation of speech by native speakers has been indisputably demonstrated; however, the acquisition of…
Effects of First and Second Language on Segmentation of Non-Native Speech
ERIC Educational Resources Information Center
Hanulikova, Adriana; Mitterer, Holger; McQueen, James M.
2011-01-01
Do Slovak-German bilinguals apply native Slovak phonological and lexical knowledge when segmenting German speech? When Slovaks listen to their native language, segmentation is impaired when fixed-stress cues are absent (Hanulikova, McQueen & Mitterer, 2010), and, following the Possible-Word Constraint (PWC; Norris, McQueen, Cutler & Butterfield,…
Use of "um" in the Deceptive Speech of a Convicted Murderer
ERIC Educational Resources Information Center
Villar, Gina; Arciuli, Joanne; Mallard, David
2012-01-01
Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data…
Spoken Word Recognition of Chinese Words in Continuous Speech
ERIC Educational Resources Information Center
Yip, Michael C. W.
2015-01-01
The present study examined the role that the positional probability of syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently at the beginning or ending positions of Cantonese syllables than others, this kind of probabilistic information about syllables may cue the locations…
Con-Text: Text Detection for Fine-grained Object Classification.
Karaoglu, Sezer; Tao, Ran; van Gemert, Jan C; Gevers, Theo
2017-05-24
This work focuses on fine-grained object classification using recognized scene text in natural images. While the state-of-the-art relies on visual cues only, this paper is the first work which proposes to combine textual and visual cues. Another novelty is the textual cue extraction. Unlike state-of-the-art text detection methods, we focus more on the background than on the text regions. Once text regions are detected, they are further processed by two text recognition methods: the ABBYY commercial OCR engine and a state-of-the-art character recognition algorithm. Then, to perform textual cue encoding, bi- and trigrams are formed between the recognized characters under the proposed spatial pairwise constraints. Finally, the extracted visual and textual cues are combined for fine-grained classification. The proposed method is validated on four publicly available datasets: ICDAR03, ICDAR13, Con-Text and Flickr-logo. We improve the state-of-the-art end-to-end character recognition by a large margin of 15% on ICDAR03. We show that textual cues are useful in addition to visual cues for fine-grained classification, and that textual cues are also useful for logo retrieval. Combining textual and visual cues outperforms visual-only and textual-only approaches in fine-grained classification (70.7% vs. 60.3%) and in logo retrieval (57.4% vs. 54.8%).
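As a rough illustration of the fusion idea, and not the paper's exact pipeline (which relies on ABBYY OCR, spatial pairwise constraints between characters, and the datasets listed above), the following sketch encodes OCR output as character bi-/trigrams and concatenates that encoding with a precomputed visual feature vector before a linear classifier; all data, labels, and feature dimensions are hypothetical.

```python
# Minimal sketch (hypothetical data): fusing textual and visual cues for
# fine-grained classification. Character bi- and trigrams tolerate noisy OCR
# better than whole words.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

ngrams = HashingVectorizer(analyzer="char_wb", ngram_range=(2, 3), n_features=2**12)

def fuse_features(visual_feats, ocr_strings):
    """Concatenate visual features with a bag-of-character-n-grams encoding."""
    textual = ngrams.transform(ocr_strings).toarray()
    return np.hstack([visual_feats, textual])

# Toy example: 4 images, 128-D visual features, OCR'd scene text.
rng = np.random.default_rng(0)
X_visual = rng.normal(size=(4, 128))
ocr_text = ["cafe latte", "pizzeria", "caffe espresso", "pizza oven"]
y = [0, 1, 0, 1]  # fine-grained labels, e.g. cafe vs. pizzeria storefronts

clf = LogisticRegression(max_iter=1000).fit(fuse_features(X_visual, ocr_text), y)
print(clf.predict(fuse_features(X_visual, ocr_text)))
```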
Location cue validity affects inhibition of return of visual processing.
Wright, R D; Richard, C M
2000-01-01
Inhibition-of-return is the process by which visual search for an object positioned among others is biased toward novel rather than previously inspected items. It is thought to occur automatically and to increase search efficiency. We examined this phenomenon by studying the facilitative and inhibitory effects of location cueing on target-detection response times in a search task. The results indicated that facilitation was a reflexive consequence of cueing whereas inhibition appeared to depend on cue informativeness. More specifically, the inhibition-of-return effect occurred only when the cue provided no information about the impending target's location. We suggest that the results are consistent with the notion of two levels of visual processing. The first involves rapid and reflexive operations that underlie the facilitative effects of location cueing on target detection. The second involves a rapid but goal-driven inhibition procedure that the perceiver can invoke if doing so will enhance visual search performance.
2018-02-12
Results under the second focus showed that the frequency with which participants expected status updates differed depending upon usability preference. Further research questions asked whether assistance requests, and response times to visual reports, for both navigational route and building selection differed depending on the type of exogenous visual cues displayed.
Do you see what I hear? Vantage point preference and visual dominance in a time-space synaesthete
Jarick, Michelle; Stewart, Mark T.; Smilek, Daniel; Dixon, Michael J.
2013-01-01
Time-space synaesthetes “see” time units organized in a spatial form. While the structure might be invariant for most synaesthetes, the perspective by which some view their calendar is somewhat flexible. One well-studied synaesthete L adopts different viewpoints for months seen vs. heard. Interestingly, L claims to prefer her auditory perspective, even though the month names are represented visually upside down. To verify this, we used a spatial-cueing task that included audiovisual month cues. These cues were either congruent with L's preferred “auditory” viewpoint (auditory-only and auditory + month inverted) or incongruent (upright visual-only and auditory + month upright). Our prediction was that L would show enhanced cueing effects (larger response time difference between valid and invalid targets) following the audiovisual congruent cues since both elicit the “preferred” auditory perspective. Also, when faced with conflicting cues, we predicted L would choose the preferred auditory perspective over the visual perspective. As we expected, L did show enhanced cueing effects following the audiovisual congruent cues that corresponded with her preferred auditory perspective, but that the visual perspective dominated when L was faced with both viewpoints simultaneously. The results are discussed with relation to the reification hypothesis of sequence space synaesthesia (Eagleman, 2009). PMID:24137140
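The cueing effect referred to here is conventionally the difference between mean response times to invalidly and validly cued targets, computed separately per cue condition; the sketch below shows that calculation on hypothetical trial data (condition names and RT values are illustrative, not L's data).

```python
# Minimal sketch (hypothetical trials): cueing effect per cue condition,
# i.e. mean RT to invalidly cued targets minus mean RT to validly cued targets.
from statistics import mean

trials = [
    # (cue_condition, cue_validity, reaction_time_ms) -- illustrative values
    ("auditory_only", "valid", 412), ("auditory_only", "invalid", 455),
    ("auditory_only", "valid", 398), ("auditory_only", "invalid", 447),
    ("visual_only",   "valid", 405), ("visual_only",   "invalid", 431),
    ("visual_only",   "valid", 410), ("visual_only",   "invalid", 428),
]

def cueing_effect(trials, condition):
    """Mean RT(invalid) - mean RT(valid) for one cue condition, in ms."""
    valid = [rt for c, v, rt in trials if c == condition and v == "valid"]
    invalid = [rt for c, v, rt in trials if c == condition and v == "invalid"]
    return mean(invalid) - mean(valid)

for cond in ("auditory_only", "visual_only"):
    print(cond, round(cueing_effect(trials, cond), 1))
```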
Magrelli, Silvia; Jermann, Patrick; Noris, Basilio; Ansermet, François; Hentsch, François; Nadel, Jacqueline; Billard, Aude
2013-01-01
This study investigates attention orienting to social stimuli in children with Autism Spectrum Conditions (ASC) during dyadic social interactions taking place in real-life settings. We study the effect of social cues that differ in complexity and distinguish between social cues produced by facial expressions of emotion and those produced during speech. We record the children's gazes using a head-mounted eye-tracking device and report a detailed, quantitative analysis of gaze motion in response to the social cues. The study encompasses a group of children with ASC from 2 to 11 years old (n = 14) and a group of typically developing (TD) children (n = 17) between 3 and 6 years old. While both groups orient overtly to facial expressions, children with ASC do so to a lesser extent. Children with ASC differ markedly from TD children in the way they respond to speech cues, displaying little overt shifting of attention to speaking faces. When children with ASC orient to facial expressions, they show reaction times and first-fixation lengths similar to those of TD children. However, children with ASC orient to speaking faces more slowly than TD children. These results support the hypothesis that individuals affected by ASC have difficulties processing complex social sounds and detecting intermodal correspondence between facial and vocal information. They also corroborate evidence that people with ASC show reduced overt attention toward social stimuli. PMID:24312064
Examining assortativity in the mental lexicon: Evidence from word associations.
Van Rensbergen, Bram; Storms, Gert; De Deyne, Simon
2015-12-01
Words are characterized by a variety of lexical and psychological properties, such as their part of speech, word-frequency, concreteness, or affectivity. In this study, we examine how these properties relate to a word's connectivity in the mental lexicon, the structure containing a person's knowledge of words. In particular, we examine the extent to which these properties display assortative mixing, that is, the extent to which words in the lexicon are more likely to be connected to words that share these properties. We investigated three types of word properties: 1) subjective word covariates: valence, dominance, arousal, and concreteness; 2) lexical information: part of speech; and 3) distributional word properties: age-of-acquisition, word frequency, and contextual diversity. We assessed which of these factors exhibit assortativity using a word association task, where the probability of producing a certain response to a cue is a measure of the associative strength between the cue and response in the mental lexicon. Our results show that the extent to which these aspects exhibit assortativity varies considerably, with a high cue-response correspondence on valence, dominance, arousal, concreteness, and part of speech, indicating that these factors correspond to the words people deem as related. In contrast, we find that cues and responses show only little correspondence on word frequency, contextual diversity, and age-of-acquisition, indicating that, compared to subjective and lexical word covariates, distributional properties exhibit only little assortativity in the mental lexicon. Possible theoretical accounts and implications of these findings are discussed.
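Assortative mixing of the kind assessed here can be quantified, for a continuous property such as valence, as the correlation between cue and response values across association pairs, and, for a categorical property such as part of speech, as the proportion of pairs that match. The sketch below illustrates both measures on toy cue-response pairs; the words and annotation values are invented for illustration.

```python
# Minimal sketch (toy data): assortativity of word-association pairs on a
# continuous property (valence) and a categorical property (part of speech).
import numpy as np

# Hypothetical cue-response pairs annotated as (word, valence, part_of_speech).
pairs = [
    (("happy", 8.2, "ADJ"),  ("joy",   8.0, "NOUN")),
    (("war",   2.1, "NOUN"), ("death", 1.6, "NOUN")),
    (("table", 5.3, "NOUN"), ("chair", 5.6, "NOUN")),
    (("fast",  6.0, "ADJ"),  ("quick", 6.3, "ADJ")),
]

cue_valence = np.array([c[1] for c, r in pairs])
resp_valence = np.array([r[1] for c, r in pairs])
valence_assortativity = np.corrcoef(cue_valence, resp_valence)[0, 1]

pos_match_rate = np.mean([c[2] == r[2] for c, r in pairs])

print(f"valence assortativity (Pearson r): {valence_assortativity:.2f}")
print(f"part-of-speech match rate: {pos_match_rate:.2f}")
```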
Maloney, Erin K; Cappella, Joseph N
2016-01-01
Visual depictions of vaping in electronic cigarette advertisements may serve as smoking cues to smokers and former smokers, increasing urge to smoke and smoking behavior, and decreasing self-efficacy, attitudes, and intentions to quit or abstain. After assessing baseline urge to smoke, 301 daily smokers, 272 intermittent smokers, and 311 former smokers were randomly assigned to view three e-cigarette commercials with vaping visuals (the cue condition) or without vaping visuals (the no-cue condition), or to answer unrelated media use questions (the no-ad condition). Participants then answered a posttest questionnaire assessing the outcome variables of interest. Relative to the other conditions, daily smokers in the cue condition reported greater urge to smoke a tobacco cigarette and a marginally significantly greater incidence of actually smoking a tobacco cigarette during the experiment. Former smokers in the cue condition reported lower intentions to abstain from smoking than former smokers in other conditions. No significant differences emerged among intermittent smokers across conditions. These data suggest that visual depictions of vaping in e-cigarette commercials increase daily smokers' urge to smoke cigarettes and may lead to more actual smoking behavior. For former smokers, these cues in advertising may undermine abstinence efforts. Intermittent smokers did not appear to be reactive to these cues. The absence of differences between the no-cue and no-ad conditions, in contrast to their differences from the cue condition, suggests that visual depictions of e-cigarettes and vaping function as smoking cues and that cue reactivity is the mechanism through which these effects were obtained.
Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects
Skoog Waller, Sara; Eriksson, Mårten
2016-01-01
The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility of affecting age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only, or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age-related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and whether the manipulations correspond to actual age-related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another, and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers’ age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers, even though the perceived effect was smaller than intended. PMID:27917144
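The two manipulated characteristics, f0 and speech rate, can be estimated from a recording; the sketch below is one possible measurement approach (not the authors' analysis pipeline), assuming a mono WAV file and a known word count for the read passage, using librosa's pYIN pitch tracker and a coarse words-per-second rate.

```python
# Minimal sketch (assumes a mono recording "speaker.wav" and that the word
# count of the read text is known): median f0 via pYIN and a coarse speech
# rate in words per second.
import librosa
import numpy as np

def f0_and_rate(wav_path, n_words):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    median_f0 = float(np.nanmedian(f0[voiced_flag]))  # Hz, voiced frames only
    duration = len(y) / sr                             # seconds
    return median_f0, n_words / duration               # words per second

# Hypothetical usage: a ~60-word read passage.
# print(f0_and_rate("speaker.wav", n_words=60))
```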
First-Pass Processing of Value Cues in the Ventral Visual Pathway.
Sasikumar, Dennis; Emeric, Erik; Stuphorn, Veit; Connor, Charles E
2018-02-19
Real-world value often depends on subtle, continuously variable visual cues specific to particular object categories, like the tailoring of a suit, the condition of an automobile, or the construction of a house. Here, we used microelectrode recording in behaving monkeys to test two possible mechanisms for category-specific value-cue processing: (1) previous findings suggest that prefrontal cortex (PFC) identifies object categories, and based on category identity, PFC could use top-down attentional modulation to enhance visual processing of category-specific value cues, providing signals to PFC for calculating value; and (2) a faster mechanism would be first-pass visual processing of category-specific value cues, immediately providing the necessary visual information to PFC. This, however, would require learned mechanisms for processing the appropriate cues in a given object category. To test these hypotheses, we trained monkeys to discriminate value in four letter-like stimulus categories. Each category had a different, continuously variable shape cue that signified value (liquid reward amount) as well as other cues that were irrelevant. Monkeys chose between stimuli of different reward values. Consistent with the first-pass hypothesis, we found early signals for category-specific value cues in area TE (the final stage in the monkey ventral visual pathway) beginning 81 ms after stimulus onset, essentially at the start of TE responses. Task-related activity emerged in lateral PFC approximately 40 ms later and consisted mainly of category-invariant value tuning. Our results show that, for familiar, behaviorally relevant object categories, high-level ventral pathway cortex can implement rapid, first-pass processing of category-specific value cues.