Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia
Paulmann, Silke; Ott, Derek V. M.; Kotz, Sonja A.
2011-01-01
The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and the possibility that (ii) the BG may engage in one rather than the other of these processing steps. In the present study we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in a qualitatively different manner. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment followed by an explicit behavioral emotional recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or neutral sentences. In line with previous evidence patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages. PMID:21437277
Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M
2016-02-01
Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.
Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG
Hagan, Cindy C.; Woods, Will; Johnson, Sam; Green, Gary G. R.; Young, Andrew W.
2013-01-01
Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals. PMID:23950977
Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians
Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan
2015-01-01
Musical training has been shown to have positive effects on several aspects of speech processing; however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are not yet well understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in the processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes. PMID:25688196
Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret
2016-02-01
During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information. Copyright © 2016. Published by Elsevier Inc.
The Sound of Feelings: Electrophysiological Responses to Emotional Speech in Alexithymia
Goerlich, Katharina Sophia; Aleman, André; Martens, Sander
2012-01-01
Background: Alexithymia is a personality trait characterized by difficulties in the cognitive processing of emotions (cognitive dimension) and in the experience of emotions (affective dimension). Previous research focused mainly on visual emotional processing in the cognitive alexithymia dimension. We investigated the impact of both alexithymia dimensions on electrophysiological responses to emotional speech in 60 female subjects. Methodology: During unattended processing, subjects watched a movie while an emotional prosody oddball paradigm was presented in the background. During attended processing, subjects detected deviants in emotional prosody. The cognitive alexithymia dimension was associated with a left-hemisphere bias during early stages of unattended emotional speech processing, and with generally reduced amplitudes of the late P3 component during attended processing. In contrast, the affective dimension did not modulate unattended emotional prosody perception, but was associated with reduced P3 amplitudes during attended processing particularly to emotional prosody spoken in high intensity. Conclusions: Our results provide evidence for a dissociable impact of the two alexithymia dimensions on electrophysiological responses during the attended and unattended processing of emotional prosody. The observed electrophysiological modulations are indicative of a reduced sensitivity to the emotional qualities of speech, which may be a contributing factor to problems in interpersonal communication associated with alexithymia. PMID:22615853
An ERP study of vocal emotion processing in asymmetric Parkinson’s disease
Garrido-Vásquez, Patricia; Pell, Marc D.; Paulmann, Silke; Strecker, Karl; Schwarz, Johannes; Kotz, Sonja A.
2013-01-01
Parkinson’s disease (PD) has been related to impaired processing of emotional speech intonation (emotional prosody). One distinctive feature of idiopathic PD is motor symptom asymmetry, with striatal dysfunction being strongest in the hemisphere contralateral to the most affected body side. It is still unclear whether this asymmetry may affect vocal emotion perception. Here, we tested 22 PD patients (10 with predominantly left-sided [LPD] and 12 with predominantly right-sided motor symptoms) and 22 healthy controls in an event-related potential study. Sentences conveying different emotional intonations were presented in lexical and pseudo-speech versions. Task varied between an explicit and an implicit instruction. Of specific interest was emotional salience detection from prosody, reflected in the P200 component. We predicted that patients with predominantly right-striatal dysfunction (LPD) would exhibit P200 alterations. Our results support this assumption. LPD patients showed enhanced P200 amplitudes, and specific deficits were observed for disgust prosody, explicit anger processing and implicit processing of happy prosody. Lexical speech was predominantly affected while the processing of pseudo-speech was largely intact. P200 amplitude in patients correlated significantly with left motor scores and asymmetry indices. The data suggest that emotional salience detection from prosody is affected by asymmetric neuronal degeneration in PD. PMID:22956665
Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela
2015-03-01
Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.
The effect of emotion on articulation rate in persistence and recovery of childhood stuttering.
Erdemir, Aysu; Walden, Tedra A; Jefferson, Caswell M; Choi, Dahye; Jones, Robin M
2018-06-01
This study investigated the possible association of emotional processes and articulation rate in pre-school age children who stutter and persist (persisting), children who stutter and recover (recovered) and children who do not stutter (nonstuttering). The participants were ten persisting, ten recovered, and ten nonstuttering children between the ages of 3 and 5 years, who were classified as persisting, recovered, or nonstuttering approximately 2-2.5 years after the experimental testing took place. The children were exposed to three emotionally-arousing video clips (baseline, positive and negative) and produced a narrative based on a text-free storybook following each video clip. From the audio-recordings of these narratives, individual utterances were transcribed and articulation rates were calculated. Results indicated that persisting children exhibited significantly slower articulation rates following the negative emotion condition, unlike recovered and nonstuttering children whose articulation rates were not affected by either of the two emotion-inducing conditions. Moreover, all stuttering children displayed faster rates during fluent compared to stuttered speech; however, the recovered children were significantly faster than the persisting children during fluent speech. Negative emotion plays a detrimental role in the speech-motor control processes of children who persist, whereas children who eventually recover seem to exhibit a relatively more stable and mature speech-motor system. This suggests that complex interactions between speech-motor and emotional processes are at play in stuttering recovery and persistence, and articulation rates following negative emotion or during stuttered versus fluent speech might be considered potential factors to prospectively predict persistence and recovery from stuttering. Copyright © 2017 Elsevier Inc. All rights reserved.
Lima, César F; Garrett, Carolina; Castro, São Luís
2013-01-01
Does emotion processing in music and speech prosody recruit common neurocognitive mechanisms? To examine this question, we implemented a cross-domain comparative design in Parkinson's disease (PD). Twenty-four patients and 25 controls performed emotion recognition tasks for music and spoken sentences. In music, patients had impaired recognition of happiness and peacefulness, and intact recognition of sadness and fear; this pattern was independent of general cognitive and perceptual abilities. In speech, patients had a small global impairment, which was significantly mediated by executive dysfunction. Hence, PD affected musical and prosodic emotions differently. This dissociation indicates that the mechanisms underlying the two domains are partly independent.
Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai
2017-01-01
Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
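The hand-crafted feature set named above (MFCC, pitch, short-term zero-crossing rate, short-term energy) maps onto standard signal-processing routines. Below is a minimal sketch of that feature pipeline feeding an SVM, assuming librosa and scikit-learn; the DBN feature learning, the formant features, and the authors' corpus and parameters are not reproduced, and the file names and labels are placeholders.

```python
# Sketch: hand-crafted features (MFCC, pitch, zero-crossing rate, short-term energy)
# pooled into per-utterance statistics and fed to an SVM. Illustration only, not the
# authors' code; DBN feature learning and formant extraction are omitted, and the
# file names below are placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def utterance_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    zcr = librosa.feature.zero_crossing_rate(y)          # (1, frames)
    energy = librosa.feature.rms(y=y)                    # (1, frames)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # frame-level pitch track
    f0 = f0[np.isfinite(f0)]
    frame_feats = np.vstack([mfcc, zcr, energy])
    # Statistical functionals (mean/std) turn frame-level features into one fixed vector.
    return np.concatenate([frame_feats.mean(axis=1), frame_feats.std(axis=1),
                           [f0.mean(), f0.std()]])

# Hypothetical file lists and labels; replace with a real emotional-speech corpus.
train_files, train_labels = ["angry_01.wav", "happy_01.wav"], ["angry", "happy"]
X = np.array([utterance_features(f) for f in train_files])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X, train_labels)
```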
Impact of personality on the cerebral processing of emotional prosody.
Brück, Carolin; Kreifelts, Benjamin; Kaza, Evangelia; Lotze, Martin; Wildgruber, Dirk
2011-09-01
While several studies have focused on identifying common brain mechanisms governing the decoding of emotional speech melody, interindividual variations in the cerebral processing of prosodic information, in comparison, have received only little attention to date: Albeit, for instance, differences in personality among individuals have been shown to modulate emotional brain responses, personality influences on the neural basis of prosody decoding have not been investigated systematically yet. Thus, the present study aimed at delineating relationships between interindividual differences in personality and hemodynamic responses evoked by emotional speech melody. To determine personality-dependent modulations of brain reactivity, fMRI activation patterns during the processing of emotional speech cues were acquired from 24 healthy volunteers and subsequently correlated with individual trait measures of extraversion and neuroticism obtained for each participant. Whereas correlation analysis did not indicate any link between brain activation and extraversion, strong positive correlations between measures of neuroticism and hemodynamic responses of the right amygdala, the left postcentral gyrus as well as medial frontal structures including the right anterior cingulate cortex emerged, suggesting that brain mechanisms mediating the decoding of emotional speech melody may vary depending on differences in neuroticism among individuals. Observed trait-specific modulations are discussed in the light of processing biases as well as differences in emotion control or task strategies which may be associated with the personality trait of neuroticism. Copyright © 2011 Elsevier Inc. All rights reserved.
Emotional reactivity and regulation in preschool-age children who stutter.
Ntourou, Katerina; Conture, Edward G; Walden, Tedra A
2013-09-01
This study experimentally investigated behavioral correlates of emotional reactivity and emotion regulation and their relation to speech (dis)fluency in preschool-age children who do (CWS) and do not (CWNS) stutter during emotion-eliciting conditions. Participants (18 CWS, 14 boys; 18 CWNS, 14 boys) completed two experimental tasks (1) a neutral ("apples and leaves in a transparent box," ALTB) and (2) a frustrating ("attractive toy in a transparent box," ATTB) task, both of which were followed by a narrative task. Dependent measures were emotional reactivity (positive affect, negative affect), emotion regulation (self-speech, distraction) exhibited during the ALTB and the ATTB tasks, percentage of stuttered disfluencies (SDs) and percentage of non-stuttered disfluencies (NSDs) produced during the narratives. Results indicated that preschool-age CWS exhibited significantly more negative emotion and more self-speech than preschool-age CWNS. For CWS only, emotion regulation behaviors (i.e., distraction, self-speech) during the experimental tasks were predictive of stuttered disfluencies during the subsequent narrative tasks. Furthermore, for CWS there was no relation between emotional processes and non-stuttered disfluencies, but CWNS's negative affect was significantly related to nonstuttered disfluencies. In general, present findings support the notion that emotional processes are associated with childhood stuttering. Specifically, findings are consistent with the notion that preschool-age CWS are more emotionally reactive than CWNS and that their self-speech regulatory attempts may be less than effective in modulating their emotions. The reader will be able to: (a) communicate the relevance of studying the role of emotion in developmental stuttering close to the onset of stuttering and (b) describe the main findings of the present study in relation to previous studies that have used different methodologies to investigate the role of emotion in developmental stuttering of young children who stutter. Copyright © 2013 Elsevier Inc. All rights reserved.
Effects of musical expertise on oscillatory brain activity in response to emotional sounds.
Nolden, Sophie; Rigoulot, Simon; Jolicoeur, Pierre; Armony, Jorge L
2017-08-01
Emotions can be conveyed through a variety of channels in the auditory domain, be it via music, non-linguistic vocalizations, or speech prosody. Moreover, recent studies suggest that expertise in one sound category can impact the processing of emotional sounds in other sound categories as they found that musicians process more efficiently emotional musical and vocal sounds than non-musicians. However, the neural correlates of these modulations, especially their time course, are not very well understood. Consequently, we focused here on how the neural processing of emotional information varies as a function of sound category and expertise of participants. Electroencephalogram (EEG) of 20 non-musicians and 17 musicians was recorded while they listened to vocal (speech and vocalizations) and musical sounds. The amplitude of EEG-oscillatory activity in the theta, alpha, beta, and gamma band was quantified and Independent Component Analysis (ICA) was used to identify underlying components of brain activity in each band. Category differences were found in theta and alpha bands, due to larger responses to music and speech than to vocalizations, and in posterior beta, mainly due to differential processing of speech. In addition, we observed greater activation in frontal theta and alpha for musicians than for non-musicians, as well as an interaction between expertise and emotional content of sounds in frontal alpha. The results reflect musicians' expertise in recognition of emotion-conveying music, which seems to also generalize to emotional expressions conveyed by the human voice, in line with previous accounts of effects of expertise on musical and vocal sounds processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Good, Arla; Gordon, Karen A.; Papsin, Blake C.; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A.
2017-01-01
Objectives: Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Design: Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Results: Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Conclusions: Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation. PMID:28085739
Spangler, Sibylle M.; Schwarzer, Gudrun; Korell, Monika; Maier-Karius, Johanna
2010-01-01
Four experiments were conducted with 5- to 11-year-olds and adults to investigate whether facial identity, facial speech, emotional expression, and gaze direction are processed independently of or in interaction with one another. In a computer-based, speeded sorting task, participants sorted faces according to facial identity while disregarding…
Double Fourier analysis for Emotion Identification in Voiced Speech
Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.
2016-04-01
We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The time-frequency representation from the spectrogram is then treated as an image and processed through a two-dimensional Fourier transform to perform a spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
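As a rough illustration of this two-stage analysis, the sketch below computes a Gaussian-windowed short-time Fourier transform and then applies a two-dimensional FFT to the resulting spectrogram treated as an image. Window length, overlap, and the Gaussian width are assumed values, not the authors' settings.

```python
# Sketch of the "double" Fourier analysis: Gaussian-windowed STFT -> spectrogram,
# then a spatial 2-D FFT of the spectrogram treated as an image. Parameters are
# illustrative choices only.
import numpy as np
from scipy import signal

def double_fourier(y, sr, nperseg=512, noverlap=384, gauss_std=64):
    # Stage 1: Gaussian-windowed STFT -> time-frequency power distribution.
    f, t, Z = signal.stft(y, fs=sr, window=("gaussian", gauss_std),
                          nperseg=nperseg, noverlap=noverlap)
    spectrogram = np.abs(Z) ** 2
    # Stage 2: treat the spectrogram as an image and apply a spatial 2-D FFT.
    spatial = np.fft.fftshift(np.fft.fft2(np.log1p(spectrogram)))
    return spectrogram, np.abs(spatial)

# Example with a synthetic voiced-like signal (100 Hz fundamental plus harmonics).
sr = 16000
t = np.arange(0, 1.0, 1 / sr)
y = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 6))
spec, spec2d = double_fourier(y, sr)
```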
Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia
An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acting speech were tested, as to whether they are sufficient or not to discriminate between seven emotions. Multilayered perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provide information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results were presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred during the utterance-independent framework. At least, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
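A minimal sketch of this kind of setup, assuming scikit-learn: a multilayer perceptron trained on a 24-element vector of utterance-level prosodic statistics. The abstract does not list the exact 24 features or the emotion labels, so random data and a placeholder label set stand in for them here.

```python
# Sketch: MLP over a 24-element vector of per-sentence prosodic statistics.
# The 24 features and the seven-emotion label set are placeholders, not the paper's.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_utterances, n_features = 700, 24            # 24 prosodic statistics per sentence
X = rng.normal(size=(n_utterances, n_features))
emotions = np.array(["anger", "fear", "joy", "sadness", "disgust", "surprise", "neutral"])
y = rng.choice(emotions, size=n_utterances)   # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("held-out accuracy (chance-level on random data):", mlp.score(X_test, y_test))
```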
Preschoolers' real-time coordination of vocal and facial emotional information.
Berman, Jared M J; Chambers, Craig G; Graham, Susan A
2016-02-01
An eye-tracking methodology was used to examine the time course of 3- and 5-year-olds' ability to link speech bearing different acoustic cues to emotion (i.e., happy-sounding, neutral, and sad-sounding intonation) to photographs of faces reflecting different emotional expressions. Analyses of saccadic eye movement patterns indicated that, for both 3- and 5-year-olds, sad-sounding speech triggered gaze shifts to a matching (sad-looking) face from the earliest moments of speech processing. However, it was not until approximately 800ms into a happy-sounding utterance that preschoolers began to use the emotional cues from speech to identify a matching (happy-looking) face. Complementary analyses based on conscious/controlled behaviors (children's explicit points toward the faces) indicated that 5-year-olds, but not 3-year-olds, could successfully match happy-sounding and sad-sounding vocal affect to a corresponding emotional face. Together, the findings clarify developmental patterns in preschoolers' implicit versus explicit ability to coordinate emotional cues across modalities and highlight preschoolers' greater sensitivity to sad-sounding speech as the auditory signal unfolds in time. Copyright © 2015 Elsevier Inc. All rights reserved.
Emotional speech comprehension in children and adolescents with autism spectrum disorders.
Le Sourn-Bissaoui, Sandrine; Aguert, Marc; Girard, Pauline; Chevreuil, Claire; Laval, Virginie
2013-01-01
We examined the understanding of emotional speech by children and adolescents with autism spectrum disorders (ASD). We predicted that they would have difficulty understanding emotional speech, not because of an emotional prosody processing impairment but because of problems drawing appropriate inferences, especially in multiple-cue environments. Twenty-six children and adolescents with ASD and 26 typically developing (TD) controls performed a computerized task featuring emotional prosody, either embedded in a discrepant context or without any context at all. They had to identify the speaker's feeling. When the prosody was the sole cue, participants with ASD performed just as well as controls, relying on this cue to infer the speaker's intention. When the prosody was embedded in a discrepant context, both ASD and TD participants exhibited a contextual bias and a negativity bias. However, ASD participants relied less on the emotional prosody than the controls when it was positive. We discuss these findings with respect to executive function and intermodal processing. After reading this article, the reader should be able to (1) describe the ASD participants' pragmatic impairments, (2) explain why ASD participants did not have an emotional prosody processing impairment, and (3) explain why ASD participants had difficulty inferring the speaker's intention from emotional prosody in a discrepant situation. Copyright © 2013 Elsevier Inc. All rights reserved.
Exploring expressivity and emotion with artificial voice and speech technologies.
Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James
2013-10-01
Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.
Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang
2018-05-01
Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.
Patel, Aniruddh D
2011-01-01
Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.
Processing of prosodic changes in natural speech stimuli in school-age children.
Lindström, R; Lepistö, T; Makkonen, T; Kujala, T
2012-12-01
Speech prosody conveys information about important aspects of communication: the meaning of the sentence and the emotional state or intention of the speaker. The present study addressed processing of emotional prosodic changes in natural speech stimuli in school-age children (mean age 10 years) by recording the electroencephalogram, facial electromyography, and behavioral responses. The stimulus was a semantically neutral Finnish word uttered with four different emotional connotations: neutral, commanding, sad, and scornful. In the behavioral sound-discrimination task the reaction times were fastest for the commanding stimulus and longest for the scornful stimulus, and faster for the neutral than for the sad stimulus. EEG and EMG responses were measured during non-attentive oddball paradigm. Prosodic changes elicited a negative-going, fronto-centrally distributed neural response peaking at about 500 ms from the onset of the stimulus, followed by a fronto-central positive deflection, peaking at about 740 ms. For the commanding stimulus also a rapid negative deflection peaking at about 290 ms from stimulus onset was elicited. No reliable stimulus type specific rapid facial reactions were found. The results show that prosodic changes in natural speech stimuli activate pre-attentive neural change-detection mechanisms in school-age children. However, the results do not support the suggestion of automaticity of emotion specific facial muscle responses to non-attended emotional speech stimuli in children. Copyright © 2012 Elsevier B.V. All rights reserved.
Havas, David A; Chapp, Christopher B
2016-01-01
How does language influence the emotions and actions of large audiences? Functionally, emotions help address environmental uncertainty by constraining the body to support adaptive responses and social coordination. We propose emotions provide a similar function in language processing by constraining the mental simulation of language content to facilitate comprehension, and to foster alignment of mental states in message recipients. Consequently, we predicted that emotion-inducing language should be found in speeches specifically designed to create audience alignment - stump speeches of United States presidential candidates. We focused on phrases in the past imperfective verb aspect ("a bad economy was burdening us") that leave a mental simulation of the language content open-ended, and thus unconstrained, relative to past perfective sentences ("we were burdened by a bad economy"). As predicted, imperfective phrases appeared more frequently in stump versus comparison speeches, relative to perfective phrases. In a subsequent experiment, participants rated phrases from presidential speeches as more emotionally intense when written in the imperfective aspect compared to the same phrases written in the perfective aspect, particularly for sentences perceived as negative in valence. These findings are consistent with the notion that emotions have a role in constraining the comprehension of language, a role that may be used in communication with large audiences.
Neural Substrates of Processing Anger in Language: Contributions of Prosody and Semantics.
Castelluccio, Brian C; Myers, Emily B; Schuh, Jillian M; Eigsti, Inge-Marie
2016-12-01
Emotions are conveyed primarily through two channels in language: semantics and prosody. While many studies confirm the role of a left hemisphere network in processing semantic emotion, there has been debate over the role of the right hemisphere in processing prosodic emotion. Some evidence suggests a preferential role for the right hemisphere, and other evidence supports a bilateral model. The relative contributions of semantics and prosody to the overall processing of affect in language are largely unexplored. The present work used functional magnetic resonance imaging to elucidate the neural bases of processing anger conveyed by prosody or semantic content. Results showed a robust, distributed, bilateral network for processing angry prosody and a more modest left hemisphere network for processing angry semantics when compared to emotionally neutral stimuli. Findings suggest the nervous system may be more responsive to prosodic cues in speech than to the semantic content of speech.
Effects of social cognitive impairment on speech disorder in schizophrenia.
Docherty, Nancy M; McCleery, Amanda; Divilbiss, Marielle; Schumann, Emily B; Moe, Aubrey; Shakeel, Mohammed K
2013-05-01
Disordered speech in schizophrenia impairs social functioning because it impedes communication with others. Treatment approaches targeting this symptom have been limited by an incomplete understanding of its causes. This study examined the process underpinnings of speech disorder, assessed in terms of communication failure. Contributions of impairments in 2 social cognitive abilities, emotion perception and theory of mind (ToM), to speech disorder were assessed in 63 patients with schizophrenia or schizoaffective disorder and 21 nonpsychiatric participants, after controlling for the effects of verbal intelligence and impairments in basic language-related neurocognitive abilities. After removal of the effects of the neurocognitive variables, impairments in emotion perception and ToM each explained additional variance in speech disorder in the patients but not the controls. The neurocognitive and social cognitive variables, taken together, explained 51% of the variance in speech disorder in the patients. Schizophrenic disordered speech may be less a concomitant of "positive" psychotic process than of illness-related limitations in neurocognitive and social cognitive functioning.
Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina
2014-11-15
Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.
On the recognition of emotional vocal expressions: motivations for a holistic approach.
Esposito, Anna; Esposito, Antonietta M
2012-10-01
Human beings seem to be able to recognize emotions from speech very well and information communication technology aims to implement machines and agents that can do the same. However, to be able to automatically recognize affective states from speech signals, it is necessary to solve two main technological problems. The former concerns the identification of effective and efficient processing algorithms capable of capturing emotional acoustic features from speech sentences. The latter focuses on finding computational models able to classify, with an approximation as good as human listeners, a given set of emotional states. This paper will survey these topics and provide some insights for a holistic approach to the automatic analysis, recognition and synthesis of affective states.
Hearing Feelings: Affective Categorization of Music and Speech in Alexithymia, an ERP Study
Goerlich, Katharina Sophia; Witteman, Jurriaan; Aleman, André; Martens, Sander
2011-01-01
Background: Alexithymia, a condition characterized by deficits in interpreting and regulating feelings, is a risk factor for a variety of psychiatric conditions. Little is known about how alexithymia influences the processing of emotions in music and speech. Appreciation of such emotional qualities in auditory material is fundamental to human experience and has profound consequences for functioning in daily life. We investigated the neural signature of such emotional processing in alexithymia by means of event-related potentials. Methodology: Affective music and speech prosody were presented as targets following affectively congruent or incongruent visual word primes in two conditions. In two further conditions, affective music and speech prosody served as primes and visually presented words with affective connotations were presented as targets. Thirty-two participants (16 male) judged the affective valence of the targets. We tested the influence of alexithymia on cross-modal affective priming and on N400 amplitudes, indicative of individual sensitivity to an affective mismatch between words, prosody, and music. Our results indicate that the affective priming effect for prosody targets tended to be reduced with increasing scores on alexithymia, while no behavioral differences were observed for music and word targets. At the electrophysiological level, alexithymia was associated with significantly smaller N400 amplitudes in response to affectively incongruent music and speech targets, but not to incongruent word targets. Conclusions: Our results suggest a reduced sensitivity for the emotional qualities of speech and music in alexithymia during affective categorization. This deficit becomes evident primarily in situations in which a verbalization of emotional information is required. PMID:21573026
Evaluating deep learning architectures for Speech Emotion Recognition.
Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence
2017-08-01
Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.
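The frame-based formulation described above can be illustrated with a small recurrent model that maps per-frame spectral features to an utterance-level emotion prediction. The PyTorch sketch below is only a schematic stand-in: the feature type, layer sizes, pooling, and four-class label set are assumptions, not the architectures evaluated in the paper.

```python
# Sketch of a frame-based SER model: per-frame spectral features pass through a
# recurrent layer and the frame states are pooled into one utterance-level prediction.
# Sizes and the four-class label set are illustrative assumptions.
import torch
import torch.nn as nn

class FrameSER(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_emotions=4):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)  # models intra-utterance dynamics
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, frames):            # frames: (batch, time, n_features)
        outputs, _ = self.rnn(frames)
        pooled = outputs.mean(dim=1)      # average frame states over the utterance
        return self.classifier(pooled)    # logits over emotion classes

model = FrameSER()
dummy_batch = torch.randn(8, 300, 40)     # 8 utterances, 300 frames of 40-dim features
logits = model(dummy_batch)
print(logits.shape)                       # torch.Size([8, 4])
```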
Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia
2013-06-01
Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.
Gender differences in the activation of inferior frontal cortex during emotional speech perception.
Schirmer, Annett; Zysset, Stefan; Kotz, Sonja A; Yves von Cramon, D
2004-03-01
We investigated the brain regions that mediate the processing of emotional speech in men and women by presenting positive and negative words that were spoken with happy or angry prosody. Hence, emotional prosody and word valence were either congruous or incongruous. We assumed that an fMRI contrast between congruous and incongruous presentations would reveal the structures that mediate the interaction of emotional prosody and word valence. The left inferior frontal gyrus (IFG) was more strongly activated in incongruous as compared to congruous trials. This difference in IFG activity was significantly larger in women than in men. Moreover, the congruence effect was significant in women whereas it only appeared as a tendency in men. As the left IFG has been repeatedly implicated in semantic processing, these findings are taken as evidence that semantic processing in women is more susceptible to influences from emotional prosody than is semantic processing in men. Moreover, the present data suggest that the left IFG mediates increased semantic processing demands imposed by an incongruence between emotional prosody and word valence.
Comparison of formant detection methods used in speech processing applications
Belean, Bogdan
2013-11-01
The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination, or diagnosing different neurological diseases. Given the various applications of formant detection in speech signals, two methods for detecting formants are presented. First, the poles resulting from a complex analysis of the LPC coefficients are used for formant detection. The second approach uses a Kalman filter to predict formants along the speech signal. Results are presented for both approaches on real-life speech spectrograms. A comparison of the features of the proposed methods is also performed in order to establish which method is more suitable for different speech processing applications.
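The first (LPC-based) approach lends itself to a short sketch: fit LPC coefficients to a voiced frame, take the roots of the prediction polynomial, and keep poles whose angles correspond to plausible formant frequencies. The order rule and bandwidth threshold below are common rules of thumb, not the paper's settings, and the Kalman-filter tracking stage is not shown.

```python
# Sketch: formant estimation from LPC poles of a single voiced frame.
# Rule-of-thumb LPC order and bandwidth threshold; Kalman tracking not shown.
import numpy as np
import librosa

def lpc_formants(frame, sr, order=None, max_bw=400.0):
    if order is None:
        order = int(2 + sr / 1000)                 # classic rule of thumb for LPC order
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]              # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi      # pole radius -> bandwidth estimate
    keep = (freqs > 90) & (bws < max_bw)           # discard implausible / broad poles
    return np.sort(freqs[keep])

# Crude synthetic frame with spectral peaks near 700, 1200, and 2600 Hz, for illustration.
sr = 16000
t = np.arange(0, 0.03, 1 / sr)
frame = sum(np.sin(2 * np.pi * f * t) for f in (700, 1200, 2600))
print(lpc_formants(frame, sr)[:3])
```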
Emotion to emotion speech conversion in phoneme level
NASA Astrophysics Data System (ADS)
Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
Having the ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification at the phoneme level for emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that modification of prosody only or spectrum only is not sufficient to elicit the targeted emotions. Simultaneous modification of both prosody and spectrum results in higher acceptance rates of target emotions, suggesting that modeling spectral patterns that reflect the underlying speech articulation is as important as modeling speech prosody for synthesizing emotional speech with good quality. We are investigating suprasegmental-level modifications for further improvement in speech quality and expressiveness.
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.
Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang
2017-01-01
Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by low recognition accuracies in real applications due to a lack of rich representational ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is provided to a DBN to yield higher-level features, which are fed to a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition.
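A minimal sketch of the random-subspace ensemble idea, under stated assumptions: scikit-learn has no DBN, so an MLP stands in for each subspace learner, and the data are synthetic placeholders for the low-level acoustic features; only the subspace construction and majority-vote fusion follow the description above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: rows = low-level acoustic feature vectors, labels = emotion ids.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 120))
y = rng.integers(0, 4, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_models, subspace_dim = 15, 40
subspaces, models = [], []
for _ in range(n_models):
    idx = rng.choice(X.shape[1], size=subspace_dim, replace=False)  # random feature subspace
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
    clf.fit(X_tr[:, idx], y_tr)                                     # stand-in for the per-subspace DBN
    subspaces.append(idx)
    models.append(clf)

# Majority vote across subspace learners decides the final emotion label.
votes = np.stack([m.predict(X_te[:, idx]) for m, idx in zip(models, subspaces)])
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print("ensemble accuracy:", (y_pred == y_te).mean())
```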
ERIC Educational Resources Information Center
De Jarnette, Glenda
Vertical and lateral integration are two important nervous system integrations that affect the development of oral behaviors. There are three progressions in the vertical integration process for speech nervous system development: R-complex speech (ritualistic, memorized expressions), limbic speech (emotional expressions), and cortical speech…
Lu, Xuejing; Ho, Hao Tam; Liu, Fang; Wu, Daxing; Thompson, William F
2015-01-01
Congenital amusia is a disorder that is known to affect the processing of musical pitch. Although individuals with amusia rarely show language deficits in daily life, a number of findings point to possible impairments in speech prosody that amusic individuals may compensate for by drawing on linguistic information. Using EEG, we investigated (1) whether the processing of speech prosody is impaired in amusia and (2) whether emotional linguistic information can compensate for this impairment. Twenty Chinese amusics and 22 matched controls were presented with pairs of emotional words spoken with either statement or question intonation while their EEG was recorded. Their task was to judge whether the intonations were the same. Amusics exhibited impaired performance on the intonation-matching task for emotional linguistic information, as their performance was significantly worse than that of controls. EEG results showed a reduced N2 response to incongruent intonation pairs in amusics compared with controls, which likely reflects impaired conflict processing in amusia. However, our EEG results also indicated that amusics were intact in early sensory auditory processing, as revealed by a comparable N1 modulation in both groups. We propose that the impairment in discriminating speech intonation observed among amusic individuals may arise from an inability to access information extracted at early processing stages. This, in turn, could reflect a disconnection between low-level and high-level processing.
Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia
2018-01-17
During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.
Sound frequency affects speech emotion perception: results from congenital amusia
Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche
2015-01-01
Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
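A minimal sketch of the low-pass manipulation used in Experiment 1, assuming a mono recording; the 500 Hz cutoff and filter order are illustrative choices rather than the study's reported parameters, and 'statement.wav' is a placeholder file name.

```python
from scipy.signal import butter, sosfiltfilt
import soundfile as sf

def low_pass(signal, sr, cutoff_hz=500, order=4):
    """Zero-phase Butterworth low-pass, keeping mainly F0 and low-frequency prosodic cues."""
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)

# Assumes a mono recording; 'statement.wav' is a placeholder for one emotional speech sample.
y, sr = sf.read("statement.wav")
sf.write("statement_lowpass.wav", low_pass(y, sr), sr)
```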
Aging Affects Identification of Vocal Emotions in Semantically Neutral Sentences
ERIC Educational Resources Information Center
Dupuis, Kate; Pichora-Fuller, M. Kathleen
2015-01-01
Purpose: The authors determined the accuracy of younger and older adults in identifying vocal emotions using the Toronto Emotional Speech Set (TESS; Dupuis & Pichora-Fuller, 2010a) and investigated the possible contributions of auditory acuity and suprathreshold processing to emotion identification accuracy. Method: In 2 experiments, younger…
Speech emotion recognition methods: A literature review
NASA Astrophysics Data System (ADS)
Basharirad, Babak; Moradhaseli, Mohammadreza
2017-10-01
Recently, research attention to emotional speech signals in human-machine interfaces has increased due to the availability of high computational capability. Many systems have been proposed in the literature to identify emotional state through speech. Selecting suitable feature sets, designing proper classification methods, and preparing an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition based on three evaluation parameters (feature set, classification of features, and accuracy). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the currently promising directions for improvement of speech emotion recognition systems.
Abdeltawwab, Mohamed M; Khater, Ahmed; El-Anwar, Mohammad W
2016-01-01
The combination of acoustic and electric stimulation as a way to enhance speech recognition performance in cochlear implant (CI) users has generated considerable interest in recent years. The purpose of this study was to evaluate the bimodal advantage of the FS4 speech processing strategy in combination with hearing aids (HA) as a means to improve low-frequency resolution in CI patients. Nineteen postlingual CI adults were selected to participate in this study. All patients wore implants on one side and an HA on the contralateral side with residual hearing. Monosyllabic word recognition, speech in noise, and emotion and talker identification were assessed using CI with fine structure processing/FS4 and high-definition continuous interleaved sampling strategies, HA alone, and a combination of CI and HA. The bimodal stimulation showed improvement in speech performance and emotion identification for the question/statement/order tasks, which was statistically significant compared to patients with CI alone, but there were no statistically significant differences in intragender talker discrimination or emotion identification for the happy/angry/neutral tasks. The poorest performance was obtained with HA only, and the difference was statistically significant compared to the other modalities. The bimodal stimulation showed enhanced speech performance in CI patients, and it mitigates the limitations of electric or acoustic stimulation alone. © 2016 S. Karger AG, Basel.
ERIC Educational Resources Information Center
Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.
2016-01-01
Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…
Effects of emotion on different phoneme classes
NASA Astrophysics Data System (ADS)
Lee, Chul Min; Yildirim, Serdar; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Abe; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
This study investigates the effects of emotion on different phoneme classes using short-term spectral features. In research on emotion in speech, most studies have focused on prosodic features. In this study, based on the hypothesis that different emotions have varying effects on the properties of different speech sounds, we investigate the usefulness of phoneme-class level acoustic modeling for automatic emotion classification. Hidden Markov models (HMM) based on short-term spectral features for five broad phonetic classes are used for this purpose, using data obtained from recordings of two actresses. Each speaker produces 211 sentences with four different emotions (neutral, sad, angry, happy). Using this speech material we trained and compared the performances of two sets of HMM classifiers: a generic set of "emotional speech" HMMs (one for each emotion) and a set of broad phonetic-class based HMMs (vowel, glide, nasal, stop, fricative) for each emotion type considered. Comparison of classification results indicates that different phoneme classes were affected differently by emotional change and that the vowel sounds are the most important indicator of emotions in speech. Detailed results and their implications for the underlying speech articulation will be discussed.
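A rough sketch of phoneme-class-level emotion classification under stated assumptions: hmmlearn's GaussianHMM stands in for the HMM toolkit, the MFCC matrices per (emotion, phonetic class) are assumed to come from a forced-alignment step that is not shown, and the model sizes are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

EMOTIONS = ["neutral", "sad", "angry", "happy"]
CLASSES = ["vowel", "glide", "nasal", "stop", "fricative"]

def train_models(segments):
    """segments[(emotion, cls)] = list of MFCC matrices (frames x coeffs) from forced alignment."""
    models = {}
    for key, mats in segments.items():
        X = np.vstack(mats)
        lengths = [m.shape[0] for m in mats]
        hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
        hmm.fit(X, lengths)
        models[key] = hmm
    return models

def classify(utterance_segments, models):
    """utterance_segments[cls] = list of MFCC matrices for one utterance; pick the max-likelihood emotion."""
    scores = {}
    for emo in EMOTIONS:
        scores[emo] = sum(models[(emo, cls)].score(m)
                          for cls in CLASSES
                          for m in utterance_segments.get(cls, []))
    return max(scores, key=scores.get)
```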
Impact of human emotions on physiological characteristics
NASA Astrophysics Data System (ADS)
Partila, P.; Voznak, M.; Peterek, T.; Penhaker, M.; Novak, V.; Tovarek, J.; Mehic, Miralem; Vojtech, L.
2014-05-01
Emotional states of humans and their impact on physiological and neurological characteristics are discussed in this paper. This problem has been the focus of many research teams. Nowadays, it is necessary to increase the accuracy of methods for obtaining information about correlations between emotional state and physiological changes. To record these changes, we focused on two major emotional states. Subjects were psychologically stimulated first to a neutral, calm state and then to a stress state. Electrocardiography, electroencephalography, and blood pressure provided the neurological and physiological samples collected during the subjects' stimulated conditions. Speech activity was recorded while the subject read a selected text. Features were extracted using speech processing operations. A classifier based on a Gaussian mixture model was trained and tested using Mel-frequency cepstral coefficients extracted from the subject's speech. All measurements were performed in an electromagnetic compatibility chamber. The article discusses a method for determining the influence of a stressed emotional state on human physiological and neurological changes.
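A minimal sketch of the MFCC-plus-GMM classification step, not the authors' implementation: one Gaussian mixture is fit per emotional state, and an unknown utterance is assigned to the state with the higher average frame log-likelihood. File names and mixture sizes are placeholders.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # frames x coefficients

# Hypothetical file lists for the two stimulated conditions.
train = {"calm": ["calm_01.wav", "calm_02.wav"], "stress": ["stress_01.wav", "stress_02.wav"]}

models = {}
for state, files in train.items():
    X = np.vstack([mfcc_frames(f) for f in files])
    models[state] = GaussianMixture(n_components=8, covariance_type="diag").fit(X)

def classify(path):
    X = mfcc_frames(path)
    # Average per-frame log-likelihood under each state's GMM; pick the higher one.
    return max(models, key=lambda s: models[s].score(X))

print(classify("unknown_utterance.wav"))
```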
Common cues to emotion in the dynamic facial expressions of speech and song.
Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline
2015-01-01
Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were identified accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.
Emotion recognition from speech: tools and challenges
NASA Astrophysics Data System (ADS)
Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.
2015-05-01
Human emotion recognition from speech is studied frequently because of its importance in many applications, e.g. human-computer interaction. There is wide diversity and little agreement about the basic emotions or emotion-related states on the one hand, and about where emotion-related information lies in the speech signal on the other. These diversities motivate our investigation of extracting meta-features using the PCA approach or a non-adaptive random projection (RP), which significantly reduce the large-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts a score-based LDC classifier. We demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we argue that emotion recognition from speech should not be dealt with as a classification problem. We demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.
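A hedged sketch of the meta-feature idea: PCA or a non-adaptive Gaussian random projection compresses a large per-utterance feature vector before a linear discriminant classifier (scikit-learn's LDA standing in for the score-based LDC). The data here are random placeholders, so the printed scores are meaningless; only the pipeline structure is the point.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Random stand-in for large per-utterance feature vectors and six emotion labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1500))
y = rng.integers(0, 6, size=300)

pca_pipe = make_pipeline(StandardScaler(), PCA(n_components=50), LinearDiscriminantAnalysis())
rp_pipe = make_pipeline(StandardScaler(),
                        GaussianRandomProjection(n_components=50, random_state=1),
                        LinearDiscriminantAnalysis())

for name, pipe in [("PCA meta-features", pca_pipe), ("random projection", rp_pipe)]:
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```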
Caballero-Morales, Santiago-Omar
2013-01-01
An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of phoneme-level acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Estimation of the emotional state of a spoken sentence is then performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, an accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech. PMID:23935410
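A toy sketch of the vowel-counting decision rule, assuming a tagging convention (vowel_emotion) that is invented here for illustration rather than taken from the paper:

```python
from collections import Counter

EMOTIONS = ("anger", "happiness", "neutral", "sadness")

def emotion_from_phone_string(decoded_phones):
    """decoded_phones: ASR output where each vowel carries an emotion tag, e.g. 'k a_anger s a_neutral'."""
    counts = Counter(tok.split("_")[1] for tok in decoded_phones.split()
                     if "_" in tok and tok.split("_")[1] in EMOTIONS)
    # The most frequent emotion-specific vowel decides the sentence-level emotion.
    return counts.most_common(1)[0][0] if counts else "neutral"

print(emotion_from_phone_string("k a_anger s a_anger d o_neutral"))   # -> anger
```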
Identification of emotional intonation evaluated by fMRI.
Wildgruber, D; Riecker, A; Hertrich, I; Erb, M; Grodd, W; Ethofer, T; Ackermann, H
2005-02-15
During acoustic communication among human beings, emotional information can be expressed both by the propositional content of verbal utterances and by the modulation of speech melody (affective prosody). It is well established that linguistic processing is bound predominantly to the left hemisphere of the brain. By contrast, the encoding of emotional intonation has been assumed to depend specifically upon right-sided cerebral structures. However, prior clinical and functional imaging studies yielded discrepant data with respect to interhemispheric lateralization and intrahemispheric localization of brain regions contributing to processing of affective prosody. In order to delineate the cerebral network engaged in the perception of emotional tone, functional magnetic resonance imaging (fMRI) was performed during recognition of prosodic expressions of five different basic emotions (happy, sad, angry, fearful, and disgusted) and during phonetic monitoring of the same stimuli. As compared to baseline at rest, both tasks yielded widespread bilateral hemodynamic responses within frontal, temporal, and parietal areas, the thalamus, and the cerebellum. A comparison of the respective activation maps, however, revealed comprehension of affective prosody to be bound to a distinct right-hemisphere pattern of activation, encompassing posterior superior temporal sulcus (Brodmann Area [BA] 22), dorsolateral (BA 44/45), and orbitobasal (BA 47) frontal areas. Activation within left-sided speech areas, in contrast, was observed during the phonetic task. These findings indicate that partially distinct cerebral networks subserve processing of phonetic and intonational information during speech perception.
Study of acoustic correlates associated with emotional speech
NASA Astrophysics Data System (ADS)
Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth
2004-10-01
This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge of how a speech signal is modulated by changes from a neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, happy. We analyze changes in temporal and acoustic parameters such as the magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from the other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (father) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
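A minimal sketch of the kind of per-utterance prosodic measurements compared across emotions (duration, F0, rms energy), using librosa's pYIN pitch tracker; the file name and pitch-range bounds are illustrative assumptions, and formant and interword-silence measures are omitted.

```python
import numpy as np
import librosa

def prosodic_summary(path):
    """Per-utterance F0 and rms-energy statistics of the kind compared across emotions."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # NaN for unvoiced frames
    rms = librosa.feature.rms(y=y)[0]
    return {
        "duration_s": len(y) / sr,
        "f0_mean": float(np.nanmean(f0)),
        "f0_range": float(np.nanmax(f0) - np.nanmin(f0)),
        "rms_mean": float(rms.mean()),
        "rms_range": float(rms.max() - rms.min()),
    }

# 'angry_01.wav' is a placeholder file name.
print(prosodic_summary("angry_01.wav"))
```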
Affective Prosody Labeling in Youths with Bipolar Disorder or Severe Mood Dysregulation
ERIC Educational Resources Information Center
Deveney, Christen M.; Brotman, Melissa A.; Decker, Ann Marie; Pine, Daniel S.; Leibenluft, Ellen
2012-01-01
Background: Accurate identification of nonverbal emotional cues is essential to successful social interactions, yet most research is limited to emotional face expression labeling. Little research focuses on the processing of emotional prosody, or tone of verbal speech, in clinical populations. Methods: Using the Diagnostic Analysis of Nonverbal…
Some articulatory details of emotional speech
NASA Astrophysics Data System (ADS)
Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth
2005-09-01
Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation of vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to the other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]
Psychoacoustic cues to emotion in speech prosody and music.
Coutinho, Eduardo; Dibben, Nicola
2013-01-01
There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
Wildgruber, D; Hertrich, I; Riecker, A; Erb, M; Anders, S; Grodd, W; Ackermann, H
2004-12-01
In addition to the propositional content of verbal utterances, significant linguistic and emotional information is conveyed by the tone of speech. To differentiate brain regions subserving processing of linguistic and affective aspects of intonation, discrimination of sentences differing in linguistic accentuation and emotional expressiveness was evaluated by functional magnetic resonance imaging. Both tasks yielded rightward lateralization of hemodynamic responses at the level of the dorsolateral frontal cortex as well as bilateral thalamic and temporal activation. Processing of linguistic and affective intonation, thus, seems to be supported by overlapping neural networks comprising partially right-sided brain regions. Comparison of hemodynamic activation during the two different tasks, however, revealed bilateral orbito-frontal responses restricted to the affective condition as opposed to activation of the left lateral inferior frontal gyrus confined to evaluation of linguistic intonation. These findings indicate that distinct frontal regions contribute to higher level processing of intonational information depending on its communicational function. In line with other components of language processing, discrimination of linguistic accentuation seems to be lateralized to the left inferior-lateral frontal region whereas bilateral orbito-frontal areas subserve evaluation of emotional expressiveness.
Degraded speech sound processing in a rat model of fragile X syndrome
Engineer, Crystal T.; Centanni, Tracy M.; Im, Kwok W.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Kilgard, Michael P.
2014-01-01
Fragile X syndrome is the most common inherited form of intellectual disability and the leading genetic cause of autism. Impaired phonological processing in fragile X syndrome interferes with the development of language skills. Although auditory cortex responses are known to be abnormal in fragile X syndrome, it is not clear how these differences impact speech sound processing. This study provides the first evidence that the cortical representation of speech sounds is impaired in Fmr1 knockout rats, despite normal speech discrimination behavior. Evoked potentials and spiking activity in response to speech sounds, noise burst trains, and tones were significantly degraded in primary auditory cortex, anterior auditory field and the ventral auditory field. Neurometric analysis of speech evoked activity using a pattern classifier confirmed that activity in these fields contains significantly less information about speech sound identity in Fmr1 knockout rats compared to control rats. Responses were normal in the posterior auditory field, which is associated with sound localization. The greatest impairment was observed in the ventral auditory field, which is related to emotional regulation. Dysfunction in the ventral auditory field may contribute to poor emotional regulation in fragile X syndrome and may help explain the observation that later auditory evoked responses are more disturbed in fragile X syndrome compared to earlier responses. Rodent models of fragile X syndrome are likely to prove useful for understanding the biological basis of fragile X syndrome and for testing candidate therapies. PMID:24713347
ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies.
Agrawal, Deepashri; Timm, Lydia; Viola, Filipa Campos; Debener, Stefan; Büchner, Andreas; Dengler, Reinhard; Wittfoth, Matthias
2012-09-20
Emotionally salient information in spoken language can be provided by variations in speech melody (prosody) or by emotional semantics. Emotional prosody is essential to convey feelings through speech. In sensorineural hearing loss, impaired speech perception can be improved by cochlear implants (CIs). The aim of this study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with vocoded stimuli. Semantically neutral sentences with emotional (happy, angry and neutral) prosody were used. Sentences were manipulated to simulate two CI speech-coding strategies: the Advanced Combination Encoder (ACE) and the newly developed Psychoacoustic Advanced Combination Encoder (PACE). Twenty NH adults were asked to recognize emotional prosody from ACE and PACE simulations. Performance was assessed using behavioral tests and event-related potentials (ERPs). Behavioral data revealed superior performance with original stimuli compared to the simulations. For the simulations, better recognition was observed for happy and angry prosody than for neutral. Irrespective of simulated or unsimulated stimulus type, a significantly larger P200 event-related potential was observed after sentence onset for happy prosody than for the other two emotions. Further, the amplitude of the P200 was significantly more positive for the PACE strategy than for the ACE strategy. Results suggested the P200 peak as an indicator of active differentiation and recognition of emotional prosody. The larger P200 peak amplitude for happy prosody indicated the importance of fundamental frequency (F0) cues in prosody processing. The advantage of PACE over ACE highlighted a privileged role of the psychoacoustic masking model in improving prosody perception. Taken together, the study emphasizes the importance of vocoded simulation to better understand the prosodic cues which CI users may be utilizing.
Kotz, Sonja A; Dengler, Reinhard; Wittfoth, Matthias
2015-02-01
Emotional speech comprises complex multimodal verbal and non-verbal information that allows us to deduce others' emotional states or thoughts in social interactions. While the neural correlates of verbal and non-verbal aspects and their interaction in emotional speech have been identified, there is very little evidence on how we perceive and resolve incongruity in emotional speech, and whether such incongruity extends to current concepts of task-specific prediction errors as a consequence of unexpected action outcomes ('negative surprise'). Here, we explored this possibility while participants listened to congruent and incongruent angry, happy or neutral utterances and categorized the expressed emotions by their verbal (semantic) content. Results reveal valence-specific incongruity effects: negative verbal content expressed in a happy tone of voice increased activation in the dorsomedial prefrontal cortex (dmPFC), extending its role from conflict moderation to appraisal of valence-specific conflict in emotional speech. Conversely, the caudate head bilaterally responded selectively to positive verbal content expressed in an angry tone of voice, broadening previous accounts of the caudate head in linguistic control to moderating valence-specific control in emotional speech. Together, these results suggest that control structures of the human brain (dmPFC and subcompartments of the basal ganglia) impact emotional speech differentially when conflict arises. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Improving Understanding of Emotional Speech Acoustic Content
NASA Astrophysics Data System (ADS)
Tinnemore, Anna
Children with cochlear implants show deficits in identifying the emotional intent of utterances without facial or body language cues. A known limitation of cochlear implants is the inability to accurately portray the fundamental frequency contour of speech, which carries the majority of the information needed to identify emotional intent. Without reliable access to the fundamental frequency, other acoustic cues to vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and an adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
Gold, Rinat; Gold, Azgad
2018-02-06
The purpose of this study was to examine the attitudes, feelings, and practice characteristics of speech-language pathologists (SLPs) in Israel regarding the subject of delivering bad news. One hundred and seventy-three Israeli SLPs answered an online survey. Respondents represented SLPs in Israel in all stages of vocational experience, with varying academic degrees, from a variety of employment settings. The survey addressed emotions involved in the process of delivering bad news, training on this subject, and background information of the respondents. Frequency distributions of the responses of the participants were determined, and Pearson correlations were computed to determine the relation between years of occupational experience and the following variables: frequency of delivering bad news, opinions regarding training, and emotions experienced during the process of bad news delivery. Our survey showed that bad news delivery is a task that most participants are confronted with from the very beginning of their careers. Participants regarded training in the subject of delivering bad news as important but, at the same time, reported receiving relatively little training on this subject. In addition, our survey showed that negative emotions are involved in the process of delivering bad news. Training SLPs on specific techniques is required for successfully delivering bad news. The emotional burden associated with breaking bad news in the field of speech-language pathology should be noticed and addressed.
Wang, Kun-Ching
2015-01-14
The classification of emotional speech is widely considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the multi-resolution texture properties of the emotional speech spectrogram should provide a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can give clearer discrimination between emotions than uniform-resolution texture analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification gains for real-life emotion recognition in speech.
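A loose, single-resolution stand-in for the spectrogram-texture idea (not the MRTII feature set itself): gray-level co-occurrence statistics are computed on a quantized log-mel spectrogram using scikit-image. The file name, quantization levels, and GLCM parameters are illustrative assumptions, and the AAD step is omitted.

```python
import numpy as np
import librosa
from skimage.feature import graycomatrix, graycoprops

def spectrogram_texture(path, n_fft=512, hop=128, levels=32):
    """Gray-level co-occurrence texture statistics of a log-mel spectrogram image."""
    y, sr = librosa.load(path, sr=16000)
    S = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop))
    # Quantize the spectrogram to a small number of gray levels, treating it as an image.
    img = np.uint8((S - S.min()) / (S.max() - S.min() + 1e-9) * (levels - 1))
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2], levels=levels, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

# 'call_segment.wav' is a placeholder for one dialog turn.
print(spectrogram_texture("call_segment.wav"))
```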
Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition
Rigoulot, Simon; Wassiliwizky, Eugen; Pell, Marc D.
2013-01-01
Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech. PMID:23805115
Gregl, Ana; Kirigin, Marin; Bilać, Snjeiana; Sućeska Ligutić, Radojka; Jaksić, Nenad; Jakovljević, Miro
2014-09-01
This research aims to investigate differences in speech comprehension between children with specific language impairment (SLI) and their developmentally normal peers, and the relationship between speech comprehension and emotional/behavioral problems on Achenbach's Child Behavior Checklist (CBCL) and Caregiver-Teacher Report Form (C-TRF) according to the DSM-IV. The clinical sample comprised 97 preschool children with SLI, while the peer sample comprised 60 developmentally normal preschool children. Children with SLI had significant delays in speech comprehension and more emotional/behavioral problems than peers. In children with SLI, speech comprehension significantly correlated with scores on the Attention Deficit/Hyperactivity Problems (CBCL and C-TRF) and Pervasive Developmental Problems (CBCL) scales (p < 0.05). In the peer sample, speech comprehension significantly correlated with scores on the Affective Problems and Attention Deficit/Hyperactivity Problems (C-TRF) scales. Regression analysis showed that 12.8% of the variance in speech comprehension is accounted for by 5 CBCL variables, of which Attention Deficit/Hyperactivity (beta = -0.281) and Pervasive Developmental Problems (beta = -0.280) are statistically significant (p < 0.05). In the reduced regression model, Attention Deficit/Hyperactivity explains 7.3% of the variance in speech comprehension (beta = -0.270, p < 0.01). It is possible that, to a certain degree, the same neurodevelopmental process lies in the background of problems with speech comprehension, problems with attention and hyperactivity, and pervasive developmental problems. This study confirms the importance of triage for behavioral problems and attention training in the rehabilitation of children with SLI and children with normal language development that exhibit ADHD symptoms.
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
2015-01-01
In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
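A minimal, self-contained sketch of an extreme learning machine of the kind used as the classifier above: a fixed random hidden layer followed by an analytic least-squares readout. The feature matrix is a random placeholder for the fused acoustic/glottal features, so the printed number only demonstrates that the code runs.

```python
import numpy as np

class SimpleELM:
    """Minimal extreme learning machine: random hidden layer + least-squares readout."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)       # random hidden-layer activations
        T = np.eye(n_classes)[y]               # one-hot targets
        self.beta = np.linalg.pinv(H) @ T      # analytic output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# Random stand-in for the fused acoustic/glottal feature matrix and four emotion labels.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(300, 60)), rng.integers(0, 4, size=300)
print("training accuracy:", (SimpleELM().fit(X, y).predict(X) == y).mean())
```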
Advances in natural language processing.
Hirschberg, Julia; Manning, Christopher D
2015-07-17
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.
Multimodal human communication--targeting facial expressions, speech content and prosody.
Regenbogen, Christina; Schneider, Daniel A; Gur, Raquel E; Schneider, Frank; Habel, Ute; Kellermann, Thilo
2012-05-01
Human communication is based on a dynamic information exchange of the communication channels facial expressions, prosody, and speech content. This fMRI study elucidated the impact of multimodal emotion processing and the specific contribution of each channel on behavioral empathy and its prerequisites. Ninety-six video clips displaying actors who told self-related stories were presented to 27 healthy participants. In two conditions, all channels uniformly transported only emotional or neutral information. Three conditions selectively presented two emotional channels and one neutral channel. Subjects indicated the actors' emotional valence and their own while fMRI was recorded. Activation patterns of tri-channel emotional communication reflected multimodal processing and facilitative effects for empathy. Accordingly, subjects' behavioral empathy rates significantly deteriorated once one source was neutral. However, emotionality expressed via two of three channels yielded activation in a network associated with theory-of-mind-processes. This suggested participants' effort to infer mental states of their counterparts and was accompanied by a decline of behavioral empathy, driven by the participants' emotional responses. Channel-specific emotional contributions were present in modality-specific areas. The identification of different network-nodes associated with human interactions constitutes a prerequisite for understanding dynamics that underlie multimodal integration and explain the observed decline in empathy rates. This task might also shed light on behavioral deficits and neural changes that accompany psychiatric diseases. Copyright © 2012 Elsevier Inc. All rights reserved.
Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech
NASA Astrophysics Data System (ADS)
Přibil, J.; Přibilová, A.
2009-01-01
The paper addresses the reflection of microintonation and spectral properties in male and female acted emotional speech. The microintonation component of speech melody is analyzed with respect to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at a statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. Calculated histograms of the spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of the cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The achieved statistical results show good correlation between male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
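A minimal sketch of the flatness and cepstral statistics described above, under stated assumptions: librosa's spectral flatness and MFCCs stand in for the paper's cepstral analysis, the Gamma fit uses SciPy with the location fixed at zero, and the file name is a placeholder.

```python
import librosa
from scipy import stats

def flatness_stats(path):
    """Spectral-flatness range, its Gamma fit, and cepstral-coefficient shape statistics."""
    y, sr = librosa.load(path, sr=16000)
    sf = librosa.feature.spectral_flatness(y=y)[0]
    shape, loc, scale = stats.gamma.fit(sf, floc=0)      # model the flatness histogram
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # cepstral stand-in for the paper's coefficients
    return {
        "flatness_range": (float(sf.min()), float(sf.max())),
        "gamma_shape": shape, "gamma_scale": scale,
        "cepstral_skewness": float(stats.skew(mfcc, axis=1).mean()),
        "cepstral_kurtosis": float(stats.kurtosis(mfcc, axis=1).mean()),
    }

# 'joy_cz_01.wav' is a placeholder for one acted emotional utterance.
print(flatness_stats("joy_cz_01.wav"))
```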
Second Language Ability and Emotional Prosody Perception
Bhatara, Anjali; Laukka, Petri; Boll-Avetisyan, Natalie; Granjon, Lionel; Anger Elfenbein, Hillary; Bänziger, Tanja
2016-01-01
The present study examines the effect of language experience on vocal emotion perception in a second language. Native speakers of French with varying levels of self-reported English ability were asked to identify emotions from vocal expressions produced by American actors in a forced-choice task, and to rate their pleasantness, power, alertness and intensity on continuous scales. Stimuli included emotionally expressive English speech (emotional prosody) and non-linguistic vocalizations (affect bursts), and a baseline condition with Swiss-French pseudo-speech. Results revealed effects of English ability on the recognition of emotions in English speech but not in non-linguistic vocalizations. Specifically, higher English ability was associated with less accurate identification of positive emotions, but not with the interpretation of negative emotions. Moreover, higher English ability was associated with lower ratings of pleasantness and power, again only for emotional prosody. This suggests that second language skills may sometimes interfere with emotion recognition from speech prosody, particularly for positive emotions. PMID:27253326
Emotion Analysis of Telephone Complaints from Customer Based on Affective Computing.
Gong, Shuangping; Dai, Yonghui; Ji, Jun; Wang, Jinzhao; Sun, Hai
2015-01-01
Customer complaints have become important feedback for modern enterprises seeking to improve their product and service quality as well as customer loyalty. As one of the most commonly used channels for customer complaints, telephone communication carries rich emotional information in speech, which provides valuable resources for perceiving customer satisfaction and studying complaint-handling skills. This paper studies the characteristics of telephone complaint speech and proposes an analysis method based on affective computing technology, which can recognize the dynamic changes of customer emotions from the conversations between the service staff and the customer. The recognition process includes speaker recognition, emotional feature parameter extraction, and dynamic emotion recognition. Experimental results show that this method is effective and can reach high recognition rates for happy and angry states. It has been successfully applied to operation quality and service administration in telecom and Internet service companies.
Hagan, Cindy C; Woods, Will; Johnson, Sam; Calder, Andrew J; Green, Gary G R; Young, Andrew W
2009-11-24
An influential neural model of face perception suggests that the posterior superior temporal sulcus (STS) is sensitive to those aspects of faces that produce transient visual changes, including facial expression. Other researchers note that recognition of expression involves multiple sensory modalities and suggest that the STS also may respond to crossmodal facial signals that change transiently. Indeed, many studies of audiovisual (AV) speech perception show STS involvement in AV speech integration. Here we examine whether these findings extend to AV emotion. We used magnetoencephalography to measure the neural responses of participants as they viewed and heard emotionally congruent fear and minimally congruent neutral face and voice stimuli. We demonstrate significant supra-additive responses (i.e., where AV > [unimodal auditory + unimodal visual]) in the posterior STS within the first 250 ms for emotionally congruent AV stimuli. These findings show a role for the STS in processing crossmodal emotive signals.
Brain Response to a Humanoid Robot in Areas Implicated in the Perception of Human Emotional Gestures
Chaminade, Thierry; Zecca, Massimiliano; Blakemore, Sarah-Jayne; Takanishi, Atsuo; Frith, Chris D.; Micera, Silvestro; Dario, Paolo; Rizzolatti, Giacomo; Gallese, Vittorio; Umiltà, Maria Alessandra
2010-01-01
Background: The humanoid robot WE4-RII was designed to express human emotions in order to improve human-robot interaction. We can read the emotions depicted in its gestures, yet we might utilize different neural processes than those used for reading the emotions in human agents. Methodology: Here, fMRI was used to assess how brain areas activated by the perception of human basic emotions (facial expression of Anger, Joy, Disgust) and silent speech respond to a humanoid robot impersonating the same emotions, while participants were instructed to attend either to the emotion or to the motion depicted. Principal Findings: Increased responses to robot compared to human stimuli in the occipital and posterior temporal cortices suggest additional visual processing when perceiving a mechanical anthropomorphic agent. In contrast, activity in cortical areas endowed with mirror properties, like left Broca's area for the perception of speech, and in areas involved in the processing of emotions, like the left anterior insula for the perception of disgust and the orbitofrontal cortex for the perception of anger, is reduced for robot stimuli, suggesting lesser resonance with the mechanical agent. Finally, instructions to explicitly attend to the emotion significantly increased response to robot, but not human, facial expressions in the anterior part of the left inferior frontal gyrus, a neural marker of motor resonance. Conclusions: Motor resonance towards a humanoid robot, but not a human, display of facial emotion is increased when attention is directed towards judging emotions. Significance: Artificial agents can be used to assess how factors like anthropomorphism affect neural response to the perception of human actions. PMID:20657777
Discharge experiences of speech-language pathologists working in Cyprus and Greece.
Kambanaros, Maria
2010-08-01
Post-termination relationships are complex because the client may need additional services and it may be difficult to determine when the speech-language pathologist-client relationship is truly terminated. In my contribution to this scientific forum, discharge experiences from speech-language pathologists working in Cyprus and Greece will be explored in search of commonalities and differences in the way in which pathologists end therapy from different cultural perspectives. Within this context the personal impact on speech-language pathologists of the discharge process will be highlighted. Inherent in this process is how speech-language pathologists learn to hold their feelings, anxieties and reactions when communicating discharge to clients. Overall speech-language pathologists working in Cyprus and Greece experience similar emotional responses to positive and negative therapy endings as speech-language pathologists working in Australia. The major difference is that Cypriot and Greek therapists face serious limitations in moving their clients on after therapy has ended.
Human emotions track changes in the acoustic environment.
Ma, Weiyi; Thompson, William Forde
2015-11-24
Emotional responses to biologically significant events are essential for human survival. Do human emotions lawfully track changes in the acoustic environment? Here we report that changes in acoustic attributes that are well known to interact with human emotions in speech and music also trigger systematic emotional responses when they occur in environmental sounds, including sounds of human actions, animal calls, machinery, or natural phenomena, such as wind and rain. Three changes in acoustic attributes known to signal emotional states in speech and music were imposed upon 24 environmental sounds. Evaluations of stimuli indicated that human emotions track such changes in environmental sounds just as they do for speech and music. Such changes not only influenced evaluations of the sounds themselves, they also affected the way accompanying facial expressions were interpreted emotionally. The findings illustrate that human emotions are highly attuned to changes in the acoustic environment, and reignite a discussion of Charles Darwin's hypothesis that speech and music originated from a common emotional signal system based on the imitation and modification of environmental sounds.
Head movements encode emotions during speech and song.
Livingstone, Steven R; Palmer, Caroline
2016-04-01
When speaking or singing, vocalists often move their heads in an expressive fashion, yet the influence of emotion on vocalists' head motion is unknown. Using a comparative speech/song task, we examined whether vocalists' intended emotions influence head movements and whether those movements influence the perceived emotion. In Experiment 1, vocalists were recorded with motion capture while speaking and singing each statement with different emotional intentions (very happy, happy, neutral, sad, very sad). Functional data analyses showed that head movements differed in translational and rotational displacement across emotional intentions, yet were similar across speech and song, transcending differences in F0 (varied freely in speech, fixed in song) and lexical variability. Head motion specific to emotional state occurred before and after vocalizations, as well as during sound production, confirming that some aspects of movement were not simply a by-product of sound production. In Experiment 2, observers accurately identified vocalists' intended emotion on the basis of silent, face-occluded videos of head movements during speech and song. These results provide the first evidence that head movements encode a vocalist's emotional intent and that observers decode emotional information from these movements. We discuss implications for models of head motion during vocalizations and applied outcomes in social robotics and automated emotion recognition. (c) 2016 APA, all rights reserved.
Cespedes-Guevara, Julian; Eerola, Tuomas
2018-01-01
Basic Emotion theory has had a tremendous influence on the affective sciences, including music psychology, where most researchers have assumed that music expressivity is constrained to a limited set of basic emotions. Several scholars suggested that these constraints to musical expressivity are explained by the existence of a shared acoustic code to the expression of emotions in music and speech prosody. In this article we advocate for a shift from this focus on basic emotions to a constructionist account. This approach proposes that the phenomenon of perception of emotions in music arises from the interaction of music’s ability to express core affects and the influence of top-down and contextual information in the listener’s mind. We start by reviewing the problems with the concept of Basic Emotions, and the inconsistent evidence that supports it. We also demonstrate how decades of developmental and cross-cultural research on music and emotional speech have failed to produce convincing findings to conclude that music expressivity is built upon a set of biologically pre-determined basic emotions. We then examine the cue-emotion consistencies between music and speech, and show how they support a parsimonious explanation, where musical expressivity is grounded on two dimensions of core affect (arousal and valence). Next, we explain how the fact that listeners reliably identify basic emotions in music does not arise from the existence of categorical boundaries in the stimuli, but from processes that facilitate categorical perception, such as using stereotyped stimuli and close-ended response formats, psychological processes of construction of mental prototypes, and contextual information. Finally, we outline our proposal of a constructionist account of perception of emotions in music, and spell out the ways in which this approach is able to resolve past conflicting findings. We conclude by providing explicit pointers about the methodological choices that will be vital to move beyond the popular Basic Emotion paradigm and start untangling the emergence of emotional experiences with music in the actual contexts in which they occur. PMID:29541041
Fuller, Christina D.; Galvin, John J.; Maat, Bert; Free, Rolien H.; Başkent, Deniz
2014-01-01
Cochlear implants (CIs) are auditory prostheses that restore hearing via electrical stimulation of the auditory nerve. Compared to normal acoustic hearing, sounds transmitted through the CI are spectro-temporally degraded, causing difficulties in challenging listening tasks such as speech intelligibility in noise and perception of music. In normal hearing (NH), musicians have been shown to better perform than non-musicians in auditory processing and perception, especially for challenging listening tasks. This “musician effect” was attributed to better processing of pitch cues, as well as better overall auditory cognitive functioning in musicians. Does the musician effect persist when pitch cues are degraded, as it would be in signals transmitted through a CI? To answer this question, NH musicians and non-musicians were tested while listening to unprocessed signals or to signals processed by an acoustic CI simulation. The task increasingly depended on pitch perception: (1) speech intelligibility (words and sentences) in quiet or in noise, (2) vocal emotion identification, and (3) melodic contour identification (MCI). For speech perception, there was no musician effect with the unprocessed stimuli, and a small musician effect only for word identification in one noise condition, in the CI simulation. For emotion identification, there was a small musician effect for both. For MCI, there was a large musician effect for both. Overall, the effect was stronger as the importance of pitch in the listening task increased. This suggests that the musician effect may be more rooted in pitch perception, rather than in a global advantage in cognitive processing (in which musicians would have performed better in all tasks). The results further suggest that musical training before (and possibly after) implantation might offer some advantage in pitch processing that could partially benefit speech perception, and more strongly emotion and music perception. PMID:25071428
Dmitrieva, E S; Gel'man, V Ia
2011-01-01
Listener-specific features of the recognition of different emotional intonations (positive, negative, and neutral) produced by male and female speakers, in the presence or absence of background noise, were studied in 49 adults aged 20-79 years. In all listeners, noise produced the most pronounced decrease in recognition accuracy for the positive emotional intonation ("joy") compared with other intonations, whereas it did not influence the recognition accuracy of "anger" in 65-79-year-old listeners. Higher recognition rates for noisy signals were observed for emotional intonations expressed by female speakers. Acoustic characteristics of noisy and clear speech signals underlying the perception of emotional prosody were identified for adult listeners of different ages and genders.
Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros
2015-01-01
The development of ecologically valid procedures for collecting reliable and unbiased emotional data is an important step towards computer interfaces with social and affective intelligence targeting patients with mental disorders. Within this context, the Athens Emotional States Inventory (AESI) proposes the design, recording, and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness, and neutral. The items of the AESI consist of sentences, each having content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following the Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of the AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 utterances recorded in the Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy up to 75.15% for automatically recognizing the emotions in the AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes, and with reduced environmental variability.
A cross-linguistic fMRI study of perception of intonation and emotion in Chinese.
Gandour, Jack; Wong, Donald; Dzemidzic, Mario; Lowe, Mark; Tong, Yunxia; Li, Xiaojian
2003-03-01
Conflicting data from neurobehavioral studies of the perception of intonation (linguistic) and emotion (affective) in spoken language highlight the need to further examine how functional attributes of prosodic stimuli are related to hemispheric differences in processing capacity. Because of similarities in their acoustic profiles, intonation and emotion permit us to assess to what extent hemispheric lateralization of speech prosody depends on functional instead of acoustical properties. To examine how the brain processes linguistic and affective prosody, an fMRI study was conducted using Chinese, a tone language in which both intonation and emotion may be signaled prosodically, in addition to lexical tones. Ten Chinese and 10 English subjects were asked to perform discrimination judgments of intonation (I: statement, question) and emotion (E: happy, angry, sad) presented in semantically neutral Chinese sentences. A baseline task required passive listening to the same speech stimuli (S). In direct between-group comparisons, the Chinese group showed left-sided frontoparietal activation for both intonation (I vs. S) and emotion (E vs. S) relative to baseline. When comparing intonation relative to emotion (I vs. E), the Chinese group demonstrated prefrontal activation bilaterally; parietal activation in the left hemisphere only. The reverse comparison (E vs. I), on the other hand, revealed that activation occurred in anterior and posterior prefrontal regions of the right hemisphere only. These findings show that some aspects of perceptual processing of emotion are dissociable from intonation, and, moreover, that they are mediated by the right hemisphere. Copyright 2003 Wiley-Liss, Inc.
The minor third communicates sadness in speech, mirroring its use in music.
Curtis, Meagan E; Bharucha, Jamshed J
2010-06-01
There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
Niedtfeld, Inga; Defiebre, Nadine; Regenbogen, Christina; Mier, Daniela; Fenske, Sabrina; Kirsch, Peter; Lis, Stefanie; Schmahl, Christian
2017-04-01
Previous research has revealed alterations and deficits in facial emotion recognition in patients with borderline personality disorder (BPD). During interpersonal communication in daily life, social signals such as speech content, variation in prosody, and facial expression need to be considered simultaneously. We hypothesized that deficits in higher level integration of social stimuli contribute to difficulties in emotion recognition in BPD, and heightened arousal might explain this effect. Thirty-one patients with BPD and thirty-one healthy controls were asked to identify emotions in short video clips, which were designed to represent different combinations of the three communication channels: facial expression, speech content, and prosody. Skin conductance was recorded as a measure of sympathetic arousal, while controlling for state dissociation. Patients with BPD showed lower mean accuracy scores than healthy control subjects in all conditions comprising emotional facial expressions. This was true for the condition with facial expression only, and for the combination of all three communication channels. Electrodermal responses were enhanced in BPD only in response to auditory stimuli. In line with the major body of facial emotion recognition studies, we conclude that deficits in the interpretation of facial expressions lead to the difficulties observed in multimodal emotion processing in BPD.
[Perception features of emotional intonation of short pseudowords].
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2012-01-01
Reaction time and recognition accuracy for emotional intonations in short meaningless words differing in only one phoneme were studied, with and without background noise, in 49 adults aged 20-79 years. The results were compared with the same parameters for emotional intonations in meaningful speech utterances under similar conditions. Perception of emotional intonations at different linguistic levels (phonological and lexico-semantic) was found to have both common features and certain peculiarities. Recognition characteristics of emotional intonations depending on the gender and age of listeners appeared to be invariant with regard to the linguistic level of the speech stimuli. The phonemic composition of the pseudowords was found to influence emotional perception, especially against background noise. The acoustic characteristic most responsible for the perception of emotional prosody in short meaningless words under both experimental conditions, i.e., with and without background noise, was the variation of the fundamental frequency.
Multi-function robots with speech interaction and emotion feedback
NASA Astrophysics Data System (ADS)
Wang, Hongyu; Lou, Guanting; Ma, Mengchao
2018-03-01
Nowadays, service robots have been applied in many public settings; however, most of them still lack speech interaction, especially speech-emotion interaction feedback. To make the robot more humanoid, an Arduino microcontroller was used in this study for the speech recognition module and the servo motor control module, giving the robot the functions of speech interaction and emotion feedback. In addition, a W5100 chip was adopted for the network connection to transmit information via the Internet, providing broad application prospects for the robot in the area of the Internet of Things (IoT).
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2009-01-01
Comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: sensible speech utterances and short meaningless words. The corpus of speech signals of different emotional intonations (happy, angry, frightened, sad and neutral) was created using the actor's method of simulation of emotions. Native Russian 20-70-year-old speakers (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations of the fundamental frequency and frequencies of the first formant. The direction of these deviations was independent of the semantic content of speech utterance and its duration, age, gender, and being actor or non-actor, though the personal features of the speakers affected the absolute values of these frequencies.
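For readers who want to reproduce measures of this kind, the sketch below estimates mean F0, signal power, duration, and rough first/second formant frequencies for a single utterance. The LPC order and the formant-picking rule are common heuristics assumed here for illustration; they are not the parameters used in the study.

```python
# Rough sketch (not the study's analysis code): estimate mean F0, power,
# duration, and crude F1/F2 candidates for one utterance.
import numpy as np
import librosa

def utterance_measures(y, sr):
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    power = float(np.mean(y ** 2))
    duration = len(y) / sr
    # Crude formant estimate from LPC roots of a pre-emphasized signal.
    a = librosa.lpc(librosa.effects.preemphasis(y), order=2 + sr // 1000)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
    formants = [f for f in freqs if f > 90][:2]  # keep F1, F2 candidates
    return {"f0_mean": float(np.nanmean(f0)), "power": power,
            "duration": duration, "formants": formants}
```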
Santesso, Diane L; Schmidt, Louis A; Trainor, Laurel J
2007-10-01
Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective content (i.e., love/comfort, surprise, fear) in a group of typically developing 9-month-old infants. Regional EEG and heart rate were collected continuously during stimulus presentation. We found that the pattern of overall frontal EEG power was linearly related to the affective intensity of the ID speech, such that EEG power was greatest in response to fear, followed by surprise, and then love/comfort; this linear pattern was specific to the frontal region. We also noted that heart rate decelerated to ID speech independent of affective content. As well, infants who were reported by their mothers as temperamentally distressed tended to exhibit greater relative right frontal EEG activity during baseline and in response to affective ID speech, consistent with previous work with visual stimuli and extending it to the auditory modality. Findings are discussed in terms of how increases in frontal EEG power in response to different affective intensities may reflect the cognitive aspects of emotional processing across sensory domains in infancy.
Coutinho, Eduardo; Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences points of view, determining the degree of overlap between both domains is fundamental to understand the shared mechanisms underlying such phenomenon. From a Machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra- (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies-the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for Speech intra-domain models achieve the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
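A minimal sketch of the feature-representation-transfer strategy mentioned in the abstract above: a denoising autoencoder trained on pooled music and speech acoustic features, whose encoder is then reused as a shared representation for a downstream emotion regressor. The layer sizes, noise level, and placeholder data are assumptions, not the authors' configuration.

```python
# Hedged sketch of denoising-autoencoder feature-representation transfer.
import numpy as np
from tensorflow import keras

def fit_dae(X, bottleneck=32, noise_std=0.1, epochs=20):
    d = X.shape[1]
    inp = keras.Input(shape=(d,))
    noisy = keras.layers.GaussianNoise(noise_std)(inp)   # noise active only in training
    code = keras.layers.Dense(bottleneck, activation="relu")(noisy)
    out = keras.layers.Dense(d)(code)
    dae = keras.Model(inp, out)
    dae.compile(optimizer="adam", loss="mse")
    dae.fit(X, X, epochs=epochs, batch_size=64, verbose=0)
    return keras.Model(inp, code)  # encoder reused across domains

# Pooled (music + speech) features -> shared representation -> e.g. a ridge
# regressor on the target domain's valence/arousal labels.
X_pooled = np.random.randn(1000, 80).astype("float32")  # placeholder features
encoder = fit_dae(X_pooled)
Z = encoder.predict(X_pooled, verbose=0)
```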
Comparison of Classification Methods for Detecting Emotion from Mandarin Speech
NASA Astrophysics Data System (ADS)
Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng
It is said that technology comes out from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers are adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions, including anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms others.
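The kind of comparison reported above can be sketched as follows, with utterance-level MFCC statistics and scikit-learn's distance-weighted k-NN used as a simple stand-in for the WD-MKNN classifier (the exact weighting scheme of WD-MKNN is not reproduced here). File paths, labels, and parameters are placeholders.

```python
# Small sketch of a classifier comparison on utterance-level MFCC statistics.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def mfcc_stats(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def compare(paths, labels):
    X = np.vstack([mfcc_stats(p) for p in paths])
    y = np.array(labels)
    classifiers = [("weighted k-NN",
                    KNeighborsClassifier(n_neighbors=5, weights="distance")),
                   ("SVM (RBF)", SVC())]
    for name, clf in classifiers:
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name}: {acc:.3f}")
```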
Philippot, Pierre; Vrielynck, Nathalie; Muller, Valérie
2010-12-01
The present study examined the impact of different modes of processing anxious apprehension on subsequent anxiety and performance in a stressful speech task. Participants were informed that they would have to give a speech on a difficult topic while being videotaped and evaluated on their performance. They were then randomly assigned to one of three conditions. In a specific processing condition, they were encouraged to explore in detail all the specific aspects (thoughts, emotions, sensations) they experienced while anticipating giving the speech; in a general processing condition, they had to focus on the generic aspects that they would typically experience during anxious anticipation; and in a control, no-processing condition, participants were distracted. Results revealed that at the end of the speech, participants in the specific processing condition reported less anxiety than those in the two other conditions. They were also evaluated by judges to have performed better than those in the control condition, who in turn did better than those in the general processing condition. Copyright © 2010. Published by Elsevier Ltd.
Disentangling the brain networks supporting affective speech comprehension.
Hervé, Pierre-Yves; Razafimandimby, Annick; Vigneau, Mathieu; Mazoyer, Bernard; Tzourio-Mazoyer, Nathalie
2012-07-16
Areas involved in social cognition, such as the medial prefrontal cortex (mPFC) and the left temporo-parietal junction (TPJ) appear to be active during the classification of sentences according to emotional criteria (happy, angry or sad, [Beaucousin et al., 2007]). These two regions are frequently co-activated in studies about theory of mind (ToM). To confirm that these regions constitute a coherent network during affective speech comprehension, new event-related functional magnetic resonance imaging data were acquired, using the emotional and grammatical-person sentence classification tasks on a larger sample of 51 participants. The comparison of the emotional and grammatical tasks confirmed the previous findings. Functional connectivity analyses established a clear demarcation between a "Medial" network, including the mPFC and TPJ regions, and a bilateral "Language" network, which gathered inferior frontal and temporal areas. These findings suggest that emotional speech comprehension results from interactions between language, ToM and emotion processing networks. The language network, active during both tasks, would be involved in the extraction of lexical and prosodic emotional cues, while the medial network, active only during the emotional task, would drive the making of inferences about the sentences' emotional content, based on their meanings. The left and right amygdalae displayed a stronger response during the emotional condition, but were seldom correlated with the other regions, and thus formed a third entity. Finally, distinct regions belonging to the Language and Medial networks were found in the left angular gyrus, where these two systems could interface. Copyright © 2012 Elsevier Inc. All rights reserved.
Adaptation to vocal expressions reveals multistep perception of auditory emotion.
Bestelmeyer, Patricia E G; Maurage, Pierre; Rouger, Julien; Latinus, Marianne; Belin, Pascal
2014-06-11
The human voice carries speech as well as important nonlinguistic signals that influence our social interactions. Among these cues that impact our behavior and communication with other people is the perceived emotional state of the speaker. A theoretical framework for the neural processing stages of emotional prosody has suggested that auditory emotion is perceived in multiple steps (Schirmer and Kotz, 2006) involving low-level auditory analysis and integration of the acoustic information followed by higher-level cognition. Empirical evidence for this multistep processing chain, however, is still sparse. We examined this question using functional magnetic resonance imaging and a continuous carry-over design (Aguirre, 2007) to measure brain activity while volunteers listened to non-speech-affective vocalizations morphed on a continuum between anger and fear. Analyses dissociated neuronal adaptation effects induced by similarity in perceived emotional content between consecutive stimuli from those induced by their acoustic similarity. We found that bilateral voice-sensitive auditory regions as well as right amygdala coded the physical difference between consecutive stimuli. In contrast, activity in bilateral anterior insulae, medial superior frontal cortex, precuneus, and subcortical regions such as bilateral hippocampi depended predominantly on the perceptual difference between morphs. Our results suggest that the processing of vocal affect recognition is a multistep process involving largely distinct neural networks. Amygdala and auditory areas predominantly code emotion-related acoustic information while more anterior insular and prefrontal regions respond to the abstract, cognitive representation of vocal affect. Copyright © 2014 Bestelmeyer et al.
Vanryckeghem, Martine; Matthews, Michael; Xu, Peixin
2017-11-08
The aim of this study was to evaluate the usefulness of the Speech Situation Checklist for adults who stutter (SSC) in differentiating people who stutter (PWS) from speakers with no stutter based on self-reports of anxiety and speech disruption in communicative settings. The SSC's psychometric properties were examined, norms were established, and suggestions for treatment were formulated. The SSC was administered to 88 PWS seeking treatment and 209 speakers with no stutter between the ages of 18 and 62. The SSC consists of 2 sections investigating negative emotional reaction and speech disruption in 38 speech situations that are identical in both sections. The SSC-Emotional Reaction and SSC-Speech Disruption data show that these self-report tests differentiate PWS from speakers with no stutter to a statistically significant extent and have great discriminative value. The tests have good internal reliability, content, and construct validity. Age and gender do not affect the scores of the PWS. The SSC-Emotional Reaction and SSC-Speech Disruption seem to be powerful measures to investigate negative emotion and speech breakdown in an array of speech situations. The item scores give direction to treatment by suggesting speech situations that need a clinician's attention in terms of generalization and carry-over of within-clinic therapeutic gains into in vivo settings.
ERIC Educational Resources Information Center
Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph
2012-01-01
Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…
Emotional recognition from the speech signal for a virtual education agent
NASA Astrophysics Data System (ADS)
Tickle, A.; Raghu, S.; Elshaw, M.
2013-06-01
This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so that recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.
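A hedged sketch of the pipeline described above: a large acoustic feature matrix (here a random placeholder standing in for the roughly 998 openSMILE features) is reduced to its most informative columns and fed to a supervised neural network. The selector, the number of retained features, and the network size are illustrative assumptions.

```python
# Sketch: feature reduction followed by a supervised neural network classifier.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X = np.random.randn(300, 998)          # placeholder for openSMILE features
y = np.random.randint(0, 5, size=300)  # placeholder emotion labels

model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=50),
                      MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))
print(cross_val_score(model, X, y, cv=5).mean())
```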
Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism
ERIC Educational Resources Information Center
Singh, Leher; Harrow, MariLouise S.
2014-01-01
Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…
Analysis and synthesis of laughter
NASA Astrophysics Data System (ADS)
Sundaram, Shiva; Narayanan, Shrikanth
2004-10-01
There is much enthusiasm in the text-to-speech community for synthesis of emotional and natural speech. One idea being proposed is to include emotion dependent paralinguistic cues during synthesis to convey emotions effectively. This requires modeling and synthesis techniques of various cues for different emotions. Motivated by this, a technique to synthesize human laughter is proposed. Laughter is a complex mechanism of expression and has high variability in terms of types and usage in human-human communication. People have their own characteristic way of laughing. Laughter can be seen as a controlled/uncontrolled physiological process of a person resulting from an initial excitation in context. A parametric model based on damped simple harmonic motion to effectively capture these diversities and also maintain the individuals characteristics is developed here. Limited laughter/speech data from actual humans and synthesis ease are the constraints imposed on the accuracy of the model. Analysis techniques are also developed to determine the parameters of the model for a given individual or laughter type. Finally, the effectiveness of the model to capture the individual characteristics and naturalness compared to real human laughter has been analyzed. Through this the factors involved in individual human laughter and their importance can be better understood.
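The abstract above does not give the model equations, so the following is only a loose illustration of how a damped-simple-harmonic-motion parameter set (damping, bout rate, pitch) could gate a crude voiced carrier to produce a laughter-like bout; all parameter values are assumptions and the result is not the authors' synthesizer.

```python
# Very loose sketch: a damped oscillation exp(-d*t) * cos(2*pi*r*t), half-wave
# rectified, gates the amplitude of a crude voiced "ha" carrier.
import numpy as np
from scipy.io import wavfile

sr = 16000
t = np.linspace(0, 1.5, int(1.5 * sr), endpoint=False)
damping, bout_rate, f0 = 2.0, 5.0, 220.0   # per-speaker parameters (assumed)

envelope = np.exp(-damping * t) * np.clip(np.cos(2 * np.pi * bout_rate * t), 0, None)
carrier = np.sign(np.sin(2 * np.pi * f0 * t))      # crude glottal-like pulse train
laugh = envelope * carrier
wavfile.write("laugh_sketch.wav", sr,
              (0.8 * laugh / np.abs(laugh).max() * 32767).astype(np.int16))
```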
Başkent, Deniz; Fuller, Christina D; Galvin, John J; Schepel, Like; Gaudrain, Etienne; Free, Rolien H
2018-05-01
In adult normal-hearing musicians, perception of music, vocal emotion, and speech in noise has been previously shown to be better than non-musicians, sometimes even with spectro-temporally degraded stimuli. In this study, melodic contour identification, vocal emotion identification, and speech understanding in noise were measured in young adolescent normal-hearing musicians and non-musicians listening to unprocessed or degraded signals. Different from adults, there was no musician effect for vocal emotion identification or speech in noise. Melodic contour identification with degraded signals was significantly better in musicians, suggesting potential benefits from music training for young cochlear-implant users, who experience similar spectro-temporal signal degradations.
Study of wavelet packet energy entropy for emotion classification in speech and glottal signals
NASA Astrophysics Data System (ADS)
He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua
2013-07-01
Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set, annotated with three different emotions (angry, neutral and soft), and the Berlin Emotional Speech (BES) database, annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
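One possible reading of the proposed feature, sketched below, computes the entropy of the energy distribution across wavelet packet frequency bands and then fits a Gaussian mixture model per class. The wavelet family, decomposition depth, and band selection are assumptions rather than the study's exact settings.

```python
# Sketch: wavelet packet energy entropy feature + GMM modeling.
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

def wp_energy_entropy(x, wavelet="db4", level=4):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(np.asarray(node.data) ** 2)
                         for node in wp.get_level(level, order="freq")])
    p = energies / (energies.sum() + 1e-12)          # energy distribution over bands
    return float(-np.sum(p * np.log2(p + 1e-12)))    # entropy of that distribution

# One GMM per emotion class; new utterances are scored against each model.
feats = np.array([[wp_energy_entropy(np.random.randn(8000))] for _ in range(50)])
gmm = GaussianMixture(n_components=2).fit(feats)
print(gmm.score(feats))
```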
Intelligibility of emotional speech in younger and older adults.
Dupuis, Kate; Pichora-Fuller, M Kathleen
2014-01-01
Little is known about the influence of vocal emotions on speech understanding. Word recognition accuracy for stimuli spoken to portray seven emotions (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) was tested in younger and older listeners. Emotions were presented in either mixed (heterogeneous emotions mixed in a list) or blocked (homogeneous emotion blocked in a list) conditions. Three main hypotheses were tested. First, vocal emotion affects word recognition accuracy; specifically, portrayals of fear enhance word recognition accuracy because listeners orient to threatening information and/or distinctive acoustical cues such as high pitch mean and variation. Second, older listeners recognize words less accurately than younger listeners, but the effects of different emotions on intelligibility are similar across age groups. Third, blocking emotions in list results in better word recognition accuracy, especially for older listeners, and reduces the effect of emotion on intelligibility because as listeners develop expectations about vocal emotion, the allocation of processing resources can shift from emotional to lexical processing. Emotion was the within-subjects variable: all participants heard speech stimuli consisting of a carrier phrase followed by a target word spoken by either a younger or an older talker, with an equal number of stimuli portraying each of seven vocal emotions. The speech was presented in multi-talker babble at signal to noise ratios adjusted for each talker and each listener age group. Listener age (younger, older), condition (mixed, blocked), and talker (younger, older) were the main between-subjects variables. Fifty-six students (Mage= 18.3 years) were recruited from an undergraduate psychology course; 56 older adults (Mage= 72.3 years) were recruited from a volunteer pool. All participants had clinically normal pure-tone audiometric thresholds at frequencies ≤3000 Hz. There were significant main effects of emotion, listener age group, and condition on the accuracy of word recognition in noise. Stimuli spoken in a fearful voice were the most intelligible, while those spoken in a sad voice were the least intelligible. Overall, word recognition accuracy was poorer for older than younger adults, but there was no main effect of talker, and the pattern of the effects of different emotions on intelligibility did not differ significantly across age groups. Acoustical analyses helped elucidate the effect of emotion and some intertalker differences. Finally, all participants performed better when emotions were blocked. For both groups, performance improved over repeated presentations of each emotion in both blocked and mixed conditions. These results are the first to demonstrate a relationship between vocal emotion and word recognition accuracy in noise for younger and older listeners. In particular, the enhancement of intelligibility by emotion is greatest for words spoken to portray fear and presented heterogeneously with other emotions. Fear may have a specialized role in orienting attention to words heard in noise. This finding may be an auditory counterpart to the enhanced detection of threat information in visual displays. The effect of vocal emotion on word recognition accuracy is preserved in older listeners with good audiograms and both age groups benefit from blocking and the repetition of emotions.
Mitchell, Rachel L. C.; Jazdzyk, Agnieszka; Stets, Manuela; Kotz, Sonja A.
2016-01-01
We aimed to progress understanding of prosodic emotion expression by establishing brain regions active when expressing specific emotions, those activated irrespective of the target emotion, and those whose activation intensity varied depending on individual performance. BOLD contrast data were acquired whilst participants spoke non-sense words in happy, angry or neutral tones, or performed jaw-movements. Emotion-specific analyses demonstrated that when expressing angry prosody, activated brain regions included the inferior frontal and superior temporal gyri, the insula, and the basal ganglia. When expressing happy prosody, the activated brain regions also included the superior temporal gyrus, insula, and basal ganglia, with additional activation in the anterior cingulate. Conjunction analysis confirmed that the superior temporal gyrus and basal ganglia were activated regardless of the specific emotion concerned. Nevertheless, disjunctive comparisons between the expression of angry and happy prosody established that anterior cingulate activity was significantly higher for angry prosody than for happy prosody production. Degree of inferior frontal gyrus activity correlated with the ability to express the target emotion through prosody. We conclude that expressing prosodic emotions (vs. neutral intonation) requires generic brain regions involved in comprehending numerous aspects of language, emotion-related processes such as experiencing emotions, and in the time-critical integration of speech information. PMID:27803656
From disgust to contempt-speech: The nature of contempt on the map of prejudicial emotions.
Bilewicz, Michal; Kamińska, Olga Katarzyna; Winiewski, Mikołaj; Soral, Wiktor
2017-01-01
Analyzing contempt as an intergroup emotion, we suggest that contempt and anger are not built upon each other, whereas disgust seems to be the most elementary and specific basic-emotional antecedent of contempt. Concurring with Gervais & Fessler, we suggest that many instances of "hate speech" are in fact instances of "contempt speech", being based on disgust-driven contempt rather than hate.
Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado
2016-01-01
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performance on a public database of spontaneous speech is reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
Speaker emotion recognition: from classical classifiers to deep neural networks
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2018-04-01
Speaker emotion recognition has been considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine, or education can be improved by taking the affective state of speech into account. In this paper, a twofold approach to speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented with. Experimental results indicate that deep architectures can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset.
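As a schematic of the classic-versus-deep comparison described above, the sketch below trains an SVM baseline and a small feed-forward network on the same acoustic feature matrix; the architecture, feature dimensionality, and placeholder data are assumptions, not the paper's setup.

```python
# Illustrative sketch only: SVM baseline vs. a small feed-forward deep network.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from tensorflow import keras

X = np.random.randn(600, 120).astype("float32")   # placeholder acoustic features
y = np.random.randint(0, 7, size=600)             # seven emotion classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

svm_acc = SVC().fit(X_tr, y_tr).score(X_te, y_te)

net = keras.Sequential([keras.layers.Dense(128, activation="relu"),
                        keras.layers.Dense(64, activation="relu"),
                        keras.layers.Dense(7, activation="softmax")])
net.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
net.fit(X_tr, y_tr, epochs=30, batch_size=32, verbose=0)
_, dnn_acc = net.evaluate(X_te, y_te, verbose=0)
print(f"SVM: {svm_acc:.3f}  DNN: {dnn_acc:.3f}")
```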
Gebauer, Line; Skewes, Joshua; Westphael, Gitte; Heaton, Pamela; Vuust, Peter
2014-01-01
Music is a potent source for eliciting emotions, but not everybody experiences emotions in the same way. Individuals with autism spectrum disorder (ASD) show difficulties with social and emotional cognition. Impairments in emotion recognition are widely studied in ASD, and have been associated with atypical brain activation in response to emotional expressions in faces and speech. Whether these impairments and atypical brain responses generalize to other domains, such as emotional processing of music, is less clear. Using functional magnetic resonance imaging, we investigated neural correlates of emotion recognition in music in high-functioning adults with ASD and neurotypical adults. Both groups engaged similar neural networks during processing of emotional music, and individuals with ASD rated emotional music comparably to the group of neurotypical individuals. However, in the ASD group, increased activity in response to happy compared to sad music was observed in dorsolateral prefrontal regions and in the rolandic operculum/insula, and we propose that this reflects increased cognitive processing and physiological arousal in response to emotional musical stimuli in this group.
Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P
2017-01-01
Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age.
Gender differences in identifying emotions from auditory and visual stimuli.
Waaramaa, Teija
2017-12-01
The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples, and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to gain better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or by a shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. Emotions were better recognized from visual than from auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.
Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist; furthermore, the recognition result may be unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on a contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
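A rough sketch of weight-based contribution analysis is given below, assuming a Garson-style formulation on a single-hidden-layer network; the paper's exact contribution algorithm, feature set, and selection threshold are not reproduced, and the data here are random placeholders.

```python
# A minimal sketch of neural-network contribution analysis for feature selection,
# using a Garson-style weight-based importance on a trained single-hidden-layer
# network; data and dimensions are placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 95))                  # 95 candidate emotion features (as in the abstract)
y = rng.integers(0, 6, size=400)                # 6 emotion classes (placeholder labels)

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
W_in, W_out = np.abs(net.coefs_[0]), np.abs(net.coefs_[1])   # shapes (95, 32) and (32, 6)

# Garson-style contribution: each input's share within a hidden unit, weighted by
# that hidden unit's total outgoing weight, summed over hidden units.
share = W_in / W_in.sum(axis=0, keepdims=True)
contribution = (share * W_out.sum(axis=1)).sum(axis=1)
contribution /= contribution.sum()

selected = np.argsort(contribution)[::-1][:24]  # keep the 24 highest-contribution features
print("Selected feature indices:", sorted(selected))
```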
Emotional speech synchronizes brains across listeners and engages large-scale dynamic brain networks
Nummenmaa, Lauri; Saarimäki, Heini; Glerean, Enrico; Gotsopoulos, Athanasios; Jääskeläinen, Iiro P.; Hari, Riitta; Sams, Mikko
2014-01-01
Speech provides a powerful means for sharing emotions. Here we implement novel intersubject phase synchronization and whole-brain dynamic connectivity measures to show that networks of brain areas become synchronized across participants who are listening to emotional episodes in spoken narratives. Twenty participants' hemodynamic brain activity was measured with functional magnetic resonance imaging (fMRI) while they listened to 45-s narratives describing unpleasant, neutral, and pleasant events spoken in neutral voice. After scanning, participants listened to the narratives again and rated continuously their feelings of pleasantness–unpleasantness (valence) and of arousal–calmness. Instantaneous intersubject phase synchronization (ISPS) measures were computed to derive both multi-subject voxel-wise similarity measures of hemodynamic activity and inter-area functional dynamic connectivity (seed-based phase synchronization, SBPS). Valence and arousal time series were subsequently used to predict the ISPS and SBPS time series. High arousal was associated with increased ISPS in the auditory cortices and in Broca's area, and negative valence was associated with enhanced ISPS in the thalamus, anterior cingulate, lateral prefrontal, and orbitofrontal cortices. Negative valence affected functional connectivity of fronto-parietal, limbic (insula, cingulum) and fronto-opercular circuitries, and positive arousal affected the connectivity of the striatum, amygdala, thalamus, cerebellum, and dorsal frontal cortex. Positive valence and negative arousal had markedly smaller effects. We propose that high arousal synchronizes the listeners' sound-processing and speech-comprehension networks, whereas negative valence synchronizes circuitries supporting emotional and self-referential processing. PMID:25128711
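To make the intersubject phase synchronization (ISPS) measure concrete, the sketch below computes one common formulation, the length of the mean unit phasor across subjects at each time point, for a single placeholder voxel time series; band-pass filtering, the seed-based SBPS variant, and all fMRI preprocessing are omitted, and the dimensions are assumed rather than taken from the study.

```python
# A minimal sketch of intersubject phase synchronization (ISPS) for one voxel's
# time series, assuming already band-pass-filtered signals. This is an illustration,
# not the authors' exact pipeline.
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
n_subjects, n_timepoints = 20, 225                      # assumed dimensions
signals = rng.normal(size=(n_subjects, n_timepoints))   # placeholder voxel time series

phases = np.angle(hilbert(signals, axis=1))             # instantaneous phase per subject
isps = np.abs(np.exp(1j * phases).mean(axis=0))         # 1 = perfect across-subject phase locking

print("mean ISPS over time:", isps.mean())
```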
NASA Astrophysics Data System (ADS)
Sauter, Disa
This PhD is an investigation of vocal expressions of emotions, mainly focusing on non-verbal sounds such as laughter, cries and sighs. The research examines the roles of categorical and dimensional factors, the contributions of a number of acoustic cues, and the influence of culture. A series of studies established that naive listeners can reliably identify non-verbal vocalisations of positive and negative emotions in forced-choice and rating tasks. Some evidence for underlying dimensions of arousal and valence is found, although each emotion had a discrete expression. The role of acoustic characteristics of the sounds is investigated experimentally and analytically. This work shows that the cues used to identify different emotions vary, although pitch and pitch variation play a central role. The cues used to identify emotions in non-verbal vocalisations differ from the cues used when comprehending speech. An additional set of studies using stimuli consisting of emotional speech demonstrates that these sounds can also be reliably identified, and rely on similar acoustic cues. A series of studies with a pre-literate Namibian tribe shows that non-verbal vocalisations can be recognized across cultures. An fMRI study carried out to investigate the neural processing of non-verbal vocalisations of emotions is presented. The results show activation in pre-motor regions arising from passive listening to non-verbal emotional vocalisations, suggesting neural auditory-motor interactions in the perception of these sounds. In sum, this thesis demonstrates that non-verbal vocalisations of emotions are reliably identifiable tokens of information that belong to discrete categories. These vocalisations are recognisable across vastly different cultures and thus seem to, like facial expressions of emotions, comprise human universals. Listeners rely mainly on pitch and pitch variation to identify emotions in non-verbal vocalisations, which differs from the cues used to comprehend speech. When listening to others' emotional vocalisations, a neural system of preparatory motor activation is engaged.
Çek, Demet; Sánchez, Alvaro; Timpano, Kiara R
2016-05-01
Attention bias to threat (e.g., disgust faces) is a cognitive vulnerability factor for social anxiety occurring in early stages of information processing. Few studies have investigated the relationship between social anxiety and attention biases, in conjunction with emotional and cognitive responses to a social stressor. Elucidating these links would shed light on maintenance factors of social anxiety and could help identify malleable treatment targets. This study examined the associations between social anxiety level, attention bias to disgust (AB-disgust), subjective emotional and physiological reactivity to a social stressor, and subsequent post-event processing (PEP). We tested a mediational model where social anxiety level indirectly predicted subsequent PEP via its association with AB-disgust and immediate subjective emotional reactivity to social stress. Fifty-five undergraduates (45% female) completed a passive viewing task. Eye movements were tracked during the presentation of social stimuli (e.g., disgust faces) and used to calculate AB-disgust. Next, participants gave an impromptu speech in front of a video camera and watched a neutral video, followed by the completion of a PEP measure. Although there was no association between AB-disgust and physiological reactivity to the stressor, AB-disgust was significantly associated with greater subjective emotional reactivity from baseline to the speech. Analyses supported a partial mediation model where AB-disgust and subjective emotional reactivity to a social stressor partially accounted for the link between social anxiety levels and PEP. Copyright © 2016. Published by Elsevier Ltd.
The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.
NASA Astrophysics Data System (ADS)
Baldwin, Carol May
Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.
Musical melody and speech intonation: singing a different tune.
Zatorre, Robert J; Baum, Shari R
2012-01-01
Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.
Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech
ERIC Educational Resources Information Center
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
2010-01-01
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…
Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado
2016-01-01
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present ‘intelligent personal assistants’, and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants’ emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724
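The authors' consensus model is not described here in enough detail to reproduce, so the sketch below only illustrates the general idea with a generic self-labelling loop in which several off-the-shelf classifiers must agree before an unlabeled sample is promoted to the training set. All data, models, and the agreement rule are assumptions.

```python
# A generic consensus-style semi-supervised loop, offered only as an illustration:
# several classifiers label the unlabeled pool, and only samples on which they all
# agree are promoted to the labeled training set.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 20)); y_lab = rng.integers(0, 4, size=100)  # labeled speech features (placeholder)
X_unlab = rng.normal(size=(300, 20))                                      # unlabeled pool (placeholder)

models = [SVC(), LogisticRegression(max_iter=1000), KNeighborsClassifier()]
for _ in range(3):                                    # a few self-labelling rounds
    preds = np.array([m.fit(X_lab, y_lab).predict(X_unlab) for m in models])
    agree = (preds == preds[0]).all(axis=0)           # consensus across all models
    if not agree.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[agree]])        # promote agreed-upon samples
    y_lab = np.concatenate([y_lab, preds[0][agree]])
    X_unlab = X_unlab[~agree]

print("labeled set size after consensus rounds:", len(y_lab))
```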
Action Unit Models of Facial Expression of Emotion in the Presence of Speech
Shah, Miraj; Cooper, David G.; Cao, Houwei; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini
2014-01-01
Automatic recognition of emotion using facial expressions in the presence of speech poses a unique challenge because talking reveals clues for the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression where speech is either present (talking) or absent (silent). The corpus is uniquely suited for analysis of the interplay between the two conditions. We use a multimodal decision level fusion classifier to combine models of emotion from talking and silent faces as well as from audio to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction in the presence of speech from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves accuracy of prediction in the talking setting. The advantages are most pronounced when silent and talking face models are fused with predictions from audio features. In this multi-modal prediction both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement. PMID:25525561
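A minimal sketch of decision-level (late) fusion in this spirit is shown below: separate models for talking-face, silent-face, and audio features each emit class probabilities, which are averaged before the final decision. The feature arrays, the logistic-regression models, and the simple averaging rule are placeholders, not the corpus or classifier used in the paper.

```python
# Decision-level fusion across modalities: average the class probabilities from
# per-modality models before taking the argmax. Data and models are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, classes = 200, 5                                    # anger, disgust, fear, happy, sad
y = rng.integers(0, classes, size=n)
modality_feats = {                                     # placeholder per-modality features
    "talking_face": rng.normal(size=(n, 17)),
    "silent_face":  rng.normal(size=(n, 17)),
    "audio":        rng.normal(size=(n, 30)),
}

probas = []
for name, X in modality_feats.items():
    model = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
    probas.append(model.predict_proba(X[150:]))        # held-out probabilities per modality

fused = np.mean(probas, axis=0)                        # decision-level (late) fusion
accuracy = (fused.argmax(axis=1) == y[150:]).mean()
print("fused accuracy on held-out samples:", accuracy)
```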
Private Speech Moderates the Effects of Effortful Control on Emotionality
ERIC Educational Resources Information Center
Day, Kimberly L.; Smith, Cynthia L.; Neal, Amy; Dunsmore, Julie C.
2018-01-01
Research Findings: In addition to being a regulatory strategy, children's private speech may enhance or interfere with their effortful control used to regulate emotion. The goal of the current study was to investigate whether children's private speech during a selective attention task moderated the relations of their effortful control to their…
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion
Schutz, Michael
2017-01-01
Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997
Playing Music for a Smarter Ear: Cognitive, Perceptual and Neurobiological Evidence
Strait, Dana; Kraus, Nina
2012-01-01
Human hearing depends on a combination of cognitive and sensory processes that function by means of an interactive circuitry of bottom-up and top-down neural pathways, extending from the cochlea to the cortex and back again. Given that similar neural pathways are recruited to process sounds related to both music and language, it is not surprising that the auditory expertise gained over years of consistent music practice fine-tunes the human auditory system in a comprehensive fashion, strengthening neurobiological and cognitive underpinnings of both music and speech processing. In this review we argue not only that common neural mechanisms for speech and music exist, but that experience in music leads to enhancements in sensory and cognitive contributors to speech processing. Of specific interest is the potential for music training to bolster neural mechanisms that undergird language-related skills, such as reading and hearing speech in background noise, which are critical to academic progress, emotional health, and vocational success. PMID:22993456
Crossmodal and incremental perception of audiovisual cues to emotional speech.
Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc
2010-01-01
In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment 1, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores rise faster for positive than negative conditions. Finally, the gating results suggest that incongruent emotions are perceived as more intense than congruent emotions, as the former get more extreme recognition scores than the latter, already after a short period of exposure.
Doctors' voices in patients' narratives: coping with emotions in storytelling.
Lucius-Hoene, Gabriele; Thiele, Ulrike; Breuning, Martina; Haug, Stephanie
2012-09-01
To understand doctors' impacts on the emotional coping of patients, their stories about encounters with doctors are used. These accounts reflect meaning-making processes and biographically contextualized experiences. We investigate how patients characterize their doctors by voicing them in their stories, thus assigning them functions in their coping process. 394 narrated scenes with reported speech of doctors were extracted from interviews with 26 patients with type 2 diabetes and 30 with chronic pain. Constructed speech acts were investigated by means of positioning and narrative analysis, and assigned into thematic categories by a bottom-up coding procedure. Patients use narratives as coping strategies when confronted with illness and their encounters with doctors by constructing them in a supportive and face-saving way. In correspondence with the variance of illness conditions, differing moral problems in dealing with doctors arise. Different evaluative stances towards the same events within interviews show that positionings are not fixed, but vary according to contexts and purposes. Our narrative approach deepens the standardized and predominantly cognitive statements of questionnaires in research on doctor-patient relations by individualized emotional and biographical aspects of patients' perspective. Doctors should be trained to become aware of their impact in patients' coping processes.
An experiment with spectral analysis of emotional speech affected by orthodontic appliances
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela
2012-11-01
The contribution describes the effect of fixed and removable orthodontic appliances on the spectral properties of emotional speech. Spectral changes were analyzed and evaluated with spectrograms and mean Welch's periodograms. This alternative to the standard listening test makes it possible to obtain an objective comparison based on statistical analysis with ANOVA and hypothesis tests. The results of the analysis, performed on short sentences produced by a female speaker in four emotional states (joyous, sad, angry, and neutral), show that it is first of all the removable orthodontic appliance that affects the spectrograms of the produced speech.
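The sketch below illustrates the general shape of such an analysis, assuming synthetic signals in place of the real recordings: mean Welch periodograms are reduced to band power per utterance and compared across appliance conditions with a one-way ANOVA. The sampling rate, band limits, and group sizes are assumptions, not the study's settings.

```python
# Welch periodograms of utterances recorded in different conditions, compared with a
# one-way ANOVA on band power. The synthetic signals stand in for real recordings.
import numpy as np
from scipy.signal import welch
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
fs = 16000                                             # assumed sampling rate
conditions = {c: [rng.normal(size=fs) for _ in range(10)]   # 10 placeholder utterances each
              for c in ("no_appliance", "fixed", "removable")}

def band_power(signal, lo=2000, hi=4000):
    f, pxx = welch(signal, fs=fs, nperseg=1024)        # mean Welch periodogram
    return pxx[(f >= lo) & (f <= hi)].mean()           # average power in a band of interest

powers = {c: [band_power(s) for s in sigs] for c, sigs in conditions.items()}
stat, p = f_oneway(*powers.values())                   # does condition affect spectral power?
print(f"one-way ANOVA: F = {stat:.2f}, p = {p:.3f}")
```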
One approach to design of speech emotion database
NASA Astrophysics Data System (ADS)
Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav
2016-05-01
This article describes a system for evaluating the credibility of recordings with emotional character. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems, which are designed to detect human emotions in the voice. Information about a speaker's emotional state is useful to the security forces and the emergency call service. Personnel in action (soldiers, police officers, and firefighters) are often exposed to stress, and information about their emotional state, carried in the voice, helps the dispatcher adapt control commands during an intervention. Call agents of the emergency call service must recognize the mental state of the caller to adjust the mood of the conversation; in this case, evaluation of the psychological state is the key factor for a successful intervention. A quality database of sound recordings is essential for creating such systems. Quality databases exist, such as the Berlin Database of Emotional Speech or Humaine, but they were created by actors in an audio studio, which means that the recordings contain simulated rather than genuine emotions. Our research aims to create a database of Czech emotional recordings of real human speech. Collecting sound samples for the database is only one of the tasks; another, no less important, is to evaluate the significance of the recordings from the perspective of emotional states. The design of a methodology for evaluating the credibility of emotional recordings is described in this article. The results describe the advantages and applicability of the developed method.
Jürgens, Rebecca; Grass, Annika; Drolet, Matthis; Fischer, Julia
Both in the performative arts and in emotion research, professional actors are assumed to be capable of delivering emotions comparable to spontaneous emotional expressions. This study examines the effects of acting training on vocal emotion depiction and recognition. We predicted that professional actors express emotions in a more realistic fashion than non-professional actors. However, professional acting training may lead to a particular speech pattern; this might account for vocal expressions by actors that are less comparable to authentic samples than the ones by non-professional actors. We compared 80 emotional speech tokens from radio interviews with 80 re-enactments by professional and inexperienced actors, respectively. We analyzed recognition accuracies for emotion and authenticity ratings and compared the acoustic structure of the speech tokens. Both play-acted conditions yielded similar recognition accuracies and possessed more variable pitch contours than the spontaneous recordings. However, professional actors exhibited signs of different articulation patterns compared to non-trained speakers. Our results indicate that for emotion research, emotional expressions by professional actors are not better suited than those from non-actors.
Processing of affective speech prosody is impaired in Asperger syndrome.
Korpilahti, Pirjo; Jansson-Verkasalo, Eira; Mattila, Marja-Leena; Kuusikko, Sanna; Suominen, Kalervo; Rytky, Seppo; Pauls, David L; Moilanen, Irma
2007-09-01
Many people with the diagnosis of Asperger syndrome (AS) show poorly developed skills in understanding emotional messages. The present study addressed discrimination of speech prosody in children with AS at the neurophysiological level. Detection of affective prosody was investigated in one-word utterances as indexed by the N1 and the mismatch negativity (MMN) of auditory event-related potentials (ERPs). Data from fourteen boys with AS were compared with those from thirteen typically developing boys. These results suggest atypical neural responses to affective prosody in children with AS and their fathers, especially over the right hemisphere (RH), and that this impairment can already be seen at low-level stages of information processing. Our results provide evidence for familial patterns of abnormal auditory brain reactions to prosodic features of speech.
The Influence of Negative Emotion on Cognitive and Emotional Control Remains Intact in Aging
Zinchenko, Artyom; Obermeier, Christian; Kanske, Philipp; Schröger, Erich; Villringer, Arno; Kotz, Sonja A.
2017-01-01
Healthy aging is characterized by a gradual decline in cognitive control and inhibition of interferences, while emotional control is either preserved or facilitated. Emotional control regulates the processing of emotional conflicts such as in irony in speech, and cognitive control resolves conflict between non-affective tendencies. While negative emotion can trigger control processes and speed up resolution of both cognitive and emotional conflicts, we know little about how aging affects the interaction of emotion and control. In two EEG experiments, we compared the influence of negative emotion on cognitive and emotional conflict processing in groups of younger adults (mean age = 25.2 years) and older adults (69.4 years). Participants viewed short video clips and either categorized spoken vowels (cognitive conflict) or their emotional valence (emotional conflict), while the visual facial information was congruent or incongruent. Results show that negative emotion modulates both cognitive and emotional conflict processing in younger and older adults as indicated in reduced response times and/or enhanced event-related potentials (ERPs). In emotional conflict processing, we observed a valence-specific N100 ERP component in both age groups. In cognitive conflict processing, we observed an interaction of emotion by congruence in the N100 responses in both age groups, and a main effect of congruence in the P200 and N200. Thus, the influence of emotion on conflict processing remains intact in aging, despite a marked decline in cognitive control. Older adults may prioritize emotional wellbeing and preserve the role of emotion in cognitive and emotional control. PMID:29163132
Understanding speaker attitudes from prosody by adults with Parkinson's disease.
Monetta, Laura; Cheang, Henry S; Pell, Marc D
2008-09-01
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
Self-organizing map classifier for stressed speech recognition
NASA Astrophysics Data System (ADS)
Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav
2016-05-01
This paper presents a method for detecting speech under stress using Self-Organizing Maps. Most people who are exposed to stressful situations cannot respond adequately to stimuli. The army, police, and fire services account for the largest part of the environments in which an increased number of stressful situations is typical. Personnel in action are directed by a control center, and control commands should be adapted to the psychological state of the person in action. It is known that psychological changes in the human body are also reflected physiologically, which means that stress also affects speech. It is therefore clear that a system for recognizing stress in speech is required by the security forces. One possible classifier, popular for its flexibility, is the self-organizing map (SOM), a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data, a feature well suited to speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosody features were selected as input data; these coefficients were chosen for their sensitivity to emotional changes. The parameters were calculated on speech recordings that can be divided into two classes, namely stress-state recordings and normal-state recordings. The contribution of the experiment is a method using an SOM classifier for stressed-speech detection. The results showed the advantage of this method, which is its flexibility with respect to the input data.
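A from-scratch sketch of the classifier idea follows, assuming random feature vectors in place of real MFCC/LPC/prosody parameters: a small self-organizing map is trained online, its nodes are labelled by majority vote, and new samples are classified by their best-matching unit. The grid size and the learning-rate and neighbourhood schedules are arbitrary choices, not the paper's settings.

```python
# A minimal, from-scratch self-organizing map (SOM) sketch for two-class
# (stressed vs. neutral) speech features; the feature vectors are placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16)); y = rng.integers(0, 2, size=300)   # placeholder features/labels

grid, dim, n_iter = (8, 8), X.shape[1], 2000
W = rng.normal(size=grid + (dim,))                     # one weight vector per map node
coords = np.stack(np.meshgrid(*[np.arange(g) for g in grid], indexing="ij"), axis=-1)

for t in range(n_iter):                                # standard online SOM training
    x = X[rng.integers(len(X))]
    bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid)  # best-matching unit
    lr = 0.5 * np.exp(-t / n_iter)                     # decaying learning rate
    sigma = 3.0 * np.exp(-t / n_iter)                  # decaying neighbourhood radius
    h = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
    W += lr * h[..., None] * (x - W)

# Label each node by majority vote of the training samples mapped to it.
votes = np.zeros(grid + (2,))
for x, label in zip(X, y):
    bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid)
    votes[bmu][label] += 1
node_labels = votes.argmax(-1)

def classify(x):                                       # stressed (1) vs. neutral (0)
    return node_labels[np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid)]

print("predicted class of first sample:", classify(X[0]))
```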
Intentional Voice Command Detection for Trigger-Free Speech Interface
NASA Astrophysics Data System (ADS)
Obuchi, Yasunari; Sumiyoshi, Takashi
In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
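As a toy illustration of the final decision stage mentioned above, the sketch below fits a linear discriminant analysis (LDA) classifier to a fused per-segment feature vector and cross-validates it; the features, labels, and dimensionality are placeholders rather than the corpus or feature set used in the paper.

```python
# A simple LDA classifier separating intentional voice commands from other audio,
# given a fused per-segment feature vector. Features and labels are placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                         # fused per-segment features (placeholder)
y = rng.integers(0, 2, size=500)                       # 1 = intentional command, 0 = everything else

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)              # quick check of separability
print("cross-validated accuracy:", scores.mean())
```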
An audiovisual emotion recognition system
NASA Astrophysics Data System (ADS)
Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun
2007-12-01
Human emotions can be expressed through many biological signals; speech and facial expression are two of them. Both are regarded as carriers of emotional information, which plays an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and presented in this paper. The system is designed for real-time use and is supported by several integrated modules. These modules include speech enhancement for eliminating noise, rapid face detection for locating the face in the background image, example-based shape learning for facial feature alignment, and an optical-flow-based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of a classifier, and rough-set-based feature selection is a good method for dimension reduction. Accordingly, 13 of 37 speech features and 10 of 33 facial features are selected to represent emotional information, and 52 audiovisual features are selected when speech and video are fused and synchronized. The experimental results demonstrate that this system performs well in real-time use and achieves a high recognition rate. Our results also suggest that multimodal fusion-based recognition will become the trend in emotion recognition in the future.
Linguistic Correlates of Social Anxiety Disorder
Hofmann, Stefan G.; Moore, Philippa M.; Gutner, Cassidy; Weeks, Justin W.
2012-01-01
The goal of this study was to examine the linguistic correlates of social anxiety disorder (SAD). Twenty-four individuals with SAD (8 of them with a generalized subtype) and 21 nonanxious controls were asked to give speeches in front of an audience. The transcribed speeches were examined for the frequency of negations, I-statements, we-statements, negative emotion words, and positive emotion words. During their speech, individuals with either SAD subtype used positive emotion words more often than controls. No significant differences were observed in the other linguistic categories. These results are discussed in the context of evolutionary and cognitive perspectives of SAD. PMID:21851248
On the Time Course of Vocal Emotion Recognition
Pell, Marc D.; Kotz, Sonja A.
2011-01-01
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275
The impact of threat and cognitive stress on speech motor control in people who stutter.
Lieshout, Pascal van; Ben-David, Boaz; Lipski, Melinda; Namasivayam, Aravind
2014-06-01
In the present study, an Emotional Stroop and Classical Stroop task were used to separate the effect of threat content and cognitive stress from the phonetic features of words on motor preparation and execution processes. A group of 10 people who stutter (PWS) and 10 matched people who do not stutter (PNS) repeated colour names for threat content words and neutral words, as well as for traditional Stroop stimuli. Data collection included speech acoustics and movement data from upper lip and lower lip using 3D EMA. PWS in both tasks were slower to respond and showed smaller upper lip movement ranges than PNS. For the Emotional Stroop task only, PWS were found to show larger inter-lip phase differences compared to PNS. General threat words were executed with faster lower lip movements (larger range and shorter duration) in both groups, but only PWS showed a change in upper lip movements. For stutter specific threat words, both groups showed a more variable lip coordination pattern, but only PWS showed a delay in reaction time compared to neutral words. Individual stuttered words showed no effects. Both groups showed a classical Stroop interference effect in reaction time but no changes in motor variables. This study shows differential motor responses in PWS compared to controls for specific threat words. Cognitive stress was not found to affect stuttering individuals differently than controls or that its impact spreads to motor execution processes. After reading this article, the reader will be able to: (1) discuss the importance of understanding how threat content influences speech motor control in people who stutter and non-stuttering speakers; (2) discuss the need to use tasks like the Emotional Stroop and Regular Stroop to separate phonetic (word-bound) based impact on fluency from other factors in people who stutter; and (3) describe the role of anxiety and cognitive stress on speech motor processes. Copyright © 2014 Elsevier Inc. All rights reserved.
Ambert-Dahan, Emmanuèle; Giraud, Anne-Lise; Mecheri, Halima; Sterkers, Olivier; Mosnier, Isabelle; Samson, Séverine
2017-10-01
Visual processing has been extensively explored in deaf subjects in the context of verbal communication, through the assessment of speech reading and sign language abilities. However, little is known about visual emotional processing in adult progressive deafness, and after cochlear implantation. The goal of our study was thus to assess the influence of acquired post-lingual progressive deafness on the recognition of dynamic facial emotions that were selected to express canonical fear, happiness, sadness, and anger. A total of 23 adults with post-lingual deafness, separated into two groups assessed either before (n = 10) or after (n = 13) cochlear implantation (CI), and 13 normal hearing (NH) individuals participated in the current study. Participants were asked to rate the expression of the four cardinal emotions, and to evaluate both their emotional valence (unpleasant-pleasant) and arousal potential (relaxing-stimulating). We found that patients with deafness were impaired in the recognition of sad faces, and that patients equipped with a CI were additionally impaired in the recognition of happiness and fear (but not anger). Relative to controls, all patients with deafness showed a deficit in perceiving arousal expressed in faces, while valence ratings remained unaffected. The current results show for the first time that acquired and progressive deafness is associated with a reduction of emotional sensitivity to visual stimuli. This negative impact of progressive deafness on the perception of dynamic facial cues for emotion recognition contrasts with the proficiency of deaf subjects with and without CIs in processing visual speech cues (Rouger et al., 2007; Strelnikov et al., 2009; Lazard and Giraud, 2017). Altogether these results suggest there to be a trade-off between the processing of linguistic and non-linguistic visual stimuli. Copyright © 2017. Published by Elsevier B.V.
[The role of sex in voice restoration and emotional functioning after laryngectomy].
Keszte, J; Wollbrück, D; Meyer, A; Fuchs, M; Meister, E; Pabst, F; Oeken, J; Schock, J; Wulke, C; Singer, S
2012-04-01
Data on psychosocial factors in laryngectomized women are rare. All means of alaryngeal voice production sound male due to the low fundamental frequency and roughness, which makes post-laryngectomy voice rehabilitation especially challenging for women. The aim of this study was to investigate whether women use alaryngeal speech less often and are therefore more emotionally distressed. In a cross-sectional multi-centred study, 12 female and 138 male laryngectomees were interviewed. To identify risk factors for infrequent use of alaryngeal speech and for emotional functioning, logistic regression was used and odds ratios were adjusted for age, time since laryngectomy, physical functioning, social activity, and feelings of stigmatization. Esophageal speech was used by 83% of the female and 57% of the male patients, prosthetic speech by 17% of the female and 20% of the male patients, and electrolaryngeal speech by 17% of the female and 29% of the male patients. Laryngectomees had a higher risk of being more emotionally distressed when feeling physically bad (OR = 2.48; p = 0.02) or when having feelings of stigmatization (OR = 3.94; p ≤ 0.00). In addition, more women tended to be socially active than men (83% vs. 54%; p = 0.05). Sex influenced neither the use of alaryngeal speech nor emotional functioning. Since there is evidence for different psychosocial adjustment in laryngectomized men and women, further investigation with larger sample sizes is needed on this issue. © Georg Thieme Verlag KG Stuttgart · New York.
Anxiety and speaking in people who stutter: an investigation using the emotional Stroop task.
Hennessey, Neville W; Dourado, Esther; Beilby, Janet M
2014-06-01
People with anxiety disorders show an attentional bias towards threat or negative emotion words. This exploratory study examined whether people who stutter (PWS), who can be anxious when speaking, show similar bias and whether reactions to threat words also influence speech motor planning and execution. Comparisons were made between 31 PWS and 31 fluent controls in a modified emotional Stroop task where, depending on a visual cue, participants named the colour of threat and neutral words at either a normal or fast articulation rate. In a manual version of the same task participants pressed the corresponding colour button with either a long or short duration. PWS but not controls were slower to respond to threat words than neutral words, however, this emotionality effect was only evident for verbal responding. Emotionality did not interact with speech rate, but the size of the emotionality effect among PWS did correlate with frequency of stuttering. Results suggest PWS show an attentional bias to threat words similar to that found in people with anxiety disorder. In addition, this bias appears to be contingent on engaging the speech production system as a response modality. No evidence was found to indicate that emotional reactivity during the Stroop task constrains or destabilises, perhaps via arousal mechanisms, speech motor adjustment or execution for PWS. The reader will be able to: (1) explain the importance of cognitive aspects of anxiety, such as attentional biases, in the possible cause and/or maintenance of anxiety in people who stutter, (2) explain how the emotional Stroop task can be used as a measure of attentional bias to threat information, and (3) evaluate the findings with respect to the relationship between attentional bias to threat information and speech production in people who stutter. Copyright © 2013 Elsevier Inc. All rights reserved.
2009-04-01
[Extraction fragment: table-of-contents entries and scattered text from a report on available military speech databases, including the FELIN database (overview, technical specifications, limitations). The surviving text mentions speech produced under emotion, confusion due to conflicting information, psychological tension, pain, and other conditions typical of the modern battlefield, and notes that the number of possible language combinations scales with N3 in a field of research that has only recently started.]
Marsh, John E; Yang, Jingqi; Qualter, Pamela; Richardson, Cassandra; Perham, Nick; Vachon, François; Hughes, Robert W
2018-06-01
Task-irrelevant speech impairs short-term serial recall appreciably. On the interference-by-process account, the processing of physical (i.e., precategorical) changes in speech yields order cues that conflict with the serial-ordering process deployed to perform the serial recall task. In this view, the postcategorical properties (e.g., phonology, meaning) of speech play no role. The present study reassessed the implications of recent demonstrations of auditory postcategorical distraction in serial recall that have been taken as support for an alternative, attentional-diversion, account of the irrelevant speech effect. Focusing on the disruptive effect of emotionally valent compared with neutral words on serial recall, we show that the distracter-valence effect is eliminated under conditions-high task-encoding load-thought to shield against attentional diversion whereas the general effect of speech (neutral words compared with quiet) remains unaffected (Experiment 1). Furthermore, the distracter-valence effect generalizes to a task that does not require the processing of serial order-the missing-item task-whereas the effect of speech per se is attenuated in this task (Experiment 2). We conclude that postcategorical auditory distraction phenomena in serial short-term memory (STM) are incidental: they are observable in such a setting but, unlike the acoustically driven irrelevant speech effect, are not integral to it. As such, the findings support a duplex-mechanism account over a unitary view of auditory distraction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Real-time speech-driven animation of expressive talking faces
NASA Astrophysics Data System (ADS)
Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli
2011-05-01
In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the lower-layer classification at the sub-phonemic level is modelled on the relationship between the acoustic features of frames and the audio labels within phonemes. Using certain constraints, the predicted emotion labels of the speech are adjusted to obtain facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classification, and the synthesized facial sequences reach a comparatively convincing quality.
ERIC Educational Resources Information Center
Mitchell, Sibyl
Examined were the expectations and characteristics of the parents in 25 families involved with due process concerning the education of their learning disabled, emotionally disturbed, gifted, or speech impaired children (7-21 years old). Families availing themselves of the appeals process under Chapter 766 (a Massachusetts law providing for the…
Emotional Expression in Husbands and Wives.
ERIC Educational Resources Information Center
Notarius, Clifford I.; Johnson, Jennifer S.
1982-01-01
Investigated the emotional expression and physiological reactivity of spouses (N=6) as they discussed a salient interpersonal issue. Results indicated that wives' speech was characterized by less neutral and more negative behavior. Wives also reciprocated their husbands' positive and negative speech, while husbands did not reciprocate their wives'…
Classification Influence of Features on Given Emotions and Its Application in Feature Selection
NASA Astrophysics Data System (ADS)
Xing, Yin; Chen, Chuang; Liu, Li-Long
2018-04-01
To address the large amount of redundancy in high-dimensional speech emotion features, we analyze the extracted speech emotion features in depth and select the better ones. Firstly, a given emotion is classified by each feature individually. Secondly, the features are ranked by recognition rate in descending order. Then, the optimal feature threshold is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and Chinese emotional data sets, the experimental results show that this feature selection method outperforms other traditional methods.
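The sketch below illustrates the selection procedure in outline, under assumed data and an assumed threshold rule (the mean single-feature recognition rate): each feature is evaluated on its own with a simple classifier, features are ranked by recognition rate, and those above the threshold are kept.

```python
# Single-feature recognition-rate ranking with threshold-based selection.
# Data, classifier, and threshold rule are placeholders, not the paper's procedure.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60)); y = rng.integers(0, 6, size=400)   # placeholder features/labels

rates = np.array([
    cross_val_score(KNeighborsClassifier(), X[:, [i]], y, cv=3).mean()  # one feature at a time
    for i in range(X.shape[1])
])

order = np.argsort(rates)[::-1]                        # rank by recognition rate, descending
threshold = rates.mean()                               # assumed threshold criterion
selected = [i for i in order if rates[i] >= threshold]
print(f"kept {len(selected)} of {X.shape[1]} features")
```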
Nashiro, Kaoru; Sakaki, Michiko; Braskie, Meredith N; Mather, Mara
2017-06-01
Correlations in activity across disparate brain regions during rest reveal functional networks in the brain. Although previous studies largely agree that there is an age-related decline in the "default mode network," how age affects other resting-state networks, such as emotion-related networks, is still controversial. Here we used a dual-regression approach to investigate age-related alterations in resting-state networks. The results revealed age-related disruptions in functional connectivity in all 5 identified cognitive networks, namely the default mode network, cognitive-auditory, cognitive-speech (or speech-related somatosensory), and right and left frontoparietal networks, whereas such age effects were not observed in the 3 identified emotion networks. In addition, we observed age-related decline in functional connectivity in 3 visual and 3 motor/visuospatial networks. Older adults showed greater functional connectivity in regions outside 4 out of the 5 identified cognitive networks, consistent with the dedifferentiation effect previously observed in task-based functional magnetic resonance imaging studies. Both reduced within-network connectivity and increased out-of-network connectivity were correlated with poor cognitive performance, providing potential biomarkers for cognitive aging. Copyright © 2017 Elsevier Inc. All rights reserved.
Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D
2014-03-01
The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Poon, Matthew; Schutz, Michael
2015-01-01
Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
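As a rough illustration of this kind of score-based contrast (not the authors' actual analysis pipeline), one could summarize each piece by a mean pitch height and a timing proxy and then compare major- against minor-key pieces; the numbers below are placeholders, with pitch expressed in MIDI note numbers (semitones) and notes per second standing in for tempo.

```python
# Toy sketch of a modality contrast on pitch height and timing.
# Assumption: each piece is summarized by a mean pitch height in MIDI note
# numbers (semitones) and a notes-per-second "attack rate" as a timing proxy.
from statistics import mean

pieces = [  # (mode, mean_pitch_midi, notes_per_second) -- placeholder values
    ("major", 67.2, 5.1), ("major", 65.8, 4.7), ("major", 66.5, 5.4),
    ("minor", 64.9, 3.9), ("minor", 63.7, 4.1), ("minor", 64.4, 3.6),
]

def summarize(mode):
    pitch = mean(p for m, p, r in pieces if m == mode)
    rate = mean(r for m, p, r in pieces if m == mode)
    return pitch, rate

maj_pitch, maj_rate = summarize("major")
min_pitch, min_rate = summarize("minor")
print(f"pitch difference: {maj_pitch - min_pitch:.1f} semitones")
print(f"major pieces are {(maj_rate / min_rate - 1) * 100:.0f}% faster")
```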
Loutrari, Ariadne; Lorch, Marjorie Perlman
2017-07-01
We present a follow-up study on the case of a Greek amusic adult, B.Z., whose impaired performance on scale, contour, interval, and meter was reported by Paraskevopoulos, Tsapkini, and Peretz in 2010, employing a culturally-tailored version of the Montreal Battery of Evaluation of Amusia. In the present study, we administered a novel set of perceptual judgement tasks designed to investigate the ability to appreciate holistic prosodic aspects of 'expressiveness' and emotion in phrase length music and speech stimuli. Our results show that, although diagnosed as a congenital amusic, B.Z. scored as well as healthy controls (N=24) on judging 'expressiveness' and emotional prosody in both speech and music stimuli. These findings suggest that the ability to make perceptual judgements about such prosodic qualities may be preserved in individuals who demonstrate difficulties perceiving basic musical features such as melody or rhythm. B.Z.'s case yields new insights into amusia and the processing of speech and music prosody through a holistic approach. The employment of novel stimuli with relatively fewer non-naturalistic manipulations, as developed for this study, may be a useful tool for revealing unexplored aspects of music and speech cognition and offer the possibility to further the investigation of the perception of acoustic streams in more authentic auditory conditions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Mitchell, Rachel L. C.; Xu, Yi
2015-01-01
In computerized technology, artificial speech is becoming increasingly important, and is already used in ATMs, online gaming and healthcare contexts. However, today’s artificial speech typically sounds monotonous, a main reason for this being the lack of meaningful prosody. One particularly important function of prosody is to convey different emotions. This is because successful encoding and decoding of emotions is vital for effective social cognition, which is increasingly recognized in human–computer interaction contexts. Current attempts to artificially synthesize emotional prosody are much improved relative to early attempts, but there remains much work to be done due to methodological problems, lack of agreed acoustic correlates, and lack of theoretical grounding. If the addition of synthetic emotional prosody is not of sufficient quality, it may risk alienating users instead of enhancing their experience. So the value of embedding emotion cues in artificial speech may ultimately depend on the quality of the synthetic emotional prosody. However, early evidence on reactions to synthesized non-verbal cues in the facial modality bodes well. Attempts to implement the recognition of emotional prosody into artificial applications and interfaces have perhaps been met with greater success, but the ultimate test of synthetic emotional prosody will be to critically compare how people react to synthetic emotional prosody vs. natural emotional prosody, at the behavioral, socio-cognitive and neural levels. PMID:26617563
Specificity of regional brain activity in anxiety types during emotion processing.
Engels, Anna S; Heller, Wendy; Mohanty, Aprajita; Herrington, John D; Banich, Marie T; Webb, Andrew G; Miller, Gregory A
2007-05-01
The present study tested the hypothesis that anxious apprehension involves more left- than right-hemisphere activity and that anxious arousal is associated with the opposite pattern. Behavioral and fMRI responses to threat stimuli in an emotional Stroop task were examined in nonpatient groups reporting anxious apprehension, anxious arousal, or neither. Reaction times were longer for negative than for neutral words. As predicted, brain activation distinguished anxious groups in a left inferior frontal region associated with speech production and in a right-hemisphere inferior temporal area. Addressing a second hypothesis about left-frontal involvement in emotion, distinct left frontal regions were associated with anxious apprehension versus processing of positive information. Results support the proposed distinction between the two types of anxiety and resolve an inconsistency about the role of left-frontal activation in emotion and psychopathology.
Exploring Speech Recognition Technology: Children with Learning and Emotional/Behavioral Disorders.
ERIC Educational Resources Information Center
Faris-Cole, Debra; Lewis, Rena
2001-01-01
Intermediate grade students with disabilities in written expression and emotional/behavioral disorders were trained to use discrete or continuous speech input devices for written work. The study found extreme variability in the fidelity of the two devices, PowerSecretary and Dragon NaturallySpeaking, ranging from 49 percent to 87 percent. Both devices…
Emotional Speech Acts and the Educational Perlocutions of Speech
ERIC Educational Resources Information Center
Gasparatou, Renia
2016-01-01
Over the past decades, there has been an ongoing debate about whether education should aim at the cultivation of emotional wellbeing of self-esteeming personalities or whether it should prioritise literacy and the cognitive development of students. However, it might be the case that the two are not easily distinguished in educational contexts. In…
Böhm, Birgit
2004-01-01
In Germany, an increasing number of children live with one parent alone and have to cope with the separation or divorce of their parents. Emotional drawbacks have frequently been hypothesized for these children. Thus, we studied whether such experiences are reflected in speech behavior. Twenty-eight 10- to 13-year-old boys from separated parents (physical separation of the parents occurred 2 years before the investigation) were compared with 26 boys from parents living together in an interview focusing on attachment-related themes and everyday situations. The interviews were analyzed with regard to coherence of speech, coping with emotional problems, reflectivity, the child's representation of both parents, and verbal and nonverbal expression of feelings. Boys from separated parents had incoherent speech, difficulties in coping with emotional problems, and poorer reflectivity (thinking about their own mental states and those of others); they represented neither parent supportively and did not show their feelings openly. These results can be traced back to an insecure attachment representation in the boys with separated parents. Copyright 2004 S. Karger AG, Basel
Uskul, Ayse K; Paulmann, Silke; Weick, Mario
2016-02-01
Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently. (c) 2016 APA, all rights reserved.
Neural Processing of Musical and Vocal Emotions Through Cochlear Implants Simulation.
Ahmed, Duha G; Paquette, Sebastian; Zeitouni, Anthony; Lehmann, Alexandre
2018-05-01
Cochlear implants (CIs) partially restore the sense of hearing in the deaf. However, the ability to recognize emotions in speech and music is reduced due to the implant's electrical signal limitations and the patient's altered neural pathways. Electrophysiological correlations of these limitations are not yet well established. Here we aimed to characterize the effect of CIs on auditory emotion processing and, for the first time, directly compare vocal and musical emotion processing through a CI-simulator. We recorded 16 normal hearing participants' electroencephalographic activity while listening to vocal and musical emotional bursts in their original form and in a degraded (CI-simulated) condition. We found prolonged P50 latency and reduced N100-P200 complex amplitude in the CI-simulated condition. This points to a limitation in encoding sound signals processed through CI simulation. When comparing the processing of vocal and musical bursts, we found a delay in latency with the musical bursts compared to the vocal bursts in both conditions (original and CI-simulated). This suggests that despite the cochlear implants' limitations, the auditory cortex can distinguish between vocal and musical stimuli. In addition, it adds to the literature supporting the complexity of musical emotion. Replicating this study with actual CI users might lead to characterizing emotional processing in CI users and could ultimately help develop optimal rehabilitation programs or device processing strategies to improve CI users' quality of life.
Attitudes toward Speech Disorders: Sampling the Views of Cantonese-Speaking Americans.
ERIC Educational Resources Information Center
Bebout, Linda; Arthur, Bradford
1997-01-01
A study of 60 Chinese Americans and 46 controls found the Chinese Americans were more likely to believe persons with speech disorders could improve speech by "trying hard," to view people using deaf speech and people with cleft palates as perhaps being emotionally disturbed, and to regard deaf speech as a limitation. (Author/CR)
Children with bilateral cochlear implants identify emotion in speech and music.
Volkova, Anna; Trehub, Sandra E; Schellenberg, E Glenn; Papsin, Blake C; Gordon, Karen A
2013-03-01
This study examined the ability of prelingually deaf children with bilateral implants to identify emotion (i.e. happiness or sadness) in speech and music. Participants in Experiment 1 were 14 prelingually deaf children from 5-7 years of age who had bilateral implants and 18 normally hearing children from 4-6 years of age. They judged whether linguistically neutral utterances produced by a man and woman sounded happy or sad. Participants in Experiment 2 were 14 bilateral implant users from 4-6 years of age and the same normally hearing children as in Experiment 1. They judged whether synthesized piano excerpts sounded happy or sad. Child implant users' accuracy of identifying happiness and sadness in speech was well above chance levels but significantly below the accuracy achieved by children with normal hearing. Similarly, their accuracy of identifying happiness and sadness in music was well above chance levels but significantly below that of children with normal hearing, who performed at ceiling. For the 12 implant users who participated in both experiments, performance on the speech task correlated significantly with performance on the music task and implant experience was correlated with performance on both tasks. Child implant users' accurate identification of emotion in speech exceeded performance in previous studies, which may be attributable to fewer response alternatives and the use of child-directed speech. Moreover, child implant users' successful identification of emotion in music indicates that the relevant cues are accessible at a relatively young age.
The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.
Shigeno, Sumi
2018-02-01
This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions in both the listeners' native language (Japanese), and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggested that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.
Inhibitory Control as a Moderator of Threat-related Interference Biases in Social Anxiety
Gorlin, Eugenia I.; Teachman, Bethany A.
2014-01-01
Prior findings are mixed regarding the presence and direction of threat-related interference biases in social anxiety. The current study examined general inhibitory control (IC), measured by the classic color-word Stroop, as a moderator of the relationship between both threat interference biases (indexed by the emotional Stroop) and several social anxiety indicators. High socially anxious undergraduate students (N=159) completed the emotional and color-word Stroop tasks, followed by an anxiety-inducing speech task. Participants completed measures of trait social anxiety, state anxiety before and during the speech, negative task-interfering cognitions during the speech, and overall self-evaluation of speech performance. Speech duration was used to measure behavioral avoidance. In line with hypotheses, IC moderated the relationship between emotional Stroop bias and every anxiety indicator (with the exception of behavioral avoidance), such that greater social-threat interference was associated with higher anxiety among those with weak IC, whereas lesser social-threat interference was associated with higher anxiety among those with strong IC. Implications for the theory and treatment of threat interference biases in socially anxious individuals are discussed. PMID:24967719
ERIC Educational Resources Information Center
Ullrich, Dieter; Ullrich, Katja; Marten, Magret
2017-01-01
Speech-/language-impaired (SL) children face problems in school and later life. The significance of "non-cognitive, social-emotional skills" (NCSES) in these children is often underestimated. Aim: The present study of affected SL children analysed the influence of NCSES on long-term school education. Methods: Nineteen…
Speech-rhythm characteristics of client-centered, Gestalt, and rational-emotive therapy interviews.
Chen, C L
1981-07-01
The aim of this study was to discover whether client-centered, Gestalt, and rational-emotive psychotherapy interviews could be described and differentiated on the basis of quantitative measurement of their speech rhythms. These measures were taken from the sound portion of a film showing interviews by Carl Rogers, Frederick Perls, and Albert Ellis. The variables used were total session and percentage of speaking times, speaking turns, vocalizations, interruptions, inside and switching pauses, and speaking rates. The three types of interview had very distinctive patterns of speech-rhythm variables. These patterns suggested that Rogers's Client-centered therapy interview was patient dominated, that Ellis's rational-emotive therapy interview was therapist dominated, and that Perls's Gestalt therapy interview was neither therapist nor patient dominated.
Elements of a Plan-Based Theory of Speech Acts. Technical Report No. 141.
ERIC Educational Resources Information Center
Cohen, Philip R.; Perrault, C. Raymond
This report proposes that people often plan their speech acts to affect their listeners' beliefs, goals, and emotional states and that such language use can be modeled by viewing speech acts as operators in a planning system, allowing both physical and speech acts to be integrated into plans. Methodological issues of how speech acts should be…
ERIC Educational Resources Information Center
Santesso, Diane L.; Schmidt, Louis A.; Trainor, Laurel J.
2007-01-01
Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective…
Visual gut punch: persuasion, emotion, and the constitutional meaning of graphic disclosure.
Goodman, Ellen P
2014-01-01
The ability of government to "nudge" with information mandates, or merely to inform consumers of risks, is circumscribed by First Amendment interests that have been poorly articulated. New graphic cigarette warning labels supplied courts with the first opportunity to assess the informational interests attending novel forms of product disclosures. The D.C. Circuit enjoined them as unconstitutional, compelled by a narrative that the graphic labels converted government from objective informer to ideological persuader, shouting its warning to manipulate consumer decisions. This interpretation will leave little room for graphic disclosure and is already being used to challenge textual disclosure requirements (such as county-of-origin labeling) as unconstitutional. Graphic warning and the increasing reliance on regulation-by-disclosure present new free speech quandaries related to consumer autonomy, state normativity, and speaker liberty. This Article examines the distinct goals of product disclosure requirements and how those goals may serve to vindicate, or to frustrate, listener interests. I argue that many disclosures, and especially warnings, are necessarily both normative and informative, expressing value along with fact. It is not the existence of a norm that raises constitutional concern but rather the insistence on a controversial norm. Turning to the means of disclosure, this Article examines how emotional and graphic communication might change the constitutional calculus. Using autonomy theory and the communications research on speech processing, I conclude that disclosures do not bypass reason simply by reaching for the heart. If large graphic labels are unconstitutional, it will be because of undue burden on the speaker, not because they are emotionally powerful. This Article makes the following distinct contributions to the compelled commercial speech literature: critiques the leading precedent, Zauderer v. Office of Disciplinary Counsel, from a consumer autonomy standpoint; brings to bear empirical communications research on questions of facticity and rationality in emotional and graphic communications; and teases apart and distinguishes among various free speech dangers and contributions of commercial disclosure mandates with a view towards informing policy, law, and research.
Expression of Emotion in Eastern and Western Music Mirrors Vocalization
Bowling, Daniel Liu; Sundararajan, Janani; Han, Shui'er; Purves, Dale
2012-01-01
In Western music, the major mode is typically used to convey excited, happy, bright or martial emotions, whereas the minor mode typically conveys subdued, sad or dark emotions. Recent studies indicate that the differences between these modes parallel differences between the prosodic and spectral characteristics of voiced speech sounds uttered in corresponding emotional states. Here we ask whether tonality and emotion are similarly linked in an Eastern musical tradition. The results show that the tonal relationships used to express positive/excited and negative/subdued emotions in classical South Indian music are much the same as those used in Western music. Moreover, tonal variations in the prosody of English and Tamil speech uttered in different emotional states are parallel to the tonal trends in music. These results are consistent with the hypothesis that the association between musical tonality and emotion is based on universal vocal characteristics of different affective states. PMID:22431970
ERIC Educational Resources Information Center
Morgan, Shae D.; Ferguson, Sarah Hargus
2017-01-01
Purpose: In this study, we investigated the emotion perceived by young listeners with normal hearing (YNH listeners) and older adults with hearing impairment (OHI listeners) when listening to speech produced conversationally or in a clear speaking style. Method: The first experiment included 18 YNH listeners, and the second included 10 additional…
Seeing Emotion with Your Ears: Emotional Prosody Implicitly Guides Visual Attention to Faces
Rigoulot, Simon; Pell, Marc D.
2012-01-01
Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality) which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance (Someone migged the pazing) uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0–1250 ms], [1250–2500 ms], [2500–5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions. PMID:22303454
What Do You Mean by That?! An Electrophysiological Study of Emotional and Attitudinal Prosody.
Wickens, Steven; Perry, Conrad
2015-01-01
The use of prosody during verbal communication is pervasive in everyday language and whilst there is a wealth of research examining the prosodic processing of emotional information, much less is known about the prosodic processing of attitudinal information. The current study investigated the online neural processes underlying the prosodic processing of non-verbal emotional and attitudinal components of speech via the analysis of event-related brain potentials related to the processing of anger and sarcasm. To examine these, sentences with prosodic expectancy violations created by cross-splicing a prosodically neutral head ('he has') and a prosodically neutral, angry, or sarcastic ending (e.g., 'a serious face') were used. Task demands were also manipulated, with participants in one experiment performing prosodic classification and participants in another performing probe-verification. Overall, whilst minor differences were found across the tasks, the results suggest that angry and sarcastic prosodic expectancy violations follow a similar processing time-course underpinned by similar neural resources.
The Role of Visual Image and Perception in Speech Development of Children with Speech Pathology
ERIC Educational Resources Information Center
Tsvetkova, L. S.; Kuznetsova, T. M.
1977-01-01
Investigated with 125 children (4-14 years old) with speech, language, or emotional disorders was the assumption that the naming function can be underdeveloped because of defects in the word's gnostic base. (Author/DB)
Emotion regulation choice in an evaluative context: the moderating role of self-esteem.
Shafir, Roni; Guarino, Tara; Lee, Ihno A; Sheppes, Gal
2017-12-01
Evaluative contexts can be stressful, but relatively little is known about how different individuals who vary in responses to self-evaluation make emotion regulatory choices to cope in these situations. To address this gap, participants who vary in self-esteem gave an impromptu speech, rated how they perceived they had performed on multiple evaluative dimensions, and subsequently chose between disengaging attention from emotional processing (distraction) and engaging with emotional processing via changing its meaning (reappraisal), while waiting to receive feedback regarding these evaluative dimensions. According to our framework, distraction can offer stronger short-term relief than reappraisal, but, distraction is costly in the long run relative to reappraisal because it does not allow learning from evaluative feedback. We predicted and found that participants with lower (but not higher) self-esteem react defensively to threat of failure by seeking short-term relief via distraction over the long-term benefit of reappraisal, as perceived failure increases. Implications for the understanding of emotion regulation and self-esteem are discussed.
Saslow, Laura R; McCoy, Shannon; van der Löwe, Ilmo; Cosley, Brandon; Vartan, Arbi; Oveis, Christopher; Keltner, Dacher; Moskowitz, Judith T; Epel, Elissa S
2014-03-01
What can a speech reveal about someone's state? We tested the idea that greater stress reactivity would relate to lower linguistic cognitive complexity while speaking. In Study 1, we tested whether heart rate and emotional stress reactivity to a stressful discussion would relate to lower linguistic complexity. In Studies 2 and 3, we tested whether a greater cortisol response to a standardized stressful task including a speech (Trier Social Stress Test) would be linked to speaking with less linguistic complexity during the task. We found evidence that measures of stress responsivity (emotional and physiological) and chronic stress are tied to variability in the cognitive complexity of speech. Taken together, these results provide evidence that our individual experiences of stress, or "stress signatures" (how our body and mind react to stress both in the moment and over the longer term), are linked to how complex our speech is under stress. Copyright © 2013 Society for Psychophysiological Research.
Russo, Frank A.
2018-01-01
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
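Researchers selecting stimuli from the database might, for example, filter recordings by the published rating measures; the sketch below assumes a hypothetical CSV export with columns filename, emotion, and goodness, which is not the database's actual distribution format.

```python
# Hypothetical sketch: selecting stimuli above a "goodness" cutoff.
# Assumption: ratings have been exported to a CSV with columns
# filename, emotion, goodness -- this layout is illustrative only.
import csv
from collections import defaultdict

def select_stimuli(ratings_csv, min_goodness=0.7, per_emotion=10):
    by_emotion = defaultdict(list)
    with open(ratings_csv, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["goodness"]) >= min_goodness:
                by_emotion[row["emotion"]].append((float(row["goodness"]), row["filename"]))
    # Keep only the top-rated recordings per emotion category.
    return {e: [fn for _, fn in sorted(v, reverse=True)[:per_emotion]]
            for e, v in by_emotion.items()}
```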
Dynamic Facial Expressions Prime the Processing of Emotional Prosody.
Garrido-Vásquez, Patricia; Pell, Marc D; Paulmann, Silke; Kotz, Sonja A
2018-01-01
Evidence suggests that emotion is represented supramodally in the human brain. Emotional facial expressions, which often precede vocally expressed emotion in real life, can modulate event-related potentials (N100 and P200) during emotional prosody processing. To investigate these cross-modal emotional interactions, two lines of research have been put forward: cross-modal integration and cross-modal priming. In cross-modal integration studies, visual and auditory channels are temporally aligned, while in priming studies they are presented consecutively. Here we used cross-modal emotional priming to study the interaction of dynamic visual and auditory emotional information. Specifically, we presented dynamic facial expressions (angry, happy, neutral) as primes and emotionally-intoned pseudo-speech sentences (angry, happy) as targets. We were interested in how prime-target congruency would affect early auditory event-related potentials, i.e., N100 and P200, in order to shed more light on how dynamic facial information is used in cross-modal emotional prediction. Results showed enhanced N100 amplitudes for incongruently primed compared to congruently and neutrally primed emotional prosody, while the latter two conditions did not significantly differ. However, N100 peak latency was significantly delayed in the neutral condition compared to the other two conditions. Source reconstruction revealed that the right parahippocampal gyrus was activated in incongruent compared to congruent trials in the N100 time window. No significant ERP effects were observed in the P200 range. Our results indicate that dynamic facial expressions influence vocal emotion processing at an early point in time, and that an emotional mismatch between a facial expression and its ensuing vocal emotional signal induces additional processing costs in the brain, potentially because the cross-modal emotional prediction mechanism is violated in case of emotional prime-target incongruency.
Emotions in freely varying and mono-pitched vowels, acoustic and EGG analyses.
Waaramaa, Teija; Palo, Pertti; Kankare, Elina
2015-12-01
Vocal emotions are expressed either by speech or singing. The difference is that in singing the pitch is predetermined while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and singing. Electroglottogram and acoustic analyses of emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples) were studied. Contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels. Three different threshold levels were used in order to evaluate their effects on emotions. Genders were studied separately. The results suggested significant gender differences for CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree, when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
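The abstract does not spell out how CQEGG is computed; a minimal sketch, assuming the contact quotient of a glottal cycle is the proportion of the cycle during which the EGG signal exceeds a threshold placed at the stated percentage (35%, 55%, or 80%) of that cycle's peak-to-peak amplitude, is given below.

```python
# Sketch of a threshold-based EGG contact quotient (CQEGG) per glottal cycle.
# Assumptions: `egg_cycle` holds one glottal cycle of the electroglottogram,
# and the threshold level (e.g. 0.35, 0.55, 0.80) is taken relative to the
# cycle's peak-to-peak amplitude above its minimum.
import numpy as np

def contact_quotient(egg_cycle, level=0.35):
    egg_cycle = np.asarray(egg_cycle, dtype=float)
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + level * (hi - lo)
    # Fraction of samples in the cycle where contact exceeds the threshold.
    return float(np.mean(egg_cycle >= threshold))

# Example on a synthetic cycle:
cycle = np.sin(np.linspace(0, 2 * np.pi, 200, endpoint=False))
print([round(contact_quotient(cycle, l), 2) for l in (0.35, 0.55, 0.80)])
```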
Geers, Ann E; Davidson, Lisa S; Uchanski, Rosalie M; Nicholas, Johanna G
2013-09-01
This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Preimplant-aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. Sixty 9- to 12-year olds, first implanted at an early age (12 to 38 months), participated in a comprehensive test battery that included the following LSP skills: (1) recognition of monosyllabic words at loud and soft levels, (2) repetition of phonemes and suprasegmental features from nonwords, and (3) recognition of key words from sentences presented within a noise background, and the following ISP skills: (1) discrimination of across-gender and within-gender (female) talkers and (2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the nonword repetition, and talker- and emotion-perception tasks for comparison. Word-recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of key words presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from nonword stimuli at about the same level of accuracy as suprasegmental attributes (70 and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores and these components were highly correlated (r = 0.76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. Results strongly support a link between indexical and linguistic properties in perceptual analysis of speech. These two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech performance, for both linguistic and indexical perception, is associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language, but also their ability to function successfully in a hearing world.
Rhythm as a Coordinating Device: Entrainment With Disordered Speech
Borrie, Stephanie A.; Liss, Julie M.
2014-01-01
Purpose The rhythmic entrainment (coordination) of behavior during human interaction is a powerful phenomenon, considered essential for successful communication, supporting social and emotional connection, and facilitating sense-making and information exchange. Disruption in entrainment likely occurs in conversations involving those with speech and language impairment, but its contribution to communication disorders has not been defined. As a first step to exploring this phenomenon in clinical populations, the present investigation examined the influence of disordered speech on the speech production properties of healthy interactants. Method Twenty-nine neurologically healthy interactants participated in a quasi-conversational paradigm, in which they read sentences (response) in response to hearing prerecorded sentences (exposure) from speakers with dysarthria (n = 4) and healthy controls (n = 4). Recordings of read sentences prior to the task were also collected (habitual). Results Findings revealed that interactants modified their speaking rate and pitch variation to align more closely with the disordered speech. Production shifts in these rhythmic properties, however, remained significantly different from corresponding properties in dysarthric speech. Conclusion Entrainment offers a new avenue for exploring speech and language impairment, addressing a communication process not currently explained by existing frameworks. This article offers direction for advancing this line of inquiry. PMID:24686410
ERIC Educational Resources Information Center
Stiles, Matthew
2013-01-01
Research has identified a significant relationship between social, emotional and behavioural difficulties (SEBD) and speech, language and communication difficulties (SLCD). However, little has been published regarding the levels of knowledge and skill that practitioners working with pupils experiencing SEBD have in this important area, nor how…
From Speech to Emotional Interaction: EmotiRob Project
NASA Astrophysics Data System (ADS)
Le Tallec, Marc; Saint-Aimé, Sébastien; Jost, Céline; Villaneau, Jeanne; Antoine, Jean-Yves; Letellier-Zarshenas, Sabine; Le-Pévédic, Brigitte; Duhaut, Dominique
This article presents research work done in the domain of nonverbal emotional interaction for the EmotiRob project. It is a component of the MAPH project, the objective of which is to give comfort to vulnerable children and/or those undergoing long-term hospitalisation through the help of an emotional robot companion. It is important to note that we are not trying to reproduce human emotion and behavior, but to make a robot emotionally expressive. This paper presents the different hypotheses we have used, from understanding to emotional reaction. We begin the article with a presentation of the MAPH and EmotiRob projects. Then, we briefly describe the speech understanding system, the iGrace computational model of emotions, and the integration of dynamic behavior. We conclude with a description of the architecture of Emi, as well as improvements to be made to its next generation.
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2003-01-01
In order to explore the process of adaptation of children to school environment psychophysiological characteristics of perception of emotional speech information and school progress were experimentally studied. Forty-six schoolchildren of three age groups (7-10, 11-13, and 14-17 years old) participated in the study. In experimental session, a test sentence was presented to a subject through headphones with two emotional intonations (joy and anger) and without emotional expression. A subject had to recognize the type of emotion. His/her answers were recorded. School progress was determined by year grades in Russian, foreign language, and mathematics. Analysis of variance and linear regression analysis showed that ontogenetic features of a correlation between psychophysiological mechanisms of emotion recognition and school progress were gender- and subject-dependent. This correlation was stronger in 7-13-year-old children than in senior children. This age boundary was passed by the girls earlier than by the boys.
How Stuttering Develops: The Multifactorial Dynamic Pathways Theory
Smith, Anne; Weber, Christine
2017-01-01
Purpose We advanced a multifactorial, dynamic account of the complex, nonlinear interactions of motor, linguistic, and emotional factors contributing to the development of stuttering. Our purpose here is to update our account as the multifactorial dynamic pathways theory. Method We review evidence related to how stuttering develops, including genetic/epigenetic factors; motor, linguistic, and emotional features; and advances in neuroimaging studies. We update evidence for our earlier claim: Although stuttering ultimately reflects impairment in speech sensorimotor processes, its course over the life span is strongly conditioned by linguistic and emotional factors. Results Our current account places primary emphasis on the dynamic developmental context in which stuttering emerges and follows its course during the preschool years. Rapid changes in many neurobehavioral systems are ongoing, and critical interactions among these systems likely play a major role in determining persistence of or recovery from stuttering. Conclusion Stuttering, or childhood onset fluency disorder (Diagnostic and Statistical Manual of Mental Disorders, 5th edition; American Psychiatric Association [APA], 2013), is a neurodevelopmental disorder that begins when neural networks supporting speech, language, and emotional functions are rapidly developing. The multifactorial dynamic pathways theory motivates experimental and clinical work to determine the specific factors that contribute to each child's pathway to the diagnosis of stuttering and those most likely to promote recovery. PMID:28837728
Music and Its Inductive Power: A Psychobiological and Evolutionary Approach to Musical Emotions
Reybrouck, Mark; Eerola, Tuomas
2017-01-01
The aim of this contribution is to broaden the concept of musical meaning from an abstract and emotionally neutral cognitive representation to an emotion-integrating description that is related to the evolutionary approach to music. Starting from the dispositional machinery for dealing with music as a temporal and sounding phenomenon, musical emotions are considered as adaptive responses to be aroused in human beings as the product of neural structures that are specialized for their processing. A theoretical and empirical background is provided in order to bring together the findings of music and emotion studies and the evolutionary approach to musical meaning. The theoretical grounding elaborates on the transition from referential to affective semantics, the distinction between expression and induction of emotions, and the tension between discrete-digital and analog-continuous processing of the sounds. The empirical background provides evidence from several findings such as infant-directed speech, referential emotive vocalizations and separation calls in lower mammals, the distinction between the acoustic and vehicle mode of sound perception, and the bodily and physiological reactions to the sounds. It is argued, finally, that early affective processing reflects the way emotions make our bodies feel, which in turn reflects on the emotions expressed and decoded. As such there is a dynamic tension between nature and nurture, which is reflected in the nature-nurture-nature cycle of musical sense-making. PMID:28421015
Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin
2016-01-01
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Consensus Paper: Cerebellum and Emotion.
Adamaszek, M; D'Agata, F; Ferrucci, R; Habas, C; Keulen, S; Kirkby, K C; Leggio, M; Mariën, P; Molinari, M; Moulton, E; Orsi, L; Van Overwalle, F; Papadelis, C; Priori, A; Sacchetti, B; Schutter, D J; Styliadis, C; Verhoeven, J
2017-04-01
Over the past three decades, insights into the role of the cerebellum in emotional processing have substantially increased. Indeed, methodological refinements in cerebellar lesion studies and major technological advancements in the field of neuroscience have been particularly responsible for an exponential growth of knowledge on the topic. It is timely to review the available data and to critically evaluate the current status of the role of the cerebellum in emotion and related domains. The main aim of this article is to present an overview of current facts and ongoing debates relating to clinical, neuroimaging, and neurophysiological findings on the role of the cerebellum in key aspects of emotion. Experts in the field of cerebellar research discuss the range of cerebellar contributions to emotion in nine topics. Topics include the role of the cerebellum in perception and recognition, forwarding and encoding of emotional information, and the experience and regulation of emotional states in relation to motor, cognitive, and social behaviors. In addition, perspectives including cerebellar involvement in emotional learning, pain, emotional aspects of speech, and neuropsychiatric aspects of the cerebellum in mood disorders are briefly discussed. Results of this consensus paper illustrate how theory and empirical research have converged to produce a composite picture of brain topography, physiology, and function that establishes the role of the cerebellum in many aspects of emotional processing.
Musical anhedonia: selective loss of emotional experience in listening to music.
Satoh, Masayuki; Nakase, Taizen; Nagata, Ken; Tomimoto, Hidekazu
2011-10-01
Recent case studies have suggested that emotion perception and emotional experience of music have independent cognitive processing. We report a patient who showed selective impairment of emotional experience only in listening to music, that is, musical anhedonia. A 71-year-old right-handed man developed an infarction in the right parietal lobe. He found himself unable to experience emotion in listening to music, even music to which he had listened with pleasure before the illness. In neuropsychological assessments, his intellectual, memory, and constructional abilities were normal. Speech audiometry and recognition of environmental sounds were within normal limits. Neuromusicological assessments revealed no abnormality in the perception of elementary components of music, or in the expression and perception of emotion in music. Brain MRI identified the infarct lesion in the right inferior parietal lobule. These findings suggest that emotional experience of music can be selectively impaired without any disturbance of other musical or neuropsychological abilities. The right parietal lobe might participate in emotional experience in listening to music.
A study of speech emotion recognition based on hybrid algorithm
NASA Astrophysics Data System (ADS)
Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei
2011-10-01
To effectively improve the recognition accuracy of speech emotion recognition systems, a hybrid algorithm is proposed that combines a Continuous Hidden Markov Model (CHMM), an All-Class-in-One Neural Network (ACON), and a Support Vector Machine (SVM). In the SVM and ACON methods, global statistics are used as emotional features, while the CHMM method employs instantaneous (frame-level) features. The recognition rate of the proposed method is 92.25%, with a rejection rate of 0.78%, corresponding to relative improvements of 8.53%, 4.69%, and 0.78% over the ACON, CHMM, and SVM methods, respectively. The experimental results confirm the effectiveness of the method in distinguishing the emotional states of anger, happiness, neutrality, and sadness.
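As a rough illustration of the decision-fusion idea described in this abstract, the sketch below combines per-emotion continuous HMMs trained on frame-level features with an SVM and a small neural network trained on utterance-level statistics, and fuses the three decisions by majority vote. The toy features, data, and voting rule are assumptions for illustration only, not the authors' implementation or corpus.

```python
# Illustrative decision-level fusion for speech emotion recognition, loosely
# following the CHMM + ACON + SVM setup described above. The toy features,
# data, and majority-vote rule are assumptions, not the authors' implementation.
import numpy as np
from hmmlearn import hmm
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

EMOTIONS = ["anger", "happiness", "neutral", "sadness"]
rng = np.random.default_rng(0)

def toy_utterance(label, n_frames=80, n_dims=13):
    # Stand-in for frame-level acoustic features (e.g., MFCCs) of one utterance.
    return rng.normal(loc=label, scale=1.0, size=(n_frames, n_dims))

def global_stats(frames):
    # Utterance-level statistics used by the SVM and the ACON-style network.
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

train = [(toy_utterance(i), i) for i in range(len(EMOTIONS)) for _ in range(20)]
test = [(toy_utterance(i), i) for i in range(len(EMOTIONS)) for _ in range(5)]
X_stats = np.array([global_stats(f) for f, _ in train])
y = np.array([lab for _, lab in train])

# One continuous-density HMM per emotion, trained on frame-level sequences.
hmms = {}
for i in range(len(EMOTIONS)):
    seqs = [f for f, lab in train if lab == i]
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    model.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
    hmms[i] = model

svm = SVC().fit(X_stats, y)
acon = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(X_stats, y)

def predict(frames):
    stats = global_stats(frames).reshape(1, -1)
    votes = [max(hmms, key=lambda i: hmms[i].score(frames)),  # CHMM decision
             int(svm.predict(stats)[0]),                      # SVM decision
             int(acon.predict(stats)[0])]                     # ACON decision
    return max(set(votes), key=votes.count)                   # simple majority vote

accuracy = np.mean([predict(f) == lab for f, lab in test])
print(f"toy fusion accuracy: {accuracy:.2f}")
```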
Authentic and Play-Acted Vocal Emotion Expressions Reveal Acoustic Differences
Jürgens, Rebecca; Hammerschmidt, Kurt; Fischer, Julia
2011-01-01
Play-acted emotional expressions are a frequent aspect of our lives, ranging from deception to theater, film, and radio drama, to emotion research. To date, however, it has remained unclear whether play-acted emotions correspond to spontaneous emotion expressions. To test whether acting influences the vocal expression of emotion, we compared radio sequences of naturally occurring emotions to actors' portrayals. It was hypothesized that play-acted expressions were performed in a more stereotyped and aroused fashion. Our results demonstrate that speech segments extracted from play-acted and authentic expressions differ in their voice quality. Additionally, the play-acted speech tokens revealed a more variable F0 contour. Despite these differences, the results did not support the hypothesis that the variation was due to changes in arousal. This analysis revealed that the previously reported differences in perception of play-acted and authentic emotional stimuli cannot simply be attributed to differences in arousal, but rather to slight, implicitly perceptible differences in encoding. PMID:21847385
Job Stress of School-Based Speech-Language Pathologists
ERIC Educational Resources Information Center
Harris, Stephanie Ferney; Prater, Mary Anne; Dyches, Tina Taylor; Heath, Melissa Allen
2009-01-01
Stress and burnout contribute significantly to the shortages of school-based speech-language pathologists (SLPs). At the request of the Utah State Office of Education, the researchers measured the stress levels of 97 school-based SLPs using the "Speech-Language Pathologist Stress Inventory." Results indicated that participants' emotional-fatigue…
Selective attention modulates early human evoked potentials during emotional face-voice processing.
Ho, Hao Tam; Schröger, Erich; Kotz, Sonja A
2015-04-01
Recent findings on multisensory integration suggest that selective attention influences cross-sensory interactions from an early processing stage. Yet, in the field of emotional face-voice integration, the hypothesis prevails that facial and vocal emotional information interacts preattentively. Using ERPs, we investigated the influence of selective attention on the perception of congruent versus incongruent combinations of neutral and angry facial and vocal expressions. Attention was manipulated via four tasks that directed participants to (i) the facial expression, (ii) the vocal expression, (iii) the emotional congruence between the face and the voice, and (iv) the synchrony between lip movement and speech onset. Our results revealed early interactions between facial and vocal emotional expressions, manifested as modulations of the auditory N1 and P2 amplitudes by incongruent emotional face-voice combinations. Although audiovisual emotional interactions within the N1 time window were affected by the attentional manipulations, interactions within the P2 time window showed no such attentional influence. Thus, we propose that the N1 and P2 are functionally dissociated in terms of emotional face-voice processing and discuss evidence in support of the notion that the N1 is associated with cross-sensory prediction, whereas the P2 relates to the derivation of an emotional percept. Essentially, our findings put the integration of facial and vocal emotional expressions into a new perspective: one that regards the integration process as a composite of multiple, possibly independent subprocesses, some of which are susceptible to attentional modulation, whereas others may be influenced by additional factors.
Saxbe, Darby E; Yang, Xiao-Fei; Borofsky, Larissa A; Immordino-Yang, Mary Helen
2013-10-01
Complex social emotions involve both abstract cognitions and bodily sensations, and individuals may differ on their relative reliance on these. We hypothesized that individuals' descriptions of their feelings during a semi-structured emotion induction interview would reveal two distinct psychological styles (a more abstract, cognitive style and a more body-based, affective style) and that these would be associated with somatosensory neural activity. We examined 28 participants' open-ended verbal responses to admiration- and compassion-provoking narratives in an interview and BOLD activity to the same narratives during subsequent functional magnetic resonance imaging scanning. Consistent with hypotheses, individuals' affective and cognitive word use were stable across emotion conditions, negatively correlated and unrelated to reported emotion strength in the scanner. Greater use of affective relative to cognitive words predicted more activation in SI, SII, middle anterior cingulate cortex and insula during emotion trials. The results suggest that individuals' verbal descriptions of their feelings reflect differential recruitment of neural regions supporting physical body awareness. Although somatosensation has long been recognized as an important component of emotion processing, these results offer 'proof of concept' that individual differences in open-ended speech reflect different processing styles at the neurobiological level. This study also demonstrates SI involvement during social emotional experience. PMID:22798396
Wegrzyn, Martin; Herbert, Cornelia; Ethofer, Thomas; Flaisch, Tobias; Kissler, Johanna
2017-11-01
Visually presented emotional words are processed preferentially and effects of emotional content are similar to those of explicit attention deployment in that both amplify visual processing. However, auditory processing of emotional words is less well characterized and interactions between emotional content and task-induced attention have not been fully understood. Here, we investigate auditory processing of emotional words, focussing on how auditory attention to positive and negative words impacts their cerebral processing. A functional magnetic resonance imaging (fMRI) study manipulating word valence and attention allocation was performed. Participants heard negative, positive and neutral words to which they either listened passively or attended by counting negative or positive words, respectively. Regardless of valence, active processing compared to passive listening increased activity in primary auditory cortex, left intraparietal sulcus, and right superior frontal gyrus (SFG). The attended valence elicited stronger activity in left inferior frontal gyrus (IFG) and left SFG, in line with these regions' role in semantic retrieval and evaluative processing. No evidence for valence-specific attentional modulation in auditory regions or distinct valence-specific regional activations (i.e., negative > positive or positive > negative) was obtained. Thus, allocation of auditory attention to positive and negative words can substantially increase their processing in higher-order language and evaluative brain areas without modulating early stages of auditory processing. Inferior and superior frontal brain structures mediate interactions between emotional content, attention, and working memory when prosodically neutral speech is processed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Audiovisual integration of emotional signals in voice and face: an event-related fMRI study.
Kreifelts, Benjamin; Ethofer, Thomas; Grodd, Wolfgang; Erb, Michael; Wildgruber, Dirk
2007-10-01
In a natural environment, non-verbal emotional communication is multimodal (i.e. speech melody, facial expression) and multifaceted concerning the variety of expressed emotions. Understanding these communicative signals and integrating them into a common percept is paramount to successful social behaviour. While many previous studies have focused on the neurobiology of emotional communication in the auditory or visual modality alone, far less is known about multimodal integration of auditory and visual non-verbal emotional information. The present study investigated this process using event-related fMRI. Behavioural data revealed that audiovisual presentation of non-verbal emotional information resulted in a significant increase in correctly classified stimuli when compared with visual and auditory stimulation. This behavioural gain was paralleled by enhanced activation in bilateral posterior superior temporal gyrus (pSTG) and right thalamus, when contrasting audiovisual to auditory and visual conditions. Further, a characteristic of these brain regions, substantiating their role in the emotional integration process, is a linear relationship between the gain in classification accuracy and the strength of the BOLD response during the bimodal condition. Additionally, enhanced effective connectivity between audiovisual integration areas and associative auditory and visual cortices was observed during audiovisual stimulation, offering further insight into the neural process accomplishing multimodal integration. Finally, we were able to document an enhanced sensitivity of the putative integration sites to stimuli with emotional non-verbal content as compared to neutral stimuli.
van den Broek, Egon L
2004-01-01
The voice embodies three sources of information: speech, the identity, and the emotional state of the speaker (i.e., emotional prosody). The latter feature is reflected in the variability of F0 (also called the fundamental frequency, or pitch), expressed as SD F0. To extract this feature, Emotional Prosody Measurement (EPM) was developed, which consists of 1) speech recording, 2) removal of speckle noise, 3) a Fourier transform to extract the F0 signal, and 4) the determination of SD F0. After a pilot study in which six participants mimicked emotions with their voice, the core experiment was conducted to see whether EPM is successful. Twenty-five patients suffering from a panic disorder with agoraphobia participated. Two methods (story-telling and reliving) were used to trigger anxiety and were compared with comparable but more relaxed conditions. This resulted in a unique database of speech samples that was used to compare the EPM with the Subjective Unit of Distress in order to validate it as a measure of anxiety/stress. The experimental manipulation of anxiety proved to be successful, and EPM proved to be a successful method for evaluating the effectiveness of psychological therapy.
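The SD F0 measure at the core of this pipeline can be approximated with standard tools. The sketch below is a minimal, hypothetical version that uses librosa's pYIN pitch tracker in place of the paper's Fourier-based F0 extraction and omits the speckle-noise removal step; the file names are placeholders.

```python
# Minimal SD-F0 sketch in the spirit of the EPM pipeline above. Assumptions:
# librosa's pYIN tracker replaces the paper's Fourier-based F0 extraction,
# the speckle-noise removal step is omitted, and file names are placeholders.
import numpy as np
import librosa

def sd_f0(wav_path, fmin=65.0, fmax=400.0):
    y, sr = librosa.load(wav_path, sr=None, mono=True)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    voiced = f0[voiced_flag & ~np.isnan(f0)]   # keep voiced frames only
    return float(np.std(voiced)) if voiced.size else float("nan")  # SD F0 in Hz

# Example: compare an anxiety-induction recording with a relaxed-condition one.
# print(sd_f0("reliving_anxiety.wav"), sd_f0("reliving_relaxed.wav"))
```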
[Rehabilitative measures in hearing-impaired children].
von Wedel, H; von Wedel, U C; Zorowka, P
1991-12-01
On the basis of certain fundamental data on the maturation processes of the central auditory pathways in early childhood, the importance of early intervention with hearing aids is discussed and emphasized. Pathological hearing, that is, acoustic deprivation in early childhood, will influence the maturation process. Very often speech development is delayed if diagnosis and therapy or rehabilitation do not begin early enough. Anamnesis, early diagnosis, and clinical differential diagnosis are required before a hearing aid can be fitted. Selection criteria and adjustment parameters are discussed, showing that the hearing aid fitting procedure must be embedded in a complex matrix of requirements related to the development of speech as well as to the cognitive, emotional, and social development of the child. As a rule, finding and preparing the "best" hearing aids (binaural fitting is obligatory) for a child is a long and often difficult process, which can only be performed by specialists in pediatric audiology. After the binaural fitting of hearing aids, intensive hearing and speech education in close cooperation between parents, pediatric audiologist, and teacher must support the whole development of the child.
Situational influences on rhythmicity in speech, music, and their interaction
Hawkins, Sarah
2014-01-01
Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. PMID:25385776
Saslow, Laura R.; McCoy, Shannon; van der Löwe, Ilmo; Cosley, Brandon; Vartan, Arbi; Oveis, Christopher; Keltner, Dacher; Moskowitz, Judith T.; Epel, Elissa S.
2014-01-01
What can a speech reveal about someone's state? We tested the idea that greater stress reactivity would relate to lower linguistic cognitive complexity while speaking. In Study 1, we tested whether heart rate and emotional stress reactivity to a stressful discussion would relate to lower linguistic complexity. In Studies 2 and 3 we tested whether a greater cortisol response to a standardized stressful task including a speech (Trier Social Stress Test) would be linked to speaking with less linguistic complexity during the task. We found evidence that measures of stress responsivity (emotional and physiological) and chronic stress are tied to variability in the cognitive complexity of speech. Taken together, these results provide evidence that our individual experiences of stress or ‘stress signatures’—how our body and mind react to stress both in the moment and over the longer term—are linked to how complexly we speak under stress. PMID:24354732
A hypothesis on the biological origins and social evolution of music and dance.
Wang, Tianyan
2015-01-01
The origins of music and musical emotions are still an enigma. Here I propose a comprehensive hypothesis on the origins and evolution of music, dance, and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpreting the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music, but also integrates music and dance into a common form: rhythmic movement. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance, and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) are evoked by music and dance, and have biological and social functions, which in turn promote the evolution of music, dance, and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is a combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance, and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis scientifically concrete.
Speaker recognition with temporal cues in acoustic and electric hearing
NASA Astrophysics Data System (ADS)
Vongphoe, Michael; Zeng, Fan-Gang
2005-08-01
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, and emotional and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. Cochlear-implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but with only one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
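The eight-band versus one-band benchmark refers to acoustic simulations in which normal-hearing listeners hear noise-vocoded speech that preserves mainly temporal-envelope cues. The following is a generic noise-band vocoder sketch under assumed parameters (log-spaced bands, 160 Hz envelope cutoff), offered only as an illustration of the technique, not the simulation used in the study.

```python
# Generic noise-band vocoder sketch (an acoustic simulation that preserves mainly
# temporal-envelope cues in N spectral bands). Band spacing, filter orders, and the
# 160 Hz envelope cutoff are assumptions, not the simulation behind the benchmark.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(x, sr, n_bands=8, lo=100.0, hi=7000.0, env_cut=160.0):
    edges = np.geomspace(lo, hi, n_bands + 1)                  # log-spaced band edges
    env_sos = butter(2, env_cut, btype="low", fs=sr, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for k in range(n_bands):
        band_sos = butter(4, [edges[k], edges[k + 1]], btype="band", fs=sr, output="sos")
        band = sosfiltfilt(band_sos, x)                              # analysis band
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0, None)   # temporal envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # band-limited noise
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-9)

# Toy usage: vocode one second of a 220 Hz tone with 8 bands versus 1 band.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)
print(noise_vocoder(signal, sr, n_bands=8).shape, noise_vocoder(signal, sr, n_bands=1).shape)
```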
ERIC Educational Resources Information Center
Bowers, Andrew L.; Crawcour, Stephen C.; Saltuklaroglu, Tim; Kalinowski, Joseph
2010-01-01
Background: People who stutter are often acutely aware that their speech disruptions, halted communication, and aberrant struggle behaviours evoke reactions in communication partners. Considering that eye gaze behaviours have emotional, cognitive, and pragmatic overtones for communicative interactions and that previous studies have indicated…
ERIC Educational Resources Information Center
Zhang, Jianliang; Kalinowski, Joseph; Saltuklaroglu, Tim; Hudock, Daniel
2010-01-01
Background: Previous studies have found simultaneous increases in skin conductance response and decreases in heart rate when normally fluent speakers watched and listened to stuttered speech compared with fluent speech, suggesting that stuttering induces arousal and emotional unpleasantness in listeners. However, physiological responses of persons…
ERIC Educational Resources Information Center
Guntupalli, Vijaya K.; Everhart, D. Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim
2007-01-01
Background: People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.).…
Perspective taking in children's narratives about jealousy.
Aldrich, Naomi J; Tenenbaum, Harriet R; Brooks, Patricia J; Harrison, Karine; Sines, Jennie
2011-03-01
This study explored relationships between perspective-taking, emotion understanding, and children's narrative abilities. Younger (23 five- to six-year-olds) and older (24 seven- to eight-year-olds) children generated fictional narratives, using a wordless picture book, about a frog experiencing jealousy. Children's emotion understanding was assessed through a standardized test of emotion comprehension and their ability to convey the jealousy theme of the story. Perspective-taking ability was assessed with respect to children's use of narrative evaluation (i.e., narrative coherence, mental state language, supplementary evaluative speech, use of subjective language, and placement of emotion expression). Older children scored higher than younger children on emotion comprehension and on understanding the story's complex emotional theme, including the ability to identify a rival. They were more advanced in perspective-taking abilities, and selectively used emotion expressions to highlight story episodes. Subjective perspective taking and narrative coherence were predictive of children's elaboration of the jealousy theme. Use of supplementary evaluative speech, in turn, was predictive of both subjective perspective taking and narrative coherence. ©2010 The British Psychological Society.
Intimate insight: MDMA changes how people talk about significant others
Baggott, Matthew J.; Kirkpatrick, Matthew G.; Bedi, Gillinder; de Wit, Harriet
2015-01-01
Rationale: ±3,4-methylenedioxymethamphetamine (MDMA) is widely believed to increase sociability. The drug alters speech production and fluency, and may influence speech content. Here, we investigated the effect of MDMA on speech content, which may reveal how this drug affects social interactions. Method: 35 healthy volunteers with prior MDMA experience completed this two-session, within-subjects, double-blind study during which they received 1.5 mg/kg oral MDMA and placebo. Participants completed a 5-min standardized talking task during which they discussed a close personal relationship (e.g., a friend or family member) with a research assistant. The conversations were analyzed for selected content categories (e.g., words pertaining to affect, social interaction, and cognition), using both a standard dictionary method (Pennebaker's Linguistic Inquiry and Word Count: LIWC) and a machine learning method using random forest classifiers. Results: Both analytic methods revealed that MDMA altered speech content relative to placebo. Using LIWC scores, the drug increased use of social and sexual words, consistent with reports that MDMA increases willingness to disclose. Using the machine learning algorithm, we found that MDMA increased use of social words and words relating to both positive and negative emotions. Conclusions: These findings are consistent with reports that MDMA acutely alters speech content, specifically increasing emotional and social content during a brief semistructured dyadic interaction. Studying effects of psychoactive drugs on speech content may offer new insights into drug effects on mental states, and on emotional and psychosocial interaction. PMID:25922420
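A minimal sketch of the two transcript-analysis approaches mentioned above is given below: dictionary-based category rates (a LIWC-style count, with tiny made-up word lists standing in for the proprietary LIWC dictionaries) feeding a random forest that separates MDMA from placebo transcripts. The categories, word lists, and toy transcripts are illustrative assumptions, not the study's materials.

```python
# Hedged sketch of the two transcript analyses mentioned above: LIWC-style
# category rates (the tiny word lists here are illustrative stand-ins for the
# proprietary LIWC dictionaries) feeding a random forest that separates MDMA
# from placebo transcripts. The categories and toy transcripts are assumptions.
from collections import Counter
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CATEGORIES = {
    "social": {"friend", "family", "we", "talk", "together"},
    "posemo": {"love", "happy", "great", "close"},
    "negemo": {"sad", "angry", "worried", "hurt"},
}

def category_rates(transcript):
    tokens = transcript.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [sum(counts[w] for w in words) / total for words in CATEGORIES.values()]

# Toy data: (transcript, condition) pairs, where 1 = MDMA session, 0 = placebo.
sessions = [
    ("we talk together and i feel so close to my friend i love her", 1),
    ("i love my family we are happy when we are together", 1),
    ("i was worried about work and felt a bit sad", 0),
    ("my friend and i talk sometimes about everyday things", 0),
]
X = np.array([category_rates(text) for text, _ in sessions])
y = np.array([label for _, label in sessions])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(dict(zip(CATEGORIES, clf.feature_importances_.round(2))))
```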
Morin, Alain; Hamper, Breanne
2012-01-01
Inner speech involvement in self-reflection was examined by reviewing 130 studies assessing brain activation during self-referential processing in key self-domains: agency, self-recognition, emotions, personality traits, autobiographical memory, and miscellaneous (e.g., prospection, judgments). The left inferior frontal gyrus (LIFG) has been shown to be reliably recruited during inner speech production. The percentage of studies reporting LIFG activity for each self-dimension was calculated. Fifty-five percent of all studies reviewed indicated LIFG (and presumably inner speech) activity during self-reflection tasks; on average, LIFG activation is observed 16% of the time during completion of non-self tasks (e.g., attention, perception). The highest LIFG activation rate was observed during retrieval of autobiographical information. The LIFG was significantly more recruited during conceptual tasks (e.g., prospection, traits) than during perceptual tasks (agency and self-recognition). This constitutes additional evidence that inner speech participates in self-related thinking. PMID:23049653
Evidence, Goals, and Outcomes in Stuttering Treatment: Applications With an Adolescent Who Stutters.
Marcotte, Anne K
2018-01-09
The purpose of this clinical focus article is to summarize 1 possible process that a clinician might follow in designing and conducting a treatment program with John, a 14-year-old male individual who stutters. The available research evidence, practitioner experience, and consideration of individual preferences are combined to address goals, treatment procedures, and outcomes for John. The stuttering treatment research literature includes multiple well-designed reviews and individual studies that have shown the effectiveness of prolonged speech (and smooth speech and related variations) for improving stuttered speech and for improving social, emotional, cognitive, and related variables in adolescents who stutter. Based on that evidence, and incorporating the additional elements of practitioner experience and client preferences, this clinical focus article suggests that John would be likely to benefit from a treatment program based on prolonged speech. The basic structure of 1 possible such program is also described, with an emphasis on the goals and outcomes that John could be expected to achieve.
Emotional voices in context: A neurobiological model of multimodal affective information processing
NASA Astrophysics Data System (ADS)
Brück, Carolin; Kreifelts, Benjamin; Wildgruber, Dirk
2011-12-01
Just as eyes are often considered a gateway to the soul, the human voice offers a window through which we gain access to our fellow human beings' minds: their attitudes, intentions and feelings. Whether in talking or singing, crying or laughing, sighing or screaming, the sheer sound of a voice communicates a wealth of information that, in turn, may serve the observant listener as a valuable guidepost in social interaction. But how do human beings extract information from the tone of a voice? In an attempt to answer this question, the present article reviews empirical evidence detailing the cerebral processes that underlie our ability to decode emotional information from vocal signals. The review will focus primarily on two prominent classes of vocal emotion cues: laughter and speech prosody (i.e. the tone of voice while speaking). Following a brief introduction, behavioral as well as neuroimaging data will be summarized that allow us to outline the cerebral mechanisms associated with the decoding of emotional voice cues, as well as the influence of various context variables (e.g. co-occurring facial and verbal emotional signals, attention focus, person-specific parameters such as gender and personality) on the respective processes. Building on the presented evidence, a cerebral network model will be introduced that proposes a differential contribution of various cortical and subcortical brain structures to the processing of emotional voice signals, both in isolation and in the context of accompanying (facial and verbal) emotional cues.
Gorlin, Eugenia I; Teachman, Bethany A
2015-07-01
The current study brings together two typically distinct lines of research. First, social anxiety is inconsistently associated with behavioral deficits in social performance, and the factors accounting for these deficits remain poorly understood. Second, research on selective processing of threat cues, termed cognitive biases, suggests these biases typically predict negative outcomes, but may sometimes be adaptive, depending on the context. Integrating these research areas, the current study examined whether conscious and/or unconscious threat interference biases (indexed by the unmasked and masked emotional Stroop) can explain unique variance, beyond self-reported anxiety measures, in behavioral avoidance and observer-rated anxious behavior during a public speaking task. Minute of speech and general inhibitory control (indexed by the color-word Stroop) were examined as within-subject and between-subject moderators, respectively. Highly socially anxious participants (N=135) completed the emotional and color-word Stroop blocks prior to completing a 4-minute videotaped speech task, which was later coded for anxious behaviors (e.g., speech dysfluency). Mixed-effects regression analyses revealed that general inhibitory control moderated the relationship between both conscious and unconscious threat interference bias and anxious behavior (though not avoidance), such that lower threat interference predicted higher levels of anxious behavior, but only among those with relatively weaker (versus stronger) inhibitory control. Minute of speech further moderated this relationship for unconscious (but not conscious) social-threat interference, such that lower social-threat interference predicted a steeper increase in anxious behaviors over the course of the speech (but only among those with weaker inhibitory control). Thus, both trait and state differences in inhibitory control resources may influence the behavioral impact of threat biases in social anxiety. Copyright © 2015. Published by Elsevier Ltd.
Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech
Hill, Edward; Han, David; Dumouchel, Pierre; Dehak, Najim; Quatieri, Thomas; Moehs, Charles; Oscar-Berman, Marlene; Giordano, John; Simpatico, Thomas; Blum, Kenneth
2013-01-01
Addictions to illicit drugs are among the nation's most critical public health and societal problems. The current opioid prescription epidemic, the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided the impetus to determine affective states ("true ground emotionality") in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of "true" emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other less objective studies have investigated the emotional reactivity of heroin, methadone, and opioid-abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is, to our knowledge, the first study to evaluate "true ground" emotionality in patients on the long-term buprenorphine/naloxone combination (Suboxone™). We found that long-term SUBX patients showed significantly flattened affect (p<0.01) and had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution against definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid-abstinent control group using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention in opioid addiction. PMID:23874860
Speech Volume Indexes Sex Differences in the Social-Emotional Effects of Alcohol
Fairbairn, Catharine E.; Sayette, Michael A.; Amole, Marlissa C.; Dimoff, John D.; Cohn, Jeffrey F.; Girard, Jeffrey M.
2015-01-01
Men and women differ dramatically in their rates of alcohol use disorder (AUD), and researchers have long been interested in identifying mechanisms underlying male vulnerability to problem drinking. Surveys suggest that social processes underlie sex differences in drinking patterns, with men reporting greater social enhancement from alcohol than women, and all-male social drinking contexts being associated with particularly high rates of hazardous drinking. But experimental evidence for sex differences in social-emotional response to alcohol has heretofore been lacking. Research using larger sample sizes, a social context, and more sensitive measures of alcohol's rewarding effects may be necessary to better understand sex differences in the etiology of AUD. This study explored the acute effects of alcohol during social exchange on speech volume, an objective measure of social-emotional experience that was reliably captured at the group level. Social drinkers (360 male; 360 female) consumed alcohol (.82 g/kg for males; .74 g/kg for females), placebo, or a no-alcohol control beverage in groups of three over 36 minutes. Within each of the three beverage conditions, equal numbers of groups consisted of all males, all females, 2 females and 1 male, and 1 female and 2 males. Speech volume was monitored continuously throughout the drink period, and group volume emerged as a robust correlate of self-report and facial indexes of social reward. Notably, alcohol-related increases in group volume were observed selectively in all-male groups but not in groups containing any females. Results point to social enhancement as a promising direction for research exploring factors underlying sex differences in problem drinking. PMID:26237323
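Group speech volume of the kind used as an outcome here can be tracked with a simple energy measure. The sketch below is an assumption-laden illustration, not the authors' measurement pipeline: it computes frame-wise RMS energy in dB from a session recording and averages it per minute; the file name is a placeholder.

```python
# Illustrative sketch (not the authors' measurement pipeline): tracking group
# speech volume over a session as frame-wise RMS energy in dB, averaged per
# minute. The file name is a placeholder.
import librosa

def volume_per_minute(wav_path, hop_length=512):
    y, sr = librosa.load(wav_path, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=hop_length)[0]
    db = librosa.amplitude_to_db(rms, ref=1.0)          # frame-wise level in dB
    frames_per_min = int(60 * sr / hop_length)
    n_minutes = len(db) // frames_per_min
    return [float(db[i * frames_per_min:(i + 1) * frames_per_min].mean())
            for i in range(n_minutes)]

# Example: one mean dB value per minute of a 36-minute group recording.
# print(volume_per_minute("group_session.wav"))
```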
Underconnectivity between voice-selective cortex and reward circuitry in children with autism.
Abrams, Daniel A; Lynch, Charles J; Cheng, Katherine M; Phillips, Jennifer; Supekar, Kaustubh; Ryali, Srikanth; Uddin, Lucina Q; Menon, Vinod
2013-07-16
Individuals with autism spectrum disorders (ASDs) often show insensitivity to the human voice, a deficit that is thought to play a key role in communication deficits in this population. The social motivation theory of ASD predicts that impaired function of reward and emotional systems impedes children with ASD from actively engaging with speech. Here we explore this theory by investigating distributed brain systems underlying human voice perception in children with ASD. Using resting-state functional MRI data acquired from 20 children with ASD and 19 age- and intelligence quotient-matched typically developing children, we examined intrinsic functional connectivity of voice-selective bilateral posterior superior temporal sulcus (pSTS). Children with ASD showed a striking pattern of underconnectivity between left-hemisphere pSTS and distributed nodes of the dopaminergic reward pathway, including bilateral ventral tegmental areas and nucleus accumbens, left-hemisphere insula, orbitofrontal cortex, and ventromedial prefrontal cortex. Children with ASD also showed underconnectivity between right-hemisphere pSTS, a region known for processing speech prosody, and the orbitofrontal cortex and amygdala, brain regions critical for emotion-related associative learning. The degree of underconnectivity between voice-selective cortex and reward pathways predicted symptom severity for communication deficits in children with ASD. Our results suggest that weak connectivity of voice-selective cortex and brain structures involved in reward and emotion may impair the ability of children with ASD to experience speech as a pleasurable stimulus, thereby impacting language and social skill development in this population. Our study provides support for the social motivation theory of ASD.
Assessing attentional biases with stuttering.
Lowe, Robyn; Menzies, Ross; Packman, Ann; O'Brian, Sue; Jones, Mark; Onslow, Mark
2016-01-01
Many adults who stutter presenting for speech treatment experience social anxiety disorder. The presence of mental health disorders in adults who stutter has been implicated in a failure to maintain speech treatment benefits. Contemporary theories of social anxiety disorder propose that the condition is maintained by negative cognitions and information processing biases. Consistent with cognitive theories, the probe detection task has shown that social anxiety is associated with an attentional bias to avoid social information. This information processing bias is suggested to be involved in maintaining anxiety. Evidence is emerging for information processing biases being involved with stuttering. This study investigated information processing in adults who stutter using the probe detection task. Information processing biases have been implicated in anxiety maintenance in social anxiety disorder and therefore may have implications for the assessment and treatment of stuttering. It was hypothesized that stuttering participants, compared with control participants, would display an attentional bias to avoid attending to social information. Twenty-three adults who stutter and 23 controls completed a probe detection task in which they were presented with pairs of photographs: a face displaying an emotional expression (positive, negative, or neutral) and an everyday household object. All participants were subjected to a mild social threat induction, being told that they would speak to a small group of people on completion of the task. The stuttering group scored significantly higher than controls for trait anxiety, but did not differ from controls on measures of social anxiety. Non-socially anxious adults who stutter did not display an attentional bias to avoid looking at photographs of faces relative to everyday objects. Higher scores on trait anxiety were positively correlated with attention towards photographs of negative faces. Attentional biases as assessed by the probe detection task may not be a characteristic of non-socially anxious adults who stutter. A vigilance to attend to threat information with high trait anxiety is consistent with findings of studies using the emotional Stroop task in stuttering and social anxiety disorder. Future research should investigate attentional processing in people who stutter who are socially anxious. It will also be useful for future studies to employ research paradigms that involve speaking. Continued research is warranted to explore information processing and potential biases that could be involved in the maintenance of anxiety and failure to maintain the benefits of speech treatment outcomes. © 2015 Royal College of Speech and Language Therapists.
de Sousa, Paulo; Sellwood, William; Spray, Amy; Bentall, Richard P
2016-04-01
Thought disorder (TD) has been shown to vary in relation to negative affect. Here we examine the role of internal source monitoring (iSM, i.e. the ability to discriminate between inner speech and verbalized speech) in TD and whether changes in iSM performance are implicated in the affective reactivity effect (deterioration of TD when participants are asked to talk about emotionally laden topics). Eighty patients diagnosed with schizophrenia-spectrum disorder and thirty healthy controls received interviews that promoted personal disclosure (emotionally salient) and interviews on everyday topics (non-salient) on separate days. During the interviews, participants were tested on iSM, self-reported affect and immediate auditory recall. Patients had more TD, poorer ability to discriminate between inner and verbalized speech, poorer immediate auditory recall and reported more negative affect than controls. Both groups displayed more TD and negative affect in salient interviews, but only patients showed poorer performance on iSM. Immediate auditory recall did not change significantly across affective conditions. In patients, the relationship between self-reported negative affect and TD was mediated by deterioration in the ability to discriminate between inner speech and speech that was directed to others and socially shared (performance on the iSM) in both interviews. Furthermore, deterioration in patients' performance on iSM across conditions significantly predicted deterioration in TD across the interviews (affective reactivity of speech). Poor iSM is significantly associated with TD. Negative affect, leading to further impaired iSM, leads to increased TD in patients with psychosis. Avenues for future research as well as clinical implications of these findings are discussed. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Hannesdóttir, Dagmar Kr; Doxie, Jacquelyn; Bell, Martha Ann; Ollendick, Thomas H; Wolfe, Christy D
2010-03-01
We investigated whether brain electrical activity during early childhood was associated with anxiety symptoms and emotion regulation during a stressful situation in middle childhood. Frontal electroencephalogram (EEG) asymmetries were measured during baseline and during a cognitive control task at 4 1/2 years. Anxiety and emotion regulation were assessed during a stressful speech task at age 9, along with measures of heart rate (HR) and heart rate variability (HRV). Questionnaires were also used to assess anxiety and emotion regulation at age 9. Results from this longitudinal study indicated that children who exhibited right frontal asymmetry in early childhood experienced more physiological arousal (increased HR, decreased HRV) during the speech task at age 9 and showed less ability to regulate their emotions as reported by their parents. Findings are discussed in light of the associations between temperament and the development of anxiety disorders.
Eliciting and maintaining ruminative thought: the role of social-evaluative threat.
Zoccola, Peggy M; Dickerson, Sally S; Lam, Suman
2012-08-01
This study tested whether a performance stressor characterized by social-evaluative threat (SET) elicits more rumination than a stressor without this explicit evaluative component and whether this difference persists minutes, hours, and days later. The mediating role of shame-related cognition and emotion (SRCE) was also examined. During a laboratory visit, 144 undergraduates (50% female) were randomly assigned to complete a speech stressor in a social-evaluative threat condition (SET; n = 86), in which an audience was present, or a nonexplicit social-evaluative threat condition (ne-SET; n = 58), in which they were alone in a room. Participants completed measures of stressor-related rumination 10 and 40 min posttask, later that night, and upon returning to the laboratory 3-5 days later. SRCE and other emotions experienced during the stressor (fear, anger, and sadness) were assessed immediately posttask. As hypothesized, the SET speech stressor elicited more rumination than the ne-SET speech stressor, and these differences persisted for 3-5 days. SRCE, but not other specific negative emotions or general emotional arousal, mediated the effect of stressor context on rumination. Stressors characterized by SET may be likely candidates for eliciting and maintaining ruminative thought immediately and also days later, potentially by eliciting shame-related emotions and cognitions.
45 CFR 1214.103 - Definitions.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and lymphatic; skin... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and... disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...
Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.
Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn
2011-09-01
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
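Feature-space histogram equalization of the kind referred to in this abstract can be sketched as quantile mapping: each dimension of the noisy feature stream is mapped, through its empirical CDF, onto the corresponding quantiles of a clean reference distribution. The Gaussian toy data and the choice of reference below are assumptions for illustration, not the article's exact formulation.

```python
# Sketch of feature-space histogram equalization (HEQ) via quantile mapping:
# each dimension of the noisy feature stream is mapped, through its empirical
# CDF, onto the quantiles of a clean reference distribution. The Gaussian toy
# data and the choice of reference are assumptions for illustration.
import numpy as np

def histogram_equalize(noisy, reference):
    """noisy, reference: (frames, dims) feature matrices (e.g., MFCCs)."""
    equalized = np.empty_like(noisy, dtype=float)
    for d in range(noisy.shape[1]):
        order = np.argsort(noisy[:, d])
        ranks = np.empty_like(order)
        ranks[order] = np.arange(len(order))
        cdf = (ranks + 0.5) / len(ranks)                 # empirical CDF of noisy dim
        equalized[:, d] = np.quantile(reference[:, d], cdf)
    return equalized

# Toy check: a shifted and scaled "noisy" feature is pulled back to reference statistics.
rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(2000, 13))
noisy = rng.normal(2.0, 3.0, size=(400, 13))
equalized = histogram_equalize(noisy, reference)
print(equalized.mean(axis=0)[:3].round(2), equalized.std(axis=0)[:3].round(2))
```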
Lahnakoski, Juha M; Glerean, Enrico; Salmi, Juha; Jääskeläinen, Iiro P; Sams, Mikko; Hari, Riitta; Nummenmaa, Lauri
2012-01-01
Despite the abundant data on brain networks processing static social signals, such as pictures of faces, the neural systems supporting social perception in naturalistic conditions are still poorly understood. Here we delineated brain networks subserving social perception under naturalistic conditions in 19 healthy humans who watched, during 3-T functional magnetic resonance imaging (fMRI), a set of 137 short (approximately 16 s each, total 27 min) audiovisual movie clips depicting pre-selected social signals. Two independent raters estimated how well each clip represented eight social features (faces, human bodies, biological motion, goal-oriented actions, emotion, social interaction, pain, and speech) and six filler features (places, objects, rigid motion, people not in social interaction, non-goal-oriented action, and non-human sounds) lacking social content. These ratings were used as predictors in the fMRI analysis. The posterior superior temporal sulcus (STS) responded to all social features but not to any non-social features, and the anterior STS responded to all social features except bodies and biological motion. We also found four partially segregated, extended networks for processing of specific social signals: (1) a fronto-temporal network responding to multiple social categories, (2) a fronto-parietal network preferentially activated to bodies, motion, and pain, (3) a temporo-amygdalar network responding to faces, social interaction, and speech, and (4) a fronto-insular network responding to pain, emotions, social interactions, and speech. Our results highlight the role of the pSTS in processing multiple aspects of social information, as well as the feasibility and efficiency of fMRI mapping under conditions that resemble the complexity of real life.
Jürgens, Rebecca; Fischer, Julia; Schacht, Annekathrin
2018-01-01
Emotional expressions provide strong signals in social interactions and can function as emotion inducers in a perceiver. Although speech provides one of the most important channels for human communication, its physiological correlates, such as activations of the autonomic nervous system (ANS) while listening to spoken utterances, have received far less attention than in other domains of emotion processing. Our study aimed at filling this gap by investigating autonomic activation in response to spoken utterances that were embedded into larger semantic contexts. Emotional salience was manipulated by providing information on alleged speaker similarity. We compared these autonomic responses to activations triggered by affective sounds, such as exploding bombs and applause. These sounds had been rated and validated as being either positive, negative, or neutral. As physiological markers of ANS activity, we recorded skin conductance responses (SCRs) and changes of pupil size while participants classified both prosodic and sound stimuli according to their hedonic valence. As expected, affective sounds elicited increased arousal in the receiver, as reflected in increased SCR and pupil size. In contrast, SCRs to angry and joyful prosodic expressions did not differ from responses to neutral ones. Pupil size, however, was modulated by affective prosodic utterances, with increased dilations for angry and joyful compared to neutral prosody, although the similarity manipulation had no effect. These results indicate that cues provided by emotional prosody in spoken, semantically neutral utterances might be too subtle to trigger SCRs, although variation in pupil size indicated the salience of stimulus variation. Our findings further demonstrate a functional dissociation between pupil dilation and skin conductance that presumably originates from their differential innervation. PMID:29541045
Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users
Fuller, Christina D.; Galvin, John J.; Maat, Bert; Başkent, Deniz; Free, Rolien H.
2018-01-01
In normal-hearing (NH) adults, long-term music training may benefit music and speech perception, even when listening to spectro-temporally degraded signals as experienced by cochlear implant (CI) users. In this study, we compared two different music training approaches in CI users and their effects on speech and music perception, as it remains unclear which approach to music training might be best. The approaches differed in terms of music exercises and social interaction. For the pitch/timbre group, melodic contour identification (MCI) training was performed using computer software. For the music therapy group, training involved face-to-face group exercises (rhythm perception, musical speech perception, music perception, singing, vocal emotion identification, and music improvisation). For the control group, training involved group nonmusic activities (e.g., writing, cooking, and woodworking). Training consisted of weekly 2-hr sessions over a 6-week period. Speech intelligibility in quiet and noise, vocal emotion identification, MCI, and quality of life (QoL) were measured before and after training. The different training approaches appeared to offer different benefits for music and speech perception. Training effects were observed within-domain (better MCI performance for the pitch/timbre group), with little cross-domain transfer of music training (emotion identification significantly improved for the music therapy group). While training had no significant effect on QoL, the music therapy group reported better perceptual skills across training sessions. These results suggest that more extensive and intensive training approaches that combine pitch training with the social aspects of music therapy may further benefit CI users. PMID:29621947
Connecting multimodality in human communication
Regenbogen, Christina; Habel, Ute; Kellermann, Thilo
2013-01-01
A successful reciprocal evaluation of social signals serves as a prerequisite for social coherence and empathy. In a previous fMRI study we investigated naturalistic communication situations by presenting video clips to our participants and recording their behavioral responses regarding empathy and its components. In two conditions, all three channels transported congruent emotional or neutral information, respectively. Three conditions selectively presented two emotional channels and one neutral channel and were thus bimodally emotional. We reported channel-specific emotional contributions in modality-related areas, elicited by dynamic video clips with varying combinations of emotionality in facial expressions, prosody, and speech content. However, to better understand the underlying mechanisms accompanying a naturalistically displayed human social interaction in some key regions that presumably serve as specific processing hubs for facial expressions, prosody, and speech content, we pursued a reanalysis of the data. Here, we focused on two different descriptions of temporal characteristics within three modality-related regions [right fusiform gyrus (FFG), left auditory cortex (AC), and left angular gyrus (AG)] and the left dorsomedial prefrontal cortex (dmPFC). First, by means of a finite impulse response (FIR) analysis within each of the three regions we examined the post-stimulus time-courses as a description of the temporal characteristics of the BOLD response during the video clips. Second, effective connectivity between these areas and the left dmPFC was analyzed using dynamic causal modeling (DCM) in order to describe condition-related modulatory influences on the coupling between these regions. The FIR analysis showed initially diminished activation in bimodally emotional conditions but stronger activation than that observed in neutral videos toward the end of the stimuli, possibly reflecting bottom-up processes that compensate for a lack of emotional information. The DCM analysis instead showed pronounced top-down control. Remarkably, all connections from the dmPFC to the three other regions were modulated by the experimental conditions. This observation is in line with the presumed role of the dmPFC in the allocation of attention. By contrast, all incoming connections to the AG were modulated, indicating its key role in integrating multimodal information and supporting comprehension. Notably, the input from the FFG to the AG was enhanced when facial expressions conveyed emotional information. These findings serve as preliminary results in understanding network dynamics in human emotional communication and empathy. PMID:24265613
NASA Astrophysics Data System (ADS)
Kasyidi, Fatan; Puji Lestari, Dessi
2018-03-01
One of the important aspects of human-to-human communication is understanding the emotion of each party. Interactions between humans and computers continue to develop, especially affective interaction, in which emotion recognition is an important component. This paper presents our extended work on emotion recognition in spoken Indonesian, identifying four main emotion classes (Happy, Sad, Angry, and Contentment) using a combination of acoustic/prosodic and lexical features. We construct an emotional speech corpus from Indonesian television talk shows, where the situations are as close as possible to natural conditions. After constructing the corpus, acoustic/prosodic and lexical features are extracted to train the emotion model. We employ machine learning algorithms such as Support Vector Machine (SVM), Naive Bayes, and Random Forest to find the best model. On the test data, the best model, an SVM with an RBF kernel, achieves an F-measure of 0.447 using only acoustic/prosodic features and 0.488 using both acoustic/prosodic and lexical features to recognize the four emotion classes.
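For illustration, the kind of classifier comparison described above can be sketched in a few lines of Python with scikit-learn. This is a hedged sketch rather than the authors' code: the feature matrix, label encoding, and train/test split are placeholder assumptions standing in for the extracted acoustic/prosodic features and the four emotion classes.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Placeholder data standing in for extracted acoustic/prosodic feature vectors
# and the four emotion classes (happy, sad, angry, contentment).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 40))
y = rng.integers(0, 4, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro F-measure = {macro_f1:.3f}")

In an actual pipeline, the placeholder matrix would be replaced by the extracted acoustic/prosodic (and optionally lexical) features, and the macro F-measure would be compared across feature sets as in the abstract.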
ERIC Educational Resources Information Center
Gar, Natalie S.; Hudson, Jennifer L.
2009-01-01
The aim of this study was to determine whether maternal expressed emotion (criticism and emotional overinvolvement) decreased across treatment for childhood anxiety. Mothers of 48 clinically anxious children (aged 6-14 years) were rated on levels of criticism (CRIT) and emotional overinvolvement (EOI), as measured by a Five Minute Speech Sample…
ERIC Educational Resources Information Center
Kim, Sujin; Dorner, Lisa M.
2013-01-01
This article examines the relationship between language and emotion, especially drawing attention to the experiences and perspectives of second language (SL) learners. Informed by the sociocultural perspective on the construction of emotion and its representation, this study highlights the intertwined relationship among emotions, cultural…
Benders, Titia
2013-12-01
Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while the pitch is more extreme in IDS to 15-month olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers' realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers' expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect. Copyright © 2013 Elsevier Inc. All rights reserved.
Cohen, Alex S; Hong, S Lee; Guevara, Alvaro
2010-06-01
Emotional expression is an essential function for daily life that can be severely affected in some psychological disorders. Laboratory-based procedures designed to measure prosodic expression from natural speech have shown early promise for measuring individual differences in emotional expression but have yet to produce robust within-group prosodic changes across various evocative conditions. This report presents data from three separate studies (total N = 464) that digitally recorded subjects as they verbalized their reactions to various stimuli. Format and stimuli were modified to maximize prosodic expression. Our results suggest that use of evocative slides organized according to either a dimensional (e.g., high and low arousal; pleasant, unpleasant, and neutral valence) or a categorical (e.g., fear, surprise, happiness) model produced robust changes in subjective state but only negligible change in prosodic expression. In contrast, speech from the recall of autobiographical memories resulted in meaningful changes in both subjective state and prosodic expression. Implications for the study of psychological disorders are discussed.
ERIC Educational Resources Information Center
Gottschalk, Louis A.
This paper examines the use of content analysis of speech in the objective recording and measurement of changes in emotional and cognitive function of humans in whom natural or experimental changes in neural status have occurred. A brief description of the data gathering process, details of numerous physiological effects, an anxiety scale, and a…
Situational influences on rhythmicity in speech, music, and their interaction.
Hawkins, Sarah
2014-12-19
Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Moberly, Aaron C; Patel, Tirth R; Castellanos, Irina
2018-02-01
We hypothesized that, as a result of their hearing loss, adults with cochlear implants (CIs) would self-report poorer executive functioning (EF) skills than normal-hearing (NH) peers, and that these EF skills would be associated with performance on speech recognition tasks. EF refers to a group of higher-order neurocognitive skills responsible for behavioral and emotional regulation during goal-directed activity, and EF has been found to be poorer in children with CIs than their NH age-matched peers. Moreover, there is increasing evidence that neurocognitive skills, including some EF skills, contribute to the ability to recognize speech through a CI. Thirty postlingually deafened adults with CIs and 42 age-matched NH adults were enrolled. Participants and their spouses or significant others (informants) completed well-validated self-reports or informant-reports of EF, the Behavior Rating Inventory of Executive Function - Adult (BRIEF-A). CI users' speech recognition skills were assessed in quiet using several measures of sentence recognition. NH peers were tested for recognition of noise-vocoded versions of the same speech stimuli. CI users self-reported difficulty on EF tasks of shifting and task monitoring. In CI users, measures of speech recognition correlated with several self-reported EF skills. The present findings provide further evidence that neurocognitive factors, including specific EF skills, may decline in association with hearing loss, and that some of these EF skills contribute to speech processing under degraded listening conditions.
Mental health nurses' experiences of managing work-related emotions through supervision.
MacLaren, Jessica; Stenhouse, Rosie; Ritchie, Deborah
2016-10-01
The aim of this study was to explore emotion cultures constructed in supervision and consider how supervision functions as an emotionally safe space promoting critical reflection. Research published between 1995 and 2015 suggests supervision has a positive impact on nurses' emotional well-being, but there is little understanding of the processes involved in this and how styles of emotion interaction are established in supervision. A narrative approach was used to investigate mental health nurses' understandings and experiences of supervision. Eight semi-structured interviews were conducted with community mental health nurses in the UK during 2011. Analysis of audio data used features of speech to identify narrative discourse and illuminate meanings. A topic-centred analysis of interview narratives explored discourses shared between the participants. This supported the identification of feeling rules in participants' narratives and the exploration of the emotion context of supervision. Effective supervision was associated with three feeling rules: safety and reflexivity; staying professional; managing feelings. These feeling rules allowed the expression and exploration of emotions, promoting critical reflection. A contrast was identified between the emotion culture of supervision and the nurses' experience of their workplace cultures as requiring the suppression of difficult emotions. Despite this contrast, supervision functioned as an emotion micro-culture with its own distinctive feeling rules. The analytical construct of feeling rules allows us to connect individual emotional experiences to shared normative discourses, highlighting how these shape emotional processes taking place in supervision. This understanding supports an explanation of how supervision may positively influence nurses' emotion management and perhaps reduce burnout. © 2016 John Wiley & Sons Ltd.
Voice emotion recognition by cochlear-implanted children and their normally-hearing peers
Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.
2014-01-01
Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167
Post-stroke acquired amusia: A comparison between right- and left-brain hemispheric damages.
Jafari, Zahra; Esmaili, Mahdiye; Delbari, Ahmad; Mehrpour, Masoud; Mohajerani, Majid H
2017-01-01
Although extensive research has been published about the emotional consequences of stroke, most studies have focused on emotional words, speech prosody, voices, or facial expressions. The emotional processing of musical excerpts following stroke has been relatively unexplored. The present study was conducted to investigate the effects of chronic stroke on the recognition of basic emotions in music. Seventy persons, including 25 normal controls (NC), 25 persons with right brain damage (RBD) from stroke, and 20 persons with left brain damage (LBD) from stroke, between the ages of 31 and 71 years, were studied. The Musical Emotional Bursts (MEB) test, which consists of a set of short musical pieces expressing basic emotional states (happiness, sadness, and fear) and neutrality, was used to test musical emotional perception. Both stroke groups were significantly poorer than normal controls for the MEB total score and its subtests (p < 0.001). The RBD group was significantly less able than the LBD group to recognize sadness (p = 0.047) and neutrality (p = 0.015). Negative correlations were found between age and MEB scores for all groups, particularly the NC and RBD groups. Our findings indicated that stroke affecting the auditory cerebrum can cause acquired amusia, with greater severity after RBD than LBD. These results supported the "valence hypothesis" of right hemisphere dominance in processing negative emotions.
Cross-Cultural Attitudes toward Speech Disorders.
ERIC Educational Resources Information Center
Bebout, Linda; Arthur, Bradford
1992-01-01
University students (n=166) representing English-speaking North American culture and several other cultures completed questionnaires examining attitudes toward four speech disorders (cleft palate, dysfluency, hearing impairment, and misarticulations). Results showed significant group differences in beliefs about the emotional health of persons…
Towards Real-Time Speech Emotion Recognition for Affective E-Learning
ERIC Educational Resources Information Center
Bahreini, Kiavash; Nadolski, Rob; Westera, Wim
2016-01-01
This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…
The voices of seduction: cross-gender effects in processing of erotic prosody
Ethofer, Thomas; Wiethoff, Sarah; Anders, Silke; Kreifelts, Benjamin; Grodd, Wolfgang
2007-01-01
Gender-specific differences in cognitive functions have been widely discussed. Considering social cognition such as emotion perception conveyed by non-verbal cues, generally a female advantage is assumed. In the present study, however, we revealed a cross-gender interaction with increased responses to the voice of the opposite sex in male and female subjects. This effect was confined to an erotic tone of speech in behavioural data and haemodynamic responses within voice-sensitive brain areas (right middle superior temporal gyrus). The observed response pattern thus indicates a particular sensitivity to emotional voices that have a high behavioural relevance for the listener. PMID:18985138
A hypothesis on the biological origins and social evolution of music and dance
Wang, Tianyan
2015-01-01
The origins of music and musical emotions are still an enigma; here I propose a comprehensive hypothesis on the origins and evolution of music, dance, and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpretation of the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music but also integrates music and dance into a common form: rhythmic movements. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to the internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) is evoked by music and dance and has biological and social functions, which, in turn, promote the evolution of music, dance, and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is a combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance, and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis scientifically concrete. PMID:25741232
On the role of attention for the processing of emotions in speech: sex differences revisited.
Schirmer, Annett; Kotz, Sonja A; Friederici, Angela D
2005-08-01
In a previous cross-modal priming study [A. Schirmer, A.S. Kotz, A.D. Friederici, Sex differentiates the role of emotional prosody during word processing, Cogn. Brain Res. 14 (2002) 228-233.], we found that women integrated emotional prosody and word valence earlier than men. Both sexes showed a smaller N400 in the event-related potential to emotional words when these words were preceded by a sentence with congruous compared to incongruous emotional prosody. However, women showed this effect with a 200-ms interval between prime sentence and target word whereas men showed the effect with a 750-ms interval. The present study was designed to determine whether these sex differences prevail when attention is directed towards the emotional content of prosody and word meaning. To this end, we presented the same prime sentences and target words as in our previous study. Sentences were spoken with happy or sad prosody and followed by a congruous or incongruous emotional word or pseudoword. The interval between sentence offset and target onset was 200 ms. In addition to performing a lexical decision, participants were asked to decide whether or not a word matched the emotional prosody of the preceding sentence. The combined lexical and congruence judgment failed to reveal differences in emotional-prosodic priming between men and women. Both sexes showed smaller N400 amplitudes to emotionally congruent compared to incongruent words. This suggests that the presence of sex differences in emotional-prosodic priming depends on whether or not participants are instructed to take emotional prosody into account.
Minor, Kyle S; Bonfils, Kelsey A; Luther, Lauren; Firmin, Ruth L; Kukla, Marina; MacLain, Victoria R; Buck, Benjamin; Lysaker, Paul H; Salyers, Michelle P
2015-05-01
The words people use convey important information about internal states, feelings, and views of the world around them. Lexical analysis is a fast, reliable method of assessing word use that has shown promise for linking speech content, particularly in emotion and social categories, with psychopathological symptoms. However, few studies have utilized lexical analysis instruments to assess speech in schizophrenia. In this exploratory study, we investigated whether positive emotion, negative emotion, and social word use was associated with schizophrenia symptoms, metacognition, and general functioning in a schizophrenia cohort. Forty-six participants generated speech during a semi-structured interview, and word use categories were assessed using a validated lexical analysis measure. Trained research staff completed symptom, metacognition, and functioning ratings using semi-structured interviews. Word use categories significantly predicted all variables of interest, accounting for 28% of the variance in symptoms and 16% of the variance in metacognition and general functioning. Anger words, a subcategory of negative emotion, significantly predicted greater symptoms and lower functioning. Social words significantly predicted greater metacognition. These findings indicate that lexical analysis instruments have the potential to play a vital role in psychosocial assessments of schizophrenia. Future research should replicate these findings and examine the relationship between word use and additional clinical variables across the schizophrenia-spectrum. Copyright © 2015 Elsevier Ltd. All rights reserved.
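As a rough illustration of dictionary-based lexical analysis of this kind (not the validated instrument used in the study), the sketch below counts emotion and social words in a transcript and expresses each category as a percentage of total words; the tiny word lists are hypothetical stand-ins for LIWC-style categories.

import re

# Hypothetical LIWC-style category dictionaries (toy word lists, not the
# validated lexicon used in the study).
CATEGORIES = {
    "positive_emotion": {"happy", "glad", "hope", "love", "good"},
    "negative_emotion": {"sad", "afraid", "hate", "angry", "hurt"},
    "anger": {"hate", "angry", "furious", "annoyed"},
    "social": {"friend", "family", "talk", "they", "we"},
}

def lexical_profile(transcript):
    # Tokenize crudely and express each category as a percentage of all words.
    words = re.findall(r"[a-z']+", transcript.lower())
    total = max(len(words), 1)
    return {category: 100.0 * sum(word in vocab for word in words) / total
            for category, vocab in CATEGORIES.items()}

print(lexical_profile("We talk with family, but I still feel sad and angry."))

Category percentages computed in this way could then serve as predictors of symptom, metacognition, or functioning scores in an ordinary regression, in the spirit of the analysis described above.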
Morgan, Nick
2008-11-01
Like the best-laid schemes of mice and men, the best-rehearsed speeches go oft astray. No amount of preparation can counter an audience's perception that the speaker is calculating or insincere. Why do so many managers have trouble communicating authenticity to their listeners? Morgan, a communications coach for more than two decades, offers advice for overcoming this difficulty. Recent brain research shows that natural, unstudied gestures--what Morgan calls the "second conversation"--express emotions or impulses a split second before our thought processes have turned them into words. So the timing of practiced gestures will always be subtly off--just enough to be picked up by listeners' unconscious ability to read body language. If you can't practice the unspoken part of your delivery, what can you do? Tap into four basic impulses underlying your speech--to be open to the audience, to connect with it, to be passionate, and to "listen" to how the audience is responding--and then rehearse your presentation with each in mind. You can become more open, for instance, by imagining that you're speaking to your spouse or close friend. To more readily connect, focus on needing to engage your listeners and then to keep their attention, as if you were speaking to a child who isn't heeding your words. To convey your passion, identify the feelings behind your speech and let them come through. To listen, think about what the audience is probably feeling when you step up to the podium and be alert to the nonverbal messages of its members. Internalizing these four impulses as you practice will help you come across as relaxed and authentic--your body language will take care of itself.
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.
Weninger, Felix; Eyben, Florian; Schuller, Björn W; Mortillaro, Marcello; Scherer, Klaus R
2013-01-01
Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that, by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
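The cross-domain evaluation idea can be sketched as follows. This is a simplified illustration on random placeholder data, not the authors' feature set or pipeline: fit an arousal regressor on acoustic features from one domain (e.g., sound) and score its predictions on another domain (e.g., enacted speech) by Pearson correlation with the observer annotations.

import numpy as np
from scipy.stats import pearsonr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Random placeholders for acoustic features and observer arousal ratings in
# a training domain (sound) and a test domain (enacted speech).
rng = np.random.default_rng(42)
X_sound, arousal_sound = rng.normal(size=(300, 60)), rng.normal(size=300)
X_speech, arousal_speech = rng.normal(size=(200, 60)), rng.normal(size=200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X_sound, arousal_sound)              # train on one domain
predicted = model.predict(X_speech)            # test on the other domain
r, p = pearsonr(predicted, arousal_speech)     # cross-domain agreement
print(f"cross-domain arousal correlation: r = {r:.2f} (p = {p:.3f})")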
Niedtfeld, Inga
2017-07-01
Borderline personality disorder (BPD) is characterized by affective instability and interpersonal problems. In the context of social interaction, impairments in empathy are proposed to result in inadequate social behavior. In contrast to findings of reduced cognitive empathy, some authors have suggested enhanced emotional empathy in BPD. We investigated whether ambiguity leads to decreased cognitive or emotional empathy in BPD. Thirty-four patients with BPD and thirty-two healthy controls were presented with video clips in which emotional information was conveyed through prosody, facial expression, and speech content. Experimental conditions were designed to induce ambiguity by presenting neutral valence in one of these communication channels. Subjects were asked to indicate the actors' emotional valence, their decision confidence, and their own emotional state. BPD patients showed increased emotional empathy when neutral stories were combined with nonverbally expressed emotions. In contrast, when all channels were emotional, patients showed lower emotional empathy than healthy controls. Regarding cognitive empathy, there were no significant differences between BPD patients and healthy control subjects in recognition accuracy, but reduced decision confidence in BPD. These results suggest that patients with BPD show altered emotional empathy, experiencing higher rates of emotional contagion when emotions are expressed nonverbally. The latter may contribute to misunderstandings and inadequate social behavior. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
[Minimal emotional dysfunction and first impression formation in personality disorders].
Linden, M; Vilain, M
2011-01-01
"Minimal cerebral dysfunctions" are isolated impairments of basic mental functions, which are elements of complex functions like speech. The best described are cognitive dysfunctions such as reading and writing problems, dyscalculia, attention deficits, but also motor dysfunctions such as problems with articulation, hyperactivity or impulsivity. Personality disorders can be characterized by isolated emotional dysfunctions in relation to emotional adequacy, intensity and responsivity. For example, paranoid personality disorders can be characterized by continuous and inadequate distrust, as a disorder of emotional adequacy. Schizoid personality disorders can be characterized by low expressive emotionality, as a disorder of effect intensity, or dissocial personality disorders can be characterized by emotional non-responsivity. Minimal emotional dysfunctions cause interactional misunderstandings because of the psychology of "first impression formation". Studies have shown that in 100 ms persons build up complex and lasting emotional judgements about other persons. Therefore, minimal emotional dysfunctions result in interactional problems and adjustment disorders and in corresponding cognitive schemata.From the concept of minimal emotional dysfunctions specific psychotherapeutic interventions in respect to the patient-therapist relationship, the diagnostic process, the clarification of emotions and reality testing, and especially an understanding of personality disorders as impairment and "selection, optimization, and compensation" as a way of coping can be derived.
Dual Diathesis-Stressor Model of Emotional and Linguistic Contributions to Developmental Stuttering
ERIC Educational Resources Information Center
Walden, Tedra A.; Frankel, Carl B.; Buhr, Anthony P.; Johnson, Kia N.; Conture, Edward G.; Karrass, Jan M.
2012-01-01
This study assessed emotional and speech-language contributions to childhood stuttering. A dual diathesis-stressor framework guided this study, in which both linguistic requirements and skills, and emotion and its regulation, are hypothesized to contribute to stuttering. The language diathesis consists of expressive and receptive language skills.…
Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech
Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz
2015-01-01
In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners’ and 20 CI users’ emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor’s emotions worse than the other actors’. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker’s deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues. PMID:27648210
Toward Emotionally Accessible Massive Open Online Courses (MOOCs).
Hillaire, Garron; Iniesto, Francisco; Rienties, Bart
2017-01-01
This paper outlines an approach to evaluating the emotional content of three Massive Open Online Courses (MOOCs) using the affective computing approach of prosody detection on two different text-to-speech voices in conjunction with human raters judging the emotional content of course text. The intent of this work is to establish the potential variation in the emotional delivery of MOOC material through synthetic voices.
Politeness, emotion, and gender: A sociophonetic study of voice pitch modulation
NASA Astrophysics Data System (ADS)
Yuasa, Ikuko
The present dissertation is a cross-gender and cross-cultural sociophonetic exploration of voice pitch characteristics utilizing speech data derived from Japanese and American speakers in natural conversations. The roles of voice pitch modulation in terms of the concepts of politeness and emotion as they pertain to culture and gender will be investigated herein. The research interprets the significance of my findings based on the acoustic measurements of speech data as they are presented in the ERB-rate scale (the most appropriate scale for human speech perception). The investigation reveals that pitch range modulation displayed by Japanese informants in two types of conversations is closely linked to types of politeness adopted by those informants. The degree of the informants' emotional involvement and expressions reflected in differing pitch range widths plays an important role in determining the relationship between pitch range modulation and politeness. The study further correlates the Japanese cultural concept of enryo ("self-restraint") with this phenomenon. When median values were examined, male and female pitch ranges across cultures did not conspicuously differ. However, sporadically occurring women's pitch characteristics, which differ culturally in the width and height of pitch ranges, may create an 'emotional' perception of women's speech style. The salience of these pitch characteristics appears to be the source of the stereotype of women's speech sounding 'swoopy' or 'shrill' and thus 'emotional'. Such women's salient voice characteristics are interpreted in light of camaraderie/positive politeness. Women's use of conspicuous paralinguistic features helps to create an atmosphere of camaraderie. These voice pitch characteristics promote the establishment of a sense of camaraderie since they act to emphasize such feelings as concern, support, and comfort towards addressees. Moreover, men's wide pitch ranges are discussed in view of politeness (rather than gender). Japanese men's use of wide pitch ranges during conversations with familiar interlocutors demonstrates the extent to which male speakers can increase their pitch ranges if there is an authentic socio-cultural inspiration (other than a gender-related one) to do so. The findings suggest the necessity of interpreting research data in consideration of how the notion of gender interacts with other socio-cultural behavioral norms.
[Speech and thought disorder in frontal syndrome following subarachnoid hemorrhage].
Magiera, P; Sep-Kowalik, B; Pankiewicz, P; Pankiewicz, K
1994-01-01
We describe the case of a patient who suffered a cerebral hemorrhage resulting in perforation of the third cerebral ventricle and massive damage to the frontal lobes as a consequence of the rupture of an intracranial aneurysm. After the neurosurgical operation the patient's general state improved, but he nevertheless displayed a frontal syndrome with many deficits in abstract thinking and reflectiveness and a significant reduction of higher emotionality. The neurolinguistic symptomatology in this case is of particular interest. Rehabilitation and pharmacotherapy were very successful. The case is also instructive because it presents many of the symptoms of the so-called "frontal syndrome" and illustrates the role of the frontal lobes in the integration of mental life and the role of the left hemisphere in the gnostic and coordinative processes of speech and other higher functions of the central nervous system.
Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor
2015-01-01
In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
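A toy sketch of the classifier-subset-selection idea is given below. For brevity it replaces the estimation of distribution algorithm (EDA) with an exhaustive search over subsets of three base classifiers and uses random placeholder data; all names, sizes, and choices are illustrative assumptions rather than the paper's configuration.

from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Random placeholders for acoustic/prosodic features and seven classes
# (six primary emotions plus neutral).
rng = np.random.default_rng(1)
X = rng.normal(size=(350, 30))
y = rng.integers(0, 7, size=350)

base_classifiers = [
    ("svm", SVC(probability=True)),
    ("naive_bayes", GaussianNB()),
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=1)),
]

best_score, best_subset = -np.inf, None
for k in range(2, len(base_classifiers) + 1):          # subsets of at least two base learners
    for subset in combinations(base_classifiers, k):
        stack = StackingClassifier(estimators=list(subset),
                                   final_estimator=LogisticRegression(max_iter=1000))
        score = cross_val_score(stack, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, [name for name, _ in subset]

print("selected base classifiers:", best_subset)
print("cross-validated accuracy:", round(best_score, 3))

In the actual method, the EDA searches this subset space probabilistically, which scales far better than exhaustive enumeration when many base classifiers are available.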
A Hierarchical multi-input and output Bi-GRU Model for Sentiment Analysis on Customer Reviews
NASA Astrophysics Data System (ADS)
Zhang, Liujie; Zhou, Yanquan; Duan, Xiuyu; Chen, Ruiqi
2018-03-01
Multi-label sentiment classification of customer reviews is a practical and challenging task in Natural Language Processing. In this paper, we propose a hierarchical multi-input and multi-output model based on a bi-directional recurrent neural network, which considers both the semantic and the lexical information of emotional expression. Our model applies two independent Bi-GRU layers to generate part-of-speech and sentence representations. Lexical information is then incorporated via attention over the softmax-activated output of the part-of-speech representation. In addition, we combine the probabilities of auxiliary labels as features with the hidden layer to capture crucial correlations between output labels. The experimental results show that our model is computationally efficient and achieves substantial improvements on a customer-review dataset.
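One possible reading of this architecture is sketched below in PyTorch. It is an interpretation under stated assumptions, not the authors' implementation: one Bi-GRU encodes word embeddings, a second encodes part-of-speech tags, a softmax attention pools the part-of-speech stream, and auxiliary-label probabilities are concatenated before the output layer. All dimensions, names, and pooling choices are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiInputBiGRU(nn.Module):
    def __init__(self, vocab_size, pos_size, n_labels, n_aux, emb_dim=100, hidden=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.pos_gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden, 1)              # scores over POS states
        self.output = nn.Linear(4 * hidden + n_aux, n_labels)  # sentence + POS context + aux

    def forward(self, words, pos_tags, aux_probs):
        w, _ = self.word_gru(self.word_emb(words))              # (batch, seq, 2*hidden)
        p, _ = self.pos_gru(self.pos_emb(pos_tags))             # (batch, seq, 2*hidden)
        alpha = F.softmax(self.attention(p), dim=1)             # attention weights over time
        pos_context = (alpha * p).sum(dim=1)                    # weighted POS representation
        sentence = w.mean(dim=1)                                # mean-pooled sentence vector
        features = torch.cat([sentence, pos_context, aux_probs], dim=-1)
        return self.output(features)                            # logits over sentiment labels

# Toy forward pass with random token indices and auxiliary-label probabilities.
model = MultiInputBiGRU(vocab_size=5000, pos_size=50, n_labels=5, n_aux=3)
words = torch.randint(0, 5000, (8, 20))
pos_tags = torch.randint(0, 50, (8, 20))
aux_probs = torch.rand(8, 3)
logits = model(words, pos_tags, aux_probs)                      # shape (8, 5)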
Most, Tova; Gaon-Sivan, Gal; Shpak, Talma; Luntz, Michal
2012-01-01
Binaural hearing in cochlear implant (CI) users can be achieved either by bilateral implantation or bimodally with a contralateral hearing aid (HA). Binaural-bimodal hearing has the advantage of complementing the high-frequency electric information from the CI by low-frequency acoustic information from the HA. We examined the contribution of a contralateral HA in 25 adult implantees to their perception of fundamental frequency-cued speech characteristics (initial consonant voicing, intonation, and emotions). Testing with CI alone, HA alone, and bimodal hearing showed that all three characteristics were best perceived under the bimodal condition. Significant differences were recorded between bimodal and HA conditions in the initial voicing test, between bimodal and CI conditions in the intonation test, and between both bimodal and CI conditions and between bimodal and HA conditions in the emotion-in-speech test. These findings confirmed that such binaural-bimodal hearing enhances perception of these speech characteristics and suggest that implantees with residual hearing in the contralateral ear may benefit from a HA in that ear.
Computerized Measurement of Negative Symptoms in Schizophrenia
Cohen, Alex S.; Alpert, Murray; Nienow, Tasha M.; Dinzeo, Thomas J.; Docherty, Nancy M.
2008-01-01
Accurate measurement of negative symptoms is crucial for understanding and treating schizophrenia. However, current measurement strategies are reliant on subjective symptom rating scales which often have psychometric and practical limitations. Computerized analysis of patients’ speech offers a sophisticated and objective means of evaluating negative symptoms. The present study examined the feasibility and validity of using widely-available acoustic and lexical-analytic software to measure flat affect, alogia and anhedonia (via positive emotion). These measures were examined in their relationships to clinically-rated negative symptoms and social functioning. Natural speech samples were collected and analyzed for 14 patients with clinically-rated flat affect, 46 patients without flat affect and 19 healthy controls. The computer-based inflection and speech rate measures significantly discriminated patients with flat affect from controls, and the computer-based measure of alogia and negative emotion significantly discriminated the flat and non-flat patients. Both the computer and clinical measures of positive emotion/anhedonia corresponded to functioning impairments. The computerized method of assessing negative symptoms offered a number of advantages over the symptom scale-based approach. PMID:17920078
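Two of the computer-derived measures mentioned above can be approximated with open-source tools. The sketch below is my illustration, not the study's software: it derives an "inflection" proxy from the variability of the fundamental frequency (F0) and a crude pause measure from silence detection; the file name and thresholds are assumptions.

import numpy as np
import librosa

# Load a mono speech sample (file name is an assumption).
y, sr = librosa.load("speech_sample.wav", sr=None)

# "Inflection" proxy: variability of the fundamental frequency (F0) across
# voiced frames, estimated with probabilistic YIN (unvoiced frames are NaN).
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
f0_voiced = f0[~np.isnan(f0)]
inflection = float(np.std(f0_voiced))

# Crude pause/speech-rate proxy: fraction of the recording spent in silence,
# using a simple energy threshold (top_db is an arbitrary assumption).
intervals = librosa.effects.split(y, top_db=30)
speech_seconds = sum(int(end - start) for start, end in intervals) / sr
pause_fraction = 1.0 - speech_seconds / (len(y) / sr)

print(f"F0 standard deviation: {inflection:.1f} Hz")
print(f"fraction of time in pauses: {pause_fraction:.2f}")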
Yuskaitis, Christopher J.; Parviz, Mahsa; Loui, Psyche; Wan, Catherine Y.; Pearl, Phillip L.
2017-01-01
Music production and perception invoke a complex set of cognitive functions that rely on the integration of sensory-motor, cognitive, and emotional pathways. Pitch is a fundamental perceptual attribute of sound and a building block for both music and speech. Although the cerebral processing of pitch is not completely understood, recent advances in imaging and electrophysiology have provided insight into the functional and anatomical pathways of pitch processing. This review examines the current understanding of pitch processing, behavioral and neural variations that give rise to difficulties in pitch processing, and potential applications of music education for language processing disorders such as dyslexia. PMID:26092314
Durlik, Caroline; Brown, Gary; Tsakiris, Manos
2014-04-01
Interoceptive awareness (IA), the ability to detect internal body signals, has been linked to various aspects of emotional processing. However, it has been examined mostly as a trait variable, with few studies also investigating state-dependent fluctuations in IA. Based on the known positive correlation between IA and emotional reactivity, negative affectivity, and trait anxiety, the current study examined whether IA, as indexed by heartbeat detection accuracy, would change during an anxiety-provoking situation. Participants in the experimental condition, in which they anticipated giving a speech in front of a small audience, displayed significant IA increases from baseline to anticipation. Enhancement in IA was positively correlated with fear of negative evaluation. Implications of the results are discussed in relation to the role of trait and state IA in emotional experience.
Graham, Susan A; San Juan, Valerie; Khu, Melanie
2017-05-01
When linguistic information alone does not clarify a speaker's intended meaning, skilled communicators can draw on a variety of cues to infer communicative intent. In this paper, we review research examining the developmental emergence of preschoolers' sensitivity to a communicative partner's perspective. We focus particularly on preschoolers' tendency to use cues both within the communicative context (i.e. a speaker's visual access to information) and within the speech signal itself (i.e. emotional prosody) to make on-line inferences about communicative intent. Our review demonstrates that preschoolers' ability to use visual and emotional cues of perspective to guide language interpretation is not uniform across tasks, is sometimes related to theory of mind and executive function skills, and, at certain points of development, is only revealed by implicit measures of language processing.
ERIC Educational Resources Information Center
Filippatou, Diamanto; Dimitropoulou, Panagiota; Sideridis, Georgios
2009-01-01
The purpose of the present study was to investigate the differences between students with LD and SLI on emotional psychopathology and cognitive variables. In particular, the study examined whether cognitive, emotional, and psychopathology variables are significant discriminatory variables of speech and language disordered groups versus those…
Bipolar Disorder in Children: Implications for Speech-Language Pathologists
ERIC Educational Resources Information Center
Quattlebaum, Patricia D.; Grier, Betsy C.; Klubnik, Cynthia
2012-01-01
In the United States, bipolar disorder is an increasingly common diagnosis in children, and these children can present with severe behavior problems and emotionality. Many studies have documented the frequent coexistence of behavior disorders and speech-language disorders. Like other children with behavior disorders, children with bipolar disorder…
Children with Speech Sound Disorders at School: Challenges for Children, Parents and Teachers
ERIC Educational Resources Information Center
Daniel, Graham R.; McLeod, Sharynne
2017-01-01
Teachers play a major role in supporting children's educational, social, and emotional development, although they may be unprepared for supporting children with speech sound disorders. Interviews with 34 participants, including six focus children, their parents, siblings, friends, teachers and other significant adults in their lives, highlighted…
Effective Vocal Production in Performance.
ERIC Educational Resources Information Center
King, Robert G.
If speech instructors are to teach students to recreate for an audience an author's intellectual and emotional meanings, they must teach them to use the human voice effectively. Seven essential elements of effective vocal production that often pose problems for oral interpretation students should be central to any speech training program: (1)…
The Relationship between Psychopathology and Speech and Language Disorders in Neurologic Patients.
ERIC Educational Resources Information Center
Sapir, Shimon; Aronson, Arnold E.
1990-01-01
This paper reviews findings that suggest a causal relationship between depression, anxiety, or conversion reaction and voice, speech, and language disorders in neurologic patients. The paper emphasizes the need to consider the psychosocial and psychopathological aspects of neurologic communicative disorders, the link between emotional and…
Guntupalli, Vijaya K; Everhart, D Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim
2007-01-01
People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech-producing mechanism (e.g. in the head, arm, fingers, nares, etc.). Previous research has examined the attitudes and perceptions of those who stutter and people who frequently interact with them (e.g. relatives, parents, employers). Results have shown an unequivocal, powerful and robust negative stereotype despite a lack of defined differences in personality structure between people who stutter and normally fluent individuals. However, physiological investigations of listener responses during moments of stuttering are limited. There is a need for data that simultaneously examine physiological responses (e.g. heart rate and galvanic skin conductance) and subjective behavioural responses to stuttering. The pairing of these objective and subjective data may provide information that casts light on the genesis of negative stereotypes associated with stuttering, the development of compensatory mechanisms in those who stutter, and the true impact of stuttering on senders and receivers alike. The aim was to compare the emotional and physiological responses of fluent speakers while listening to and observing fluent and severely stuttered speech samples. Twenty adult participants (mean age = 24.15 years, standard deviation = 3.40) observed speech samples of two fluent speakers and two speakers who stutter reading aloud. Participants' skin conductance and heart rate changes were measured as physiological responses to stuttered or fluent speech samples. Participants' subjective responses on arousal (excited-calm) and valence (happy-unhappy) dimensions were assessed via the Self-Assessment Manikin (SAM) rating scale with an additional questionnaire comprising a set of nine bipolar adjectives. Results showed significantly increased skin conductance and lower mean heart rate during the presentation of stuttered speech relative to the presentation of fluent speech samples (p<0.05). Listeners also rated themselves as more aroused, unhappy, nervous, uncomfortable, sad, tense, unpleasant, avoidant, embarrassed, and annoyed while viewing stuttered speech relative to fluent speech. These data support the notion that stutter-filled speech can elicit physiological and emotional responses in listeners. Clinicians who treat stuttering should be aware that listeners show involuntary physiological responses to moderate-severe stuttering that probably remain salient over time and contribute to the evolution of negative stereotypes of people who stutter. With this in mind, it is hoped that clinicians can work with people who stutter to develop appropriate coping strategies. The role of the amygdala and mirror neural mechanisms in physiological and subjective responses to stuttering is discussed.
Perception of affective and linguistic prosody: an ALE meta-analysis of neuroimaging studies.
Belyk, Michel; Brown, Steven
2014-09-01
Prosody refers to the melodic and rhythmic aspects of speech. Two forms of prosody are typically distinguished: 'affective prosody' refers to the expression of emotion in speech, whereas 'linguistic prosody' relates to the intonation of sentences, including the specification of focus within sentences and stress within polysyllabic words. While these two processes are united by their use of vocal pitch modulation, they are functionally distinct. In order to examine the localization and lateralization of speech prosody in the brain, we performed two voxel-based meta-analyses of neuroimaging studies of the perception of affective and linguistic prosody. There was substantial sharing of brain activations between analyses, particularly in right-hemisphere auditory areas. However, a major point of divergence was observed in the inferior frontal gyrus: affective prosody was more likely to activate Brodmann area 47, while linguistic prosody was more likely to activate the ventral part of area 44. © The Author (2013). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
On how the brain decodes vocal cues about speaker confidence.
Jiang, Xiaoming; Pell, Marc D
2015-05-01
In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.
Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A
2015-04-01
Despite the remarkable success of cochlear implants (CIs) in bringing spoken language to hearing-impaired listeners, the signal they transmit remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; and that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech but, on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area.
Fernández-Aranda, Fernando; Jiménez-Murcia, Susana; Santamaría, Juan J; Gunnard, Katarina; Soto, Antonio; Kalapanidas, Elias; Bults, Richard G A; Davarakis, Costas; Ganchev, Todor; Granero, Roser; Konstantas, Dimitri; Kostoulas, Theodoros P; Lam, Tony; Lucas, Mikkel; Masuet-Aumatell, Cristina; Moussa, Maher H; Nielsen, Jeppe; Penelo, Eva
2012-08-01
Previous review studies have suggested that computer games can serve as an alternative or additional form of treatment in several areas (schizophrenia, asthma or motor rehabilitation). Although several naturalistic studies have been conducted showing the usefulness of serious video games in the treatment of some abnormal behaviours, there is a lack of serious games specially designed for treating mental disorders. The purpose of our project was to develop and evaluate a serious video game designed to remediate attitudinal, behavioural and emotional processes of patients with impulse-related disorders. The video game was created and developed within the European research project PlayMancer. It aims to prove potential capacity to change underlying attitudinal, behavioural and emotional processes of patients with impulse-related disorders. New interaction modes were provided by newly developed components, such as emotion recognition from speech, face and physiological reactions, while specific impulsive reactions were elicited. The video game uses biofeedback for helping patients to learn relaxation skills, acquire better self-control strategies and develop new emotional regulation strategies. In this article, we present a description of the video game used, rationale, user requirements, usability and preliminary data, in several mental disorders.
ERIC Educational Resources Information Center
New York State Education Dept., Albany. Div. for Handicapped Children.
Six speeches given at an institute on reading programs for emotionally handicapped children are presented. Jules Abrams first examines the relationship of emotional and personality maladjustments to reading difficulty. Then Clifford Kolson advocates the promotion of informal reading and the proper diagnosis of a child's reading level. A discussion…
Computer Graphics Research Laboratory
1994-01-31
spoken language (words and contextually appropriate intonation marking topic and focus), facial movements (lip shapes, emotions, gaze direction, head...content of speech (scrunching one's nose when talking about something unpleasant), emotion (wrinkling one's eyebrows with worry), personality...looking at the other person to see how she follows), look for information, express emotion (looking downward in case of sadness), or influence another
Attitudes toward speech disorders: sampling the views of Cantonese-speaking Americans.
Bebout, L; Arthur, B
1997-01-01
Speech-language pathologists who serve clients from cultural backgrounds that are not familiar to them may encounter culturally influenced attitudinal differences. A questionnaire with statements about 4 speech disorders (dysfluency, cleft palate, speech of the deaf, and misarticulations) was given to a focus group of Chinese Americans and a comparison group of non-Chinese Americans. The focus group was much more likely to believe that persons with speech disorders could improve their own speech by "trying hard," was somewhat more likely to say that people who use deaf speech and people with cleft palates might be "emotionally disturbed," and was generally more likely to view deaf speech as a limitation. The comparison group was more pessimistic about stuttering children's acceptance by their peers than was the focus group. The two subject groups agreed about other items, such as the likelihood that older children with articulation problems are "less intelligent" than their peers.
Developmental Variables and Speech-Language in a Special Education Intervention Model.
ERIC Educational Resources Information Center
Cruz, Maria del C.; Ayala, Myrna
Case studies of eight children with speech and language impairments are presented in a review of the intervention efforts at the Demonstration Center for Preschool Special Education (DCPSE) in Puerto Rico. Five components of the intervention model are examined: social medical history, intelligence, motor development, socio-emotional development,…
ERIC Educational Resources Information Center
Huinck, Wendy J.; Langevin, Marilyn; Kully, Deborah; Graamans, Kees; Peters, Herman F. M.; Hulstijn, Wouter
2006-01-01
A procedure for subtyping individuals who stutter and its relationship to treatment outcome is explored. Twenty-five adult participants of the Comprehensive Stuttering Program (CSP) were classified according to: (1) stuttering severity and (2) severity of negative emotions and cognitions associated with their speech problem. Speech characteristics…
Imagery, Concept Formation and Creativity--From Past to Future.
ERIC Educational Resources Information Center
Silverstein, Ora. N. Asael
At the center of the conceptual framework there is visual imagery. Man's emotional and mental behavior is built on archetypal symbols that are the source of creative ideas. Native American pictography, in particular, illustrates this in the correlation between gesture speech and verbal speech. The author's research in this area has included a…
Gross, Wibke; Linden, Ulrike; Ostermann, Thomas
2010-07-21
Language development is one of the most significant processes of early childhood development. Children with delayed speech development are more at risk of acquiring other cognitive, social-emotional, and school-related problems. Music therapy appears to facilitate speech development in children, even within a short period of time. The aim of this pilot study is to explore the effects of music therapy in children with delayed speech development. A total of 18 children aged 3.5 to 6 years with delayed speech development took part in this observational study in which music therapy and no treatment were compared to demonstrate effectiveness. Individual music therapy was provided on an outpatient basis. An ABAB reversal design with alternations between music therapy and no treatment with an interval of approximately eight weeks between the blocks was chosen. Before and after each study period, a speech development test, a non-verbal intelligence test for children, and music therapy assessment scales were used to evaluate the speech development of the children. Compared to the baseline, we found a positive development in the study group after receiving music therapy. Both phonological capacity and the children's understanding of speech increased under treatment, as well as their cognitive structures, action patterns, and level of intelligence. Throughout the study period, developmental age converged with their biological age. Ratings according to the Nordoff-Robbins scales showed clinically significant changes in the children, namely in the areas of client-therapist relationship and communication. This study suggests that music therapy may have a measurable effect on the speech development of children through the treatment's interactions with fundamental aspects of speech development, including the ability to form and maintain relationships and prosodic abilities. Thus, music therapy may provide a basic and supportive therapy for children with delayed speech development. Further studies should be conducted to investigate the mechanisms of these interactions in greater depth. The trial is registered in the German clinical trials register; Trial-No.: DRKS00000343.
Na, Ji Young; Wilkinson, Krista; Karny, Meredith; Blackstone, Sarah; Stifter, Cynthia
2016-08-01
Emotional competence refers to the ability to identify, respond to, and manage one's own and others' emotions. Emotional competence is critical to many functional outcomes, including making and maintaining friends, academic success, and community integration. There appears to be a link between the development of language and the development of emotional competence in children who use speech. Little information is available about these issues in children who rely on augmentative and alternative communication (AAC). In this article, we consider how AAC systems can be designed to support communication about emotions and the development of emotional competence. Because limited research exists on communication about emotions in a context of aided AAC, theory and research from other fields (e.g., psychology, linguistics, child development) is reviewed to identify key features of emotional competence and their possible implications for AAC design and intervention. The reviewed literature indicated that the research and clinical attention to emotional competence in children with disabilities is encouraging. However, the ideas have not been considered specifically in the context of aided AAC. On the basis of the reviewed literature, we offer practical suggestions for system design and AAC use for communication about emotions with children who have significant disabilities. Three key elements of discussing emotions (i.e., emotion name, reason, and solution) are suggested for inclusion in order to provide these children with opportunities for a full range of discussion about emotions. We argue that supporting communication about emotions is as important for children who use AAC as it is for children who are learning speech. This article offers a means to integrate information from other fields for the purpose of enriching AAC supports.
Acoustic analysis of speech under stress.
Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish
2015-01-01
When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified, non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and the stressed state. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of the Fourier and chirp spectra of a short vowel segment shows that for relaxed speech the two spectra are similar, whereas for stressed speech they differ in the high-frequency range due to increased pitch modulation.
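The kind of measurement described above (mean F0 and formant frequencies for neutral versus stressed clips) can be approximated with the praat-parselmouth wrapper around PRAAT. The sketch below is illustrative only; the file names, the 10 ms sampling grid, and the default tracker settings are assumptions rather than the study's actual protocol.

```python
# Illustrative sketch only: mean F0 and first-formant estimates for a neutral vs. a
# stressed clip via the praat-parselmouth wrapper around PRAAT. File names, the
# 10 ms sampling grid, and the default tracker settings are assumptions.
import numpy as np
import parselmouth  # pip install praat-parselmouth

def mean_f0_and_f1(wav_path):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()                        # PRAAT autocorrelation pitch track
    f0 = pitch.selected_array['frequency']
    f0 = f0[f0 > 0]                               # drop unvoiced frames (F0 == 0)
    formants = snd.to_formant_burg()              # Burg LPC formant tracker
    times = np.arange(0.05, snd.duration, 0.01)
    f1 = np.array([formants.get_value_at_time(1, t) for t in times])
    f1 = f1[~np.isnan(f1)]                        # NaN where no formant was found
    return f0.mean(), f1.mean()

for label, path in [("neutral", "neutral_clip.wav"), ("stressed", "stressed_clip.wav")]:
    f0_mean, f1_mean = mean_f0_and_f1(path)
    print(f"{label}: mean F0 = {f0_mean:.1f} Hz, mean F1 = {f1_mean:.1f} Hz")
```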
An algorithm of improving speech emotional perception for hearing aid
NASA Astrophysics Data System (ADS)
Xi, Ji; Liang, Ruiyu; Fei, Xianju
2017-07-01
In this paper, a speech emotion recognition (SER) algorithm is proposed to improve the emotional perception of hearing-impaired people. The algorithm utilizes multiple-kernel technology to overcome a drawback of the SVM: slow training speed. First, in order to improve the adaptive performance of the Gaussian radial basis function (RBF) kernel, the parameter determining the nonlinear mapping was optimized on the basis of kernel target alignment. The obtained kernel function was then used as the basis kernel of multiple kernel learning (MKL) with a slack variable that could mitigate the over-fitting problem. However, the slack variable also introduces error into the result. Therefore, a soft-margin MKL was proposed to balance the margin against the error. Moreover, an iterative algorithm was used to solve for the combination coefficients and the hyperplane equations. Experimental results show that the proposed algorithm can achieve an accuracy of 90% for five kinds of emotions: happiness, sadness, anger, fear and neutral. Compared with KPCA+CCA and PIM-FSVM, the proposed algorithm has the highest accuracy.
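For readers unfamiliar with the multiple-kernel idea, the toy sketch below combines several RBF base kernels into one Gram matrix for a standard SVM through scikit-learn's precomputed-kernel interface. It is a greatly simplified stand-in: the kernel weights are fixed by hand, none of the paper's kernel target alignment, soft-margin MKL, or iterative solver is reproduced, and all data are synthetic.

```python
# Greatly simplified stand-in for the multiple-kernel idea: a fixed-weight sum of
# RBF base kernels fed to a standard SVM through scikit-learn's precomputed-kernel
# interface. The weights are set by hand, not learned; all data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(X_a, X_b, gammas=(0.01, 0.1, 1.0), weights=(0.3, 0.4, 0.3)):
    """Weighted sum of RBF base kernels; weights are illustrative, not learned."""
    return sum(w * rbf_kernel(X_a, X_b, gamma=g) for g, w in zip(gammas, weights))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))        # hypothetical acoustic feature vectors
y_train = rng.integers(0, 5, 100)           # five emotion classes
X_test = rng.normal(size=(10, 20))

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(combined_kernel(X_train, X_train), y_train)
print(clf.predict(combined_kernel(X_test, X_train)))
```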
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common
Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.
2013-01-01
Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144
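The cross-domain evaluation idea sketched above can be illustrated with a toy pipeline: fit an arousal regressor on one domain's acoustic features and correlate its predictions with observer ratings from another domain. The features, ratings, and the ridge model below are assumptions, not the article's actual feature sets or learner.

```python
# Toy sketch of cross-domain regression: train an arousal model on one domain's
# features and correlate its predictions with another domain's observer ratings.
# Features, ratings and the ridge model are all assumptions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_sound, y_sound = rng.normal(size=(150, 30)), rng.uniform(-1, 1, 150)   # training domain
X_speech, y_speech = rng.normal(size=(80, 30)), rng.uniform(-1, 1, 80)   # testing domain

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_sound, y_sound)                         # train on one domain's annotations
r, _ = pearsonr(model.predict(X_speech), y_speech)  # evaluate on the other domain
print(f"cross-domain correlation with observer arousal ratings: r = {r:.2f}")
```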
Ahadi, Mohsen; Pourbakht, Akram; Jafari, Amir Homayoun; Shirjian, Zahra; Jafarpisheh, Amir Salar
2014-06-01
To investigate the influence of gender on the subcortical representation of speech acoustic parameters when simultaneously presented to both ears. Two-channel speech-evoked auditory brainstem responses were obtained in 25 female and 23 male normal-hearing young adults using binaural presentation of the 40 ms synthetic consonant-vowel /da/, and the encoding of the fast and slow elements of the speech stimuli at the subcortical level was compared between the sexes in the temporal and spectral domains using independent-sample, two-tailed t-tests. Highly detectable responses were established in both groups. Analysis in the time domain revealed earlier and larger fast onset responses in females, but there was no gender-related difference in the sustained segment and offset of the response. Interpeak intervals between Frequency Following Response peaks were also invariant to sex. Based on the shorter onset responses in females, composite onset measures were also sex dependent. Analysis in the spectral domain showed more robust and better representation of the fundamental frequency, as well as the first formant and the high-frequency components of the first formant, in females than in males. Anatomical, biological and biochemical distinctions between females and males could alter the neural encoding of the acoustic cues of speech stimuli at the subcortical level. Females have an advantage in binaural processing of the slow and fast elements of speech. This could be physiological evidence for better identification of the speaker and the emotional tone of voice, as well as better perception of the phonetic information of speech, in women. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.
1983-05-01
INDUSTRIAL/MILITARY SPEECH SYNTHESIS PRODUCTS... The SC-01 Speech Synthesizer contains 64 different phonemes which are accessed by a 6-bit code. In the proper sequential combinations of those... connected speech input with widely differing emotional states, diverse accents, and substantial nonperiodic background noise input. As noted previously
The Atlanta Motor Speech Disorders Corpus: Motivation, Development, and Utility.
Laures-Gore, Jacqueline; Russell, Scott; Patel, Rupal; Frankel, Michael
2016-01-01
This paper describes the design and collection of a comprehensive spoken language dataset from speakers with motor speech disorders in Atlanta, Ga., USA. This collaborative project aimed to gather a spoken database consisting of nonmainstream American English speakers residing in the Southeastern US in order to provide a more diverse perspective of motor speech disorders. Ninety-nine adults with an acquired neurogenic disorder resulting in a motor speech disorder were recruited. Stimuli include isolated vowels, single words, sentences with contrastive focus, sentences with emotional content and prosody, sentences with acoustic and perceptual sensitivity to motor speech disorders, as well as 'The Caterpillar' and 'The Grandfather' passages. Utility of this data in understanding the potential interplay of dialect and dysarthria was demonstrated with a subset of the speech samples existing in the database. The Atlanta Motor Speech Disorders Corpus will enrich our understanding of motor speech disorders through the examination of speech from a diverse group of speakers. © 2016 S. Karger AG, Basel.
Got EQ?: Increasing Cultural and Clinical Competence through Emotional Intelligence
ERIC Educational Resources Information Center
Robertson, Shari A.
2007-01-01
Cultural intelligence has been described across three parameters of human behavior: cognitive intelligence, emotional intelligence (EQ), and physical intelligence. Each contributes a unique and important perspective to the ability of speech-language pathologists and audiologists to provide benefits to their clients regardless of cultural…
Vulnerability to Bullying in Children with a History of Specific Speech and Language Difficulties
ERIC Educational Resources Information Center
Lindsay, Geoff; Dockrell, Julie E.; Mackie, Clare
2008-01-01
This study examined the susceptibility to problems with peer relationships and being bullied in a UK sample of 12-year-old children with a history of specific speech and language difficulties. Data were derived from the children's self-reports and the reports of parents and teachers using measures of victimization, emotional and behavioral…
Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender
ERIC Educational Resources Information Center
Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal
2004-01-01
Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…
ERIC Educational Resources Information Center
Wells, Elizabeth M.; Walsh, Karin S.; Khademian, Zarir P.; Keating, Robert F.; Packer, Roger J.
2008-01-01
The postoperative cerebellar mutism syndrome (CMS), consisting of diminished speech output, hypotonia, ataxia, and emotional lability, occurs after surgery in up to 25% of patients with medulloblastoma and occasionally after removal of other posterior fossa tumors. Although the mutism is transient, speech rarely normalizes and the syndrome is…
Separating the Problem and the Person: Insights from Narrative Therapy with People Who Stutter
ERIC Educational Resources Information Center
Ryan, Fiona; O'Dwyer, Mary; Leahy, Margaret M.
2015-01-01
Stuttering is a complex disorder of speech that encompasses motor speech and emotional and cognitive factors. The use of narrative therapy is described here, focusing on the stories that clients tell about the problems associated with stuttering that they have encountered in their lives. Narrative therapy uses these stories to understand, analyze,…
ERIC Educational Resources Information Center
Indiana State Dept. of Education, Indianapolis. Div. of Special Education.
The guide provides an information resource for related and supportive services personnel (e.g., school nurse, physical therapist, speech language pathologist) in their interactions with emotionally handicapped (EH) students. Following a definition of EH students, the first of six brief chapters discusses student characteristics, presents three…
Nonverbal Effects in Memory for Dialogue.
ERIC Educational Resources Information Center
Narvaez, Alice; Hertel, Paula T.
Memory for everyday conversational speech may be influenced by the nonverbally communicated emotion of the speaker. In order to investigate this premise, three videotaped scenes with bipolar emotional perspectives (joy/fear about going away to college, fear/anger about having been robbed, and disgust/interest regarding a friend's infidelity) were…
Convergence of semantics and emotional expression within the IFG pars orbitalis.
Belyk, Michel; Brown, Steven; Lim, Jessica; Kotz, Sonja A
2017-08-01
Humans communicate through a combination of linguistic and emotional channels, including propositional speech, writing, sign language, music, but also prosodic, facial, and gestural expression. These channels can be interpreted separately or they can be integrated to multimodally convey complex meanings. Neural models of the perception of semantics and emotion include nodes for both functions in the inferior frontal gyrus pars orbitalis (IFGorb). However, it is not known whether this convergence involves a common functional zone or instead specialized subregions that process semantics and emotion separately. To address this, we performed Kernel Density Estimation meta-analyses of published neuroimaging studies of the perception of semantics or emotion that reported activation in the IFGorb. The results demonstrated that the IFGorb contains two zones with distinct functional profiles. A lateral zone, situated immediately ventral to Broca's area, was implicated in both semantics and emotion. Another zone, deep within the ventral frontal operculum, was engaged almost exclusively by studies of emotion. Follow-up analysis using Meta-Analytic Connectivity Modeling demonstrated that both zones were frequently co-activated with a common network of sensory, motor, and limbic structures, although the lateral zone had a greater association with prefrontal cortical areas involved in executive function. The status of the lateral IFGorb as a point of convergence between the networks for processing semantic and emotional content across modalities of communication is intriguing since this structure is preserved across primates with limited semantic abilities. Hence, the IFGorb may have initially evolved to support the comprehension of emotional signals, being later co-opted to support semantic communication in humans by forming new connections with brain regions that formed the human semantic network. Copyright © 2017 Elsevier Inc. All rights reserved.
Thompson, Laura A; Malloy, Daniel M; Cone, John M; Hendrickson, David L
2010-01-01
We introduce a novel paradigm for studying the cognitive processes used by listeners within interactive settings. This paradigm places the talker and the listener in the same physical space, creating opportunities for investigations of attention and comprehension processes taking place during interactive discourse situations. An experiment was conducted to compare results from previous research using videotaped stimuli to those obtained within the live face-to-face task paradigm. A headworn apparatus is used to briefly display LEDs on the talker's face in four locations as the talker communicates with the participant. In addition to the primary task of comprehending speeches, participants make a secondary task light detection response. In the present experiment, the talker gave non-emotionally-expressive speeches that were used in past research with videotaped stimuli. Signal detection analysis was employed to determine which areas of the face received the greatest focus of attention. Results replicate previous findings using videotaped methods.
Cost-sensitive learning for emotion robust speaker recognition.
Li, Dongdong; Yang, Yingchun; Dai, Weihui
2014-01-01
In the field of information security, voice is one of the most important modalities in biometrics. In particular, with the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can serve as a unique password with which a user proves his or her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technology, a new recognition system architecture, together with its components, is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition.
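The sketch below illustrates only the generic cost-sensitive idea under assumed data: affective training utterances receive larger misclassification costs via per-sample weights, so errors on emotional speech are penalized more heavily. It is not the paper's pitch-envelope-level probability reweighting or its recognition architecture.

```python
# Generic cost-sensitive sketch only (not the paper's pitch-envelope reweighting):
# affective training utterances receive larger misclassification costs via
# per-sample weights. Features, labels and the cost table are hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                      # utterance-level features
speaker = rng.integers(0, 3, 300)                   # speaker identity labels
emotion = rng.choice(["neutral", "anger", "sadness"], size=300)

cost = {"neutral": 1.0, "anger": 2.0, "sadness": 2.0}   # illustrative costs
sample_weight = np.array([cost[e] for e in emotion])

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, speaker, sample_weight=sample_weight)    # errors on affective speech cost more
print(clf.predict(X[:5]))
```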
Pinheiro, Ana P; Rezaii, Neguine; Rauber, Andréia; Nestor, Paul G; Spencer, Kevin M; Niznikiewicz, Margaret
2017-09-01
Abnormalities in self-other voice processing have been observed in schizophrenia, and may underlie the experience of hallucinations. More recent studies demonstrated that these impairments are enhanced for speech stimuli with negative content. Nonetheless, few studies probed the temporal dynamics of self versus nonself speech processing in schizophrenia and, particularly, the impact of semantic valence on self-other voice discrimination. In the current study, we examined these questions, and additionally probed whether impairments in these processes are associated with the experience of hallucinations. Fifteen schizophrenia patients and 16 healthy controls listened to 420 prerecorded adjectives differing in voice identity (self-generated [SGS] versus nonself speech [NSS]) and semantic valence (neutral, positive, and negative), while EEG data were recorded. The N1, P2, and late positive potential (LPP) ERP components were analyzed. ERP results revealed group differences in the interaction between voice identity and valence in the P2 and LPP components. Specifically, LPP amplitude was reduced in patients compared with healthy subjects for SGS and NSS with negative content. Further, auditory hallucinations severity was significantly predicted by LPP amplitude: the higher the SAPS "voices conversing" score, the larger the difference in LPP amplitude between negative and positive NSS. The absence of group differences in the N1 suggests that self-other voice processing abnormalities in schizophrenia are not primarily driven by disrupted sensory processing of voice acoustic information. The association between LPP amplitude and hallucination severity suggests that auditory hallucinations are associated with enhanced sustained attention to negative cues conveyed by a nonself voice. © 2017 Society for Psychophysiological Research.
Dreyer, Felix R; Pulvermüller, Friedemann
2018-03-01
Previous research showed that modality-preferential sensorimotor areas are relevant for processing concrete words used to speak about actions. However, whether modality-preferential areas also play a role for abstract words is still under debate. Whereas recent functional magnetic resonance imaging (fMRI) studies suggest an involvement of motor cortex in processing the meaning of abstract emotion words as, for example, 'love', other non-emotional abstract words, in particular 'mental words', such as 'thought' or 'logic', are believed to engage 'amodal' semantic systems only. In the present event-related fMRI experiment, subjects passively read abstract emotional and mental nouns along with concrete action related words. Contrary to expectation, the results indicate a specific involvement of face motor areas in the processing of mental nouns, resembling that seen for face related action words. This result was confirmed when subject-specific regions of interest (ROIs) defined by motor localizers were used. We conclude that a role of motor systems in semantic processing is not restricted to concrete words but extends to at least some abstract mental symbols previously thought to be entirely 'disembodied' and divorced from semantically related sensorimotor processing. Implications for neurocognitive theories of semantics and clinical applications will be highlighted, paying specific attention to the role of brain activations as indexes of cognitive processes and their relationships to 'causal' studies addressing lesion and transcranial magnetic stimulation (TMS) effects. Possible implications for clinical practice, in particular speech language therapy, are discussed in closing. Copyright © 2017. Published by Elsevier Ltd.
The Effects of Alcohol on the Emotional Displays of Whites in Interracial Groups
Fairbairn, Catharine E.; Sayette, Michael A.; Levine, John M.; Cohn, Jeffrey F.; Creswell, Kasey G.
2017-01-01
Discomfort during interracial interactions is common among Whites in the U.S. and is linked to avoidance of interracial encounters. While the negative consequences of interracial discomfort are well-documented, understanding of its causes is still incomplete. Alcohol consumption has been shown to decrease negative emotions caused by self-presentational concern but increase negative emotions associated with racial prejudice. Using novel behavioral-expressive measures of emotion, we examined the impact of alcohol on displays of discomfort among 92 White individuals interacting in all-White or interracial groups. We used the Facial Action Coding System and comprehensive content-free speech analyses to examine affective and behavioral dynamics during these 36-minute exchanges (7.9 million frames of video data). Among Whites consuming nonalcoholic beverages, those assigned to interracial groups evidenced more facial and speech displays of discomfort than those in all-White groups. In contrast, among intoxicated Whites there were no differences in displays of discomfort between interracial and all-White groups. Results highlight the central role of self-presentational concerns in interracial discomfort and offer new directions for applying theory and methods from emotion science to the examination of intergroup relations. PMID:23356562
Experience-induced Malleability in Neural Encoding of Pitch, Timbre, and Timing
Kraus, Nina; Skoe, Erika; Parbery-Clark, Alexandra; Ashley, Richard
2009-01-01
Speech and music are highly complex signals that have many shared acoustic features. Pitch, Timbre, and Timing can be used as overarching perceptual categories for describing these shared properties. The acoustic cues contributing to these percepts also have distinct subcortical representations which can be selectively enhanced or degraded in different populations. Musically trained subjects are found to have enhanced subcortical representations of pitch, timbre, and timing. The effects of musical experience on subcortical auditory processing are pervasive and extend beyond music to the domains of language and emotion. The sensory malleability of the neural encoding of pitch, timbre, and timing can be affected by lifelong experience and short-term training. This conceptual framework and supporting data can be applied to consider sensory learning of speech and music through a hearing aid or cochlear implant. PMID:19673837
... worker, consultant pharmacist, nutritionist, physical therapist, occupational therapist, speech and hearing specialist, psychiatrist, psychologist. These professionals evaluate the older person's medical, social, emotional, and other needs. ...
Code of Federal Regulations, 2013 CFR
2013-10-01
... diseases and conditions as orthopedic, visual, speech and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2012 CFR
2012-01-01
... diseases and conditions as orthopedic; visual, speech, and hearing impairments; cerebral palsy; epilepsy; muscular dystrophy; multiple sclerosis; cancer; heart disease; diabetes; mental retardation; emotional...
Code of Federal Regulations, 2014 CFR
2014-01-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2013 CFR
2013-01-01
... diseases and conditions as orthopedic; visual, speech, and hearing impairments; cerebral palsy; epilepsy; muscular dystrophy; multiple sclerosis; cancer; heart disease; diabetes; mental retardation; emotional...
Code of Federal Regulations, 2013 CFR
2013-01-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2014 CFR
2014-10-01
... diseases and conditions as orthopedic, visual, speech and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2012 CFR
2012-01-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2013 CFR
2013-01-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
Code of Federal Regulations, 2014 CFR
2014-10-01
... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...
Code of Federal Regulations, 2012 CFR
2012-10-01
... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...
Code of Federal Regulations, 2013 CFR
2013-10-01
... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...
Expressed Emotion Displayed by the Mothers of Inhibited and Uninhibited Preschool-Aged Children
ERIC Educational Resources Information Center
Raishevich, Natoshia; Kennedy, Susan J.; Rapee, Ronald M.
2010-01-01
In the current study, the Five Minute Speech Sample was used to assess the association between parent attitudes and children's behavioral inhibition in mothers of 120 behaviorally inhibited (BI) and 37 behaviorally uninhibited preschool-aged children. Mothers of BI children demonstrated significantly higher levels of emotional over-involvement…
ERIC Educational Resources Information Center
Schorr, Efrat A.
2006-01-01
The importance of early intervention for children with hearing loss has been demonstrated persuasively in areas including speech perception and production and spoken language. The present research shows that feelings of loneliness, a significant emotional outcome, are affected by the age at which children receive intervention with cochlear…
Computer-aided psychotherapy based on multimodal elicitation, estimation and regulation of emotion.
Cosić, Krešimir; Popović, Siniša; Horvat, Marko; Kukolja, Davor; Dropuljić, Branimir; Kovač, Bernard; Jakovljević, Miro
2013-09-01
Contemporary psychiatry is looking to the affective sciences to understand human behavior, cognition and the mind in health and disease. Since it has been recognized that emotions have a pivotal role for the human mind, an ever-increasing number of laboratories and research centers are interested in affective sciences, affective neuroscience, affective psychology and affective psychopathology. This paper therefore presents multidisciplinary research results on stress resilience from the Laboratory for Interactive Simulation System at the Faculty of Electrical Engineering and Computing, University of Zagreb. A patient's distortion in the emotional processing of multimodal input stimuli is predominantly a consequence of a cognitive deficit resulting from his or her individual mental health disorder. These emotional distortions in the patient's multimodal physiological, facial, acoustic, and linguistic features related to the presented stimulation can be used as an indicator of the patient's mental illness. Real-time processing and analysis of the patient's multimodal responses to annotated input stimuli is based on appropriate machine learning methods from computer science. Comprehensive longitudinal multimodal analysis of the patient's emotion, mood, feelings, attention, motivation, decision-making, and working memory, in synchronization with the multimodal stimuli, provides an extremely valuable database for data mining, machine learning and machine reasoning. The presented multimedia stimulus sequence includes personalized images, movies and sounds, as well as semantically congruent narratives. Simultaneously with stimulus presentation, the patient provides subjective emotional ratings of the presented stimuli in terms of subjective units of discomfort/distress, discrete emotions, or valence and arousal. These subjective emotional ratings of the input stimuli and the corresponding physiological, speech, and facial output features provide enough information to evaluate the patient's cognitive appraisal deficit. Aggregated real-time visualization of this information provides valuable assistance in diagnosing the patient's mental state, giving the therapist deeper and broader insights into the dynamics and progress of the psychotherapy.
Sokka, Laura; Huotilainen, Minna; Leinikka, Marianne; Korpela, Jussi; Henelius, Andreas; Alain, Claude; Müller, Kiti; Pakarinen, Satu
2014-12-01
Job burnout is a significant cause of work absenteeism. Evidence from behavioral studies and patient reports suggests that job burnout is associated with impairments of attention and decreased working capacity, and it has overlapping elements with depression, anxiety and sleep disturbances. Here, we examined the electrophysiological correlates of automatic sound change detection and involuntary attention allocation in job burnout using scalp recordings of event-related potentials (ERP). Volunteers with job burnout symptoms but without severe depression and anxiety disorders, and their non-burnout controls, were presented with natural speech sound stimuli (a standard and nine deviants), as well as three rarely occurring speech sounds with strong emotional prosody. All stimuli elicited mismatch negativity (MMN) responses that were comparable in both groups. The groups differed with respect to the P3a, an ERP component reflecting an involuntary shift of attention: the job burnout group showed a shorter P3a latency in response to the emotionally negative stimulus and a longer latency in response to the positive stimulus. Results indicate that in job burnout, automatic speech sound discrimination is intact, but attention capture tends to be faster for negative and slower for positive information compared with controls. Copyright © 2014 Elsevier B.V. All rights reserved.
Acoustic resonance at the dawn of life: musical fundamentals of the psychoanalytic relationship.
Pickering, Judith
2015-11-01
This paper uses a case vignette to show how musical elements of speech are a crucial source of information regarding the patient's emotional states and associated memory systems that are activated at a given moment in the analytic field. There are specific psychoacoustic markers associated with different memory systems which indicate whether a patient is immersed in a state of creative intersubjective relatedness related to autobiographical memory, or has been triggered into a traumatic memory system. When a patient feels immersed in an atmosphere of intersubjective mutuality, dialogue features a rhythmical and tuneful form of speech featuring improvized reciprocal imitation, theme and variation. When the patient is catapulted into a traumatic memory system, speech becomes monotone and disjointed. Awareness of such acoustic features of the traumatic memory system helps to alert the analyst that such a shift has taken place informing appropriate responses and interventions. Communicative musicality (Malloch & Trevarthen 2009) originates in the earliest non-verbal vocal communication between infant and care-giver, states of primary intersubjectivity. Such musicality continues to be the primary vehicle for transmitting emotional meaning and for integrating right and left hemispheres. This enables communication that expresses emotional significance, personal value as well as conceptual reasoning. © 2015, The Society of Analytical Psychology.
Romero, Nuria; De Raedt, Rudi
2017-01-01
The present study aimed to clarify: 1) the presence of depression-related attention bias related to a social stressor, 2) its association with depression-related attention biases as measured under standard conditions, and 3) their association with impaired stress recovery in depression. A sample of 39 participants reporting a broad range of depression levels completed a standard eye-tracking paradigm in which they had to engage/disengage their gaze with/from emotional faces. Participants then underwent a stress induction (i.e., giving a speech), in which their eye movements to false emotional feedback were measured, and stress reactivity and recovery were assessed. Depression level was associated with longer times to engage/disengage attention with/from negative faces under standard conditions and with sustained attention to negative feedback during the speech. These depression-related biases were associated and mediated the association between depression level and self-reported stress recovery, predicting lower recovery from stress after giving the speech. PMID:28362826
28 CFR 41.31 - Handicapped person.
Code of Federal Regulations, 2012 CFR
2012-07-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
28 CFR 41.31 - Handicapped person.
Code of Federal Regulations, 2013 CFR
2013-07-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
28 CFR 41.31 - Handicapped person.
Code of Federal Regulations, 2014 CFR
2014-07-01
... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...
On the role of crossmodal prediction in audiovisual emotion perception.
Jessen, Sarah; Kotz, Sonja A
2013-01-01
Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency with which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect of multisensory perception that has received increasing interest in recent years is the concept of cross-modal prediction. In emotion perception, as in most other settings, visual information precedes the auditory information. This visual lead can thereby facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, so far it has not been addressed in audiovisual emotion perception. Based on the current state of the art in (a) cross-modal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 EEG response and the duration of visual emotional, but not non-emotional, information. If the assumption that emotional content allows more reliable prediction can be corroborated in future studies, cross-modal prediction is a crucial factor in our understanding of multisensory emotion perception.
Barnes, Jonathan
2014-03-01
This paper focuses on an innovative intersection between education, health and arts. Taking a broad definition of health it examines some social and psychological well-being impacts of extended collaborations between a theatre company and children with communication difficulties. It seeks to test aspects of Fredrickson's(1) broaden-and-build theory of positive emotions in a primary school curriculum context. The researcher participated in a project called Speech Bubbles. The programme was devised by theatre practitioners and aimed at six- and seven-year-olds with difficulties in speech, language and communication. Sessions were observed, videoed and analysed for levels of child well-being using an established scale. In addition, responses regarding perceived improvements in speech, language and communication were gathered from school records and teachers, teaching assistants, practitioners and parents. Data were captured using still images and videos, children's recorded commentaries, conversations, written feedback and observation. Using grounded research methods, themes and categories arose directly from the collected data. Fluency, vocabulary, inventiveness and concentration were enhanced in the large majority of referred children. The research also found significant positive developments in motivation and confidence. Teachers and their assistants credited the drama intervention with notable improvements in attitude, behaviour and relationships over the year. Aspects of many children's psychological well-being also showed marked signs of progress when measured against original reasons for referral and normal expectations over a year. An unexpected outcome was evidence of heightened well-being of the teaching assistants involved. Findings compared well with expectations based upon Fredrickson's theory and also the theatre company's view that theatre-making promotes emotional awareness and empathy. Improvements in both children's well-being and communication were at least in part related to the sustained and playful emphases on the processes and practice of drama, clear values and an inclusive environment.
Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio
2014-01-01
The study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to identify the most relevant feature subset. The three-phase design was chosen to check the validity of the proposed approach. The results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm for automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. To check the quality of the proposed process, a greedy search approach (FSS-Forward) was also applied and a comparison between the two is provided. Based on the achieved results, a set of the most relevant speaker-independent features is proposed for both languages and new perspectives are suggested. PMID:25279686
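To make the pipeline described above more concrete, the sketch below wraps a greedy forward feature-subset search (in the spirit of the FSS-Forward comparison mentioned in the abstract, not the evolutionary search the authors favour) around an instance-based k-nearest-neighbour classifier. The RekEmozio recordings are not reproduced here, so the feature matrix, feature count, and emotion labels are random placeholders; this is an illustrative sketch, not the authors' implementation.

```python
# Hedged sketch: greedy forward feature-subset selection around an
# instance-based learner (k-NN), using scikit-learn. Placeholder data stand
# in for acoustic features from an emotional speech corpus such as RekEmozio.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 40))      # 240 utterances x 40 acoustic features (placeholder)
y = rng.integers(0, 7, size=240)    # 7 emotion categories (placeholder labels)

knn = KNeighborsClassifier(n_neighbors=5)
forward_search = SequentialFeatureSelector(
    knn, n_features_to_select=10, direction="forward", cv=5)

model = make_pipeline(StandardScaler(), forward_search, knn)
scores = cross_val_score(model, X, y, cv=5)
print(f"mean recognition rate: {100 * scores.mean():.2f}%")
```

An evolutionary feature search, as used in the paper, would replace the greedy selector with a genetic-algorithm wrapper; the surrounding classifier and cross-validation would stay the same.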
2010-01-01
Background Language development is one of the most significant processes of early childhood development. Children with delayed speech development are more at risk of acquiring other cognitive, social-emotional, and school-related problems. Music therapy appears to facilitate speech development in children, even within a short period of time. The aim of this pilot study is to explore the effects of music therapy in children with delayed speech development. Methods A total of 18 children aged 3.5 to 6 years with delayed speech development took part in this observational study in which music therapy and no treatment were compared to demonstrate effectiveness. Individual music therapy was provided on an outpatient basis. An ABAB reversal design with alternations between music therapy and no treatment with an interval of approximately eight weeks between the blocks was chosen. Before and after each study period, a speech development test, a non-verbal intelligence test for children, and music therapy assessment scales were used to evaluate the speech development of the children. Results Compared to the baseline, we found a positive development in the study group after receiving music therapy. Both phonological capacity and the children's understanding of speech increased under treatment, as well as their cognitive structures, action patterns, and level of intelligence. Throughout the study period, developmental age converged with their biological age. Ratings according to the Nordoff-Robbins scales showed clinically significant changes in the children, namely in the areas of client-therapist relationship and communication. Conclusions This study suggests that music therapy may have a measurable effect on the speech development of children through the treatment's interactions with fundamental aspects of speech development, including the ability to form and maintain relationships and prosodic abilities. Thus, music therapy may provide a basic and supportive therapy for children with delayed speech development. Further studies should be conducted to investigate the mechanisms of these interactions in greater depth. Trial registration The trial is registered in the German clinical trials register; Trial-No.: DRKS00000343 PMID:20663139
Effects of human fatigue on speech signals
NASA Astrophysics Data System (ADS)
Stamoulis, Catherine
2004-05-01
Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.
ERIC Educational Resources Information Center
Lindsay, Geoff; Dockrell, Julie E.; Strand, Steve
2007-01-01
Background: The purpose of this study was to examine the stability of behavioural, emotional and social difficulties (BESD) in children with specific speech and language difficulties (SSLD), and the relationship between BESD and the language ability. Methods: A sample of children with SSLD were assessed for BESD at ages 8, 10 and 12 years by both…
Altered time course of amygdala activation during speech anticipation in social anxiety disorder
Davies, Carolyn D.; Young, Katherine; Torre, Jared B.; Burklund, Lisa J.; Goldin, Philippe R.; Brown, Lily A.; Niles, Andrea N.; Lieberman, Matthew D.; Craske, Michelle G.
2016-01-01
Background Exaggerated anticipatory anxiety is common in social anxiety disorder (SAD). Neuroimaging studies have revealed altered neural activity in response to social stimuli in SAD, but fewer studies have examined neural activity during anticipation of feared social stimuli in SAD. The current study examined the time course and magnitude of activity in threat processing brain regions during speech anticipation in socially anxious individuals and healthy controls (HC). Method Participants (SAD n = 58; HC n = 16) underwent functional magnetic resonance imaging (fMRI) during which they completed a 90s control anticipation task and 90s speech anticipation task. Repeated measures multi-level modeling analyses were used to examine group differences in time course activity during speech vs. control anticipation for regions of interest, including bilateral amygdala, insula, ventral striatum, and dorsal anterior cingulate cortex. Results The time course of amygdala activity was more prolonged and less variable throughout speech anticipation in SAD participants compared to HCs, whereas the overall magnitude of amygdala response did not differ between groups. Magnitude and time course of activity was largely similar between groups across other regions of interest. Limitations Analyses were restricted to regions of interest and task order was the same across participants due to the nature of deception instructions. Conclusions Sustained amygdala time course during anticipation may uniquely reflect heightened detection of threat or deficits in emotion regulation in socially anxious individuals. Findings highlight the importance of examining temporal dynamics of amygdala responding. PMID:27870942
NASA Astrophysics Data System (ADS)
Izdebski, Krzysztof; Jarosz, Paweł; Usydus, Ireneusz
2017-02-01
Ventilation, speech and singing all rely on facial musculature, and these motor tasks are fueled by the air we inhale. This motor process requires an increase in blood flow as the muscles contract and relax, therefore skin surface temperature changes are expected. Hence, we used thermography to image these effects. The system used was the thermography camera model FLIR X6580sc with a chilled detector (FLIR Systems Advanced Thermal Solutions, 27700 SW Parkway Ave Wilsonville, OR 97070, USA). To assure improved imaging, the room was air-conditioned to +18 °C. All images were recorded at 30 frames/s. Acquired data were analyzed with FLIR Research IR Max Version 4 software and software filters. In this preliminary study a male subject was imaged from frontal and lateral views simultaneously while he performed normal resting ventilation, speech and song. The lateral image was captured in a stainless steel mirror. Results showed different levels of heat flow in the facial musculature as a function of these three tasks. We were also able to capture the directionality of the exhaled air jet. The breathing jet was discharged horizontally, the speaking-voice jet was discharged downwards, while the singing jet went upward. We interpreted these jet directions as representing different gas content of the air expired during these different tasks, with speech having less oxygen than singing. Further studies examining gas exchange during various forms of speech and song and emotional states are warranted.
Spontaneous regulation of emotions in preschool children who stutter: preliminary findings.
Johnson, Kia N; Walden, Tedra A; Conture, Edward G; Karrass, Jan
2010-12-01
Emotional regulation of preschool children who stutter (CWS) and children who do not stutter (CWNS) was assessed through use of a disappointing gift (DG) procedure (P. M. Cole, 1986; C. Saarni, 1984, 1992). Participants consisted of 16 CWS and 16 CWNS (11 boys and 5 girls in each talker group) who were 3 to 5 years of age. After assessing each child's knowledge of display rules about socially appropriate expression of emotions, the authors asked the children to participate in a DG procedure. The children received a desirable gift preceding the first free-play task and a disappointing gift preceding a second free-play task. Dependent variables consisted of participants' positive and negative expressive nonverbal behaviors exhibited during receipt of a desirable gift and disappointing gift as well as conversational speech disfluencies exhibited following receipt of each gift. Findings indicated that CWS and CWNS exhibited no significant differences in the amount of positive emotional expressions after receiving the desired gift; however, CWS, when compared with CWNS, exhibited more negative emotional expressions after receiving the undesirable gift. Furthermore, CWS were more disfluent after receiving the desired gift than after receiving the disappointing gift. Ancillary findings also indicated that CWS and CWNS had equivalent knowledge of display rules. Findings suggest that efforts to concurrently regulate emotional behaviors and speech fluency may be problematic for preschool-age CWS.
How Conceptual Frameworks Influence Discovery and Depictions of Emotions in Clinical Relationships
ERIC Educational Resources Information Center
Duchan, Judith Felson
2011-01-01
Although emotions are often seen as key to maintaining rapport between speech-language pathologists and their clients, they are often neglected in the research and clinical literature. This neglect, it is argued here, comes in part from the inadequacies of prevailing conceptual frameworks used to govern practices. I aim to show how six such…
ERIC Educational Resources Information Center
Cartwright, Kim L.; Bitsakou, Paraskevi; Daley, David; Gramzow, Richard H.; Psychogiou, Lamprini; Simonoff, Emily; Thompson, Margaret J.; Sonuga-Barke, Edmund J. S.
2011-01-01
Objective: We used multi-level modelling of sibling-pair data to disentangle the influence of proband-specific and more general family influences on maternal expressed emotion (MEE) toward children and adolescents with attention-deficit/hyperactivity disorder (ADHD). Method: MEE was measured using the Five Minute Speech Sample (FMSS) for 60…
Judgment of musical emotions after cochlear implantation in adults with progressive deafness
Ambert-Dahan, Emmanuèle; Giraud, Anne-Lise; Sterkers, Olivier; Samson, Séverine
2015-01-01
While cochlear implantation is rather successful in restoring speech comprehension in quiet environments (Nimmons et al., 2008), other auditory tasks, such as music perception, can remain challenging for implant users. Here, we tested how patients who had received a cochlear implant (CI) after post-lingual progressive deafness perceive emotions in music. Thirteen adult CI recipients with good verbal comprehension (dissyllabic words ≥70%) and 13 normal hearing participants matched for age, gender, and education listened to 40 short musical excerpts that selectively expressed fear, happiness, sadness, and peacefulness ( Vieillard et al., 2008). The participants were asked to rate (on a 0–100 scale) how much the musical stimuli expressed these four cardinal emotions, and to judge their emotional valence (unpleasant–pleasant) and arousal (relaxing–stimulating). Although CI users performed above chance level, their emotional judgments (mean correctness scores) were generally impaired for happy, scary, and sad, but not for peaceful excerpts. CI users also demonstrated deficits in perceiving arousal of musical excerpts, whereas rating of valence remained unaffected. The current findings indicate that judgments of emotional categories and dimensions of musical excerpts are not uniformly impaired after cochlear implantation. These results are discussed in relation to the relatively spared abilities of CI users in perceiving temporal (rhythm and metric) as compared to spectral (pitch and timbre) musical dimensions, which might benefit the processing of musical emotions (Cooper et al., 2008). PMID:25814961
Anticipatory stress influences decision making under explicit risk conditions.
Starcke, Katrin; Wolf, Oliver T; Markowitsch, Hans J; Brand, Matthias
2008-12-01
Recent research has suggested that stress may affect memory, executive functioning, and decision making on the basis of emotional feedback processing. The current study examined whether anticipatory stress affects decision making measured with the Game of Dice Task (GDT), a decision-making task with explicit and stable rules that taps both executive functioning and feedback learning. The authors induced stress in 20 participants by having them anticipate giving a public speech and also examined 20 comparison subjects. The authors assessed the level of stress with questionnaires and endocrine markers (salivary cortisol and alpha-amylase), both revealing that speech anticipation led to increased stress. Results of the GDT showed that participants under stress scored significantly lower than the comparison group and that GDT performance was negatively correlated with the increase of cortisol. Our results indicate that stress can lead to disadvantageous decision making even when explicit and stable information about outcome contingencies is provided.
Childhood Stuttering – Where are we and Where are we going?
Smith, Anne; Weber, Christine
2017-01-01
Remarkable progress has been made over the past two decades in expanding our understanding of the behavioral, peripheral physiological, and central neurophysiological bases of stuttering in early childhood. It is clear that stuttering is a neurodevelopmental disorder characterized by atypical development of speech motor planning and execution networks. The speech motor system must interact in complex ways with neural systems mediating language, other cognitive, and emotional processes. During the time window when stuttering typically appears and follows its path to either recovery or persistence, all of these neurobehavioral systems are undergoing rapid and dramatic developmental changes. We summarize our current understanding of the various developmental trajectories relevant for the understanding of stuttering in early childhood. We also present theoretical and experimental approaches that we believe will be optimal for even more rapid progress toward developing better and more targeted treatment for stuttering in the preschool children who are more likely to persist in stuttering. PMID:27701705
Generating and Describing Affective Eye Behaviors
NASA Astrophysics Data System (ADS)
Mao, Xia; Li, Zheng
The manner of a person's eye movement conveys much nonverbal information and emotional intent beyond speech. This paper describes work on expressing emotion through eye behaviors in virtual agents based on parameters selected from the AU-coded facial expression database and real-time eye movement data (pupil size, blink rate and saccade). A rule-based approach to generating primary (joyful, sad, angry, afraid, disgusted and surprised) and intermediate emotions (emotions that can be represented as mixtures of two primary emotions) using MPEG-4 FAPs (facial animation parameters) is introduced. In addition, based on our research, a scripting tool named EEMML (Emotional Eye Movement Markup Language), which enables authors to describe and generate emotional eye movement of virtual agents, is proposed.
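As a rough, hypothetical illustration of how an intermediate emotion can be represented as a mixture of two primary emotions, the sketch below blends two primary-emotion eye-behaviour parameter sets with a weight. The parameter names follow the cues listed in the abstract (pupil size, blink rate, saccade), but the numeric values and the linear blending rule are assumptions made for illustration; they are not the EEMML or MPEG-4 FAP rules used in the paper.

```python
# Hedged sketch: an intermediate emotion expressed as a weighted mixture of
# two primary-emotion parameter vectors. Values and the blending rule are
# illustrative only.
from dataclasses import dataclass

@dataclass
class EyeParams:
    pupil_size: float      # relative pupil dilation
    blink_rate: float      # blinks per minute
    saccade_amp: float     # mean saccade amplitude (deg)

PRIMARY = {
    "joyful": EyeParams(1.2, 18.0, 6.0),
    "sad":    EyeParams(0.8, 10.0, 3.0),
    "angry":  EyeParams(1.1, 25.0, 7.5),
}

def blend(a: EyeParams, b: EyeParams, w: float) -> EyeParams:
    """Linear blend of two primary-emotion parameter sets (weight w toward a)."""
    mix = lambda x, y: w * x + (1.0 - w) * y
    return EyeParams(mix(a.pupil_size, b.pupil_size),
                     mix(a.blink_rate, b.blink_rate),
                     mix(a.saccade_amp, b.saccade_amp))

# e.g. a "bittersweet" intermediate state, 60% joyful / 40% sad
print(blend(PRIMARY["joyful"], PRIMARY["sad"], 0.6))
```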
Multilingual vocal emotion recognition and classification using back propagation neural network
NASA Astrophysics Data System (ADS)
Kayal, Apoorva J.; Nirmal, Jagannath
2016-03-01
This work implements classification of different emotions in different languages using Artificial Neural Networks (ANN). Mel Frequency Cepstral Coefficients (MFCC) and Short Term Energy (STE) have been considered for creation of feature set. An emotional speech corpus consisting of 30 acted utterances per emotion has been developed. The emotions portrayed in this work are Anger, Joy and Neutral in each of English, Marathi and Hindi languages. Different configurations of Artificial Neural Networks have been employed for classification purposes. The performance of the classifiers has been evaluated by False Negative Rate (FNR), False Positive Rate (FPR), True Positive Rate (TPR) and True Negative Rate (TNR).
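A minimal sketch of the feature-plus-classifier pipeline the abstract describes is given below: MFCC and short-term energy features are summarised per utterance and fed to a back-propagation network (scikit-learn's MLPClassifier). librosa stands in for whatever feature-extraction toolchain the authors used, the corpus layout and helper function are hypothetical, and the placeholder arrays merely mirror the 30-utterances-per-emotion design.

```python
# Hedged sketch: MFCC + short-term energy features fed to a back-propagation
# neural network. Corpus paths and the helper below are hypothetical; the
# published work may have used a different toolchain and configuration.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def utterance_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    frames = librosa.util.frame(y, frame_length=400, hop_length=160)
    ste = np.sum(frames ** 2, axis=0)                             # short-term energy per frame
    # summarise each utterance by per-feature means and standard deviations
    return np.concatenate([mfcc.mean(1), mfcc.std(1), [ste.mean(), ste.std()]])

# X would normally be built by iterating over the corpus, e.g.:
# X = np.array([utterance_features(p) for p in wav_paths]); y = labels
X = np.random.default_rng(1).normal(size=(90, 28))                # placeholder features
y = np.repeat(["anger", "joy", "neutral"], 30)                    # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print(confusion_matrix(y_te, net.predict(X_te)))
```

The confusion matrix is the natural starting point for the TPR, TNR, FPR, and FNR figures the paper reports.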
[Restoration of speech function in oncological patients with maxillary defects].
Matiakin, E G; Chuchkov, V M; Akhundov, A A; Azizian, R I; Romanov, I S; Chuchkov, M V; Agapov, V V
2009-01-01
Speech quality was evaluated in 188 patients with acquired maxillary defects. Prosthetic treatment of 29 patients was preceded by pharmacopsychotherapy. Sixty-three patients had lessons with a logopedist and 66 practiced self-tuition based on a specially developed test. Thirty patients were examined for the quality of speech without preliminary preparation. Speech quality was assessed by auditory and spectral analysis. The main forms of impaired speech quality in the patients with maxillary defects were marked rhinophonia and impaired articulation. The proposed analytical tests were based on a combination of "difficult" vowels and consonants. The use of a removable prosthesis with an obturator failed to correct the affected speech function but created prerequisites for the formation of the correct speech stereotype. Results of the study suggest a relationship between the quality of speech in subjects with maxillary defects and their intellectual faculties as well as their desire to overcome this drawback. The proposed tests are designed to activate the neuromuscular apparatus responsible for the generation of speech. Lessons with a speech therapist give a powerful emotional incentive to the patients and promote their efforts toward restoration of speaking ability. Pharmacopsychotherapy and self-control are other efficacious tools for the improvement of speech quality in patients with maxillary defects.
Liu, Xiaoluan; Xu, Yi
2015-01-01
This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and a speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study provides the first production-based support for previous perception studies on the relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics when comparing music performance and speech production, domains in which motor mechanisms play a crucial role. PMID:26217252
The Neural Mechanisms of Meditative Practices: Novel Approaches for Healthy Aging.
Acevedo, Bianca P; Pospos, Sarah; Lavretsky, Helen
2016-01-01
Meditation has been shown to have physical, cognitive, and psychological health benefits that can be used to promote healthy aging. However, the common and specific mechanisms of response remain elusive due to the diverse nature of mind-body practices. In this review, we aim to compare the neural circuits implicated in focused-attention meditative practices that focus on present-moment awareness to those involved in active-type meditative practices (e.g., yoga) that combine movement, including chanting, with breath practices and meditation. Recent meta-analyses and individual studies demonstrated common brain effects for attention-based meditative practices and active-based meditations in areas involved in reward processing and learning, attention and memory, awareness and sensory integration, and self-referential processing and emotional control, while deactivation was seen in the amygdala, an area implicated in emotion processing. Unique effects for mindfulness practices were found in brain regions involved in body awareness, attention, and the integration of emotion and sensory processing. Effects specific to active-based meditations appeared in brain areas involved in self-control, social cognition, language, speech, tactile stimulation, sensorimotor integration, and motor function. This review suggests that mind-body practices can target different brain systems involved in the regulation of attention, emotional control, mood, and executive cognition, and that such practices may be used to treat or prevent mood and cognitive disorders of aging, such as depression and caregiver stress, or serve as "brain fitness" exercise. Benefits may include improving brain functional connectivity in brain systems that generally degenerate with Alzheimer's disease, Parkinson's disease, and other aging-related diseases.
Pilot Workload and Speech Analysis: A Preliminary Investigation
NASA Technical Reports Server (NTRS)
Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.
2013-01-01
Prior research has questioned the effectiveness of speech analysis for measuring the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload, while articulation rates were found to be significantly slower. No correlation was found between subjective workload and F0 or articulation rate.
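For readers who want to reproduce this kind of measurement, the sketch below extracts mean F0 with librosa's pYIN tracker and a crude articulation-rate proxy (acoustic onsets per second). It is an assumption-laden illustration, not the study's analysis pipeline; 'readback.wav' is a hypothetical file name, and onset counting only approximates a syllable-based articulation rate.

```python
# Hedged sketch: mean F0 and a rough articulation-rate proxy from a recorded
# read-back. librosa stands in for whatever analysis tools the study used.
import numpy as np
import librosa

y, sr = librosa.load("readback.wav", sr=16000)   # hypothetical recording

f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                        fmax=librosa.note_to_hz("C6"), sr=sr)
mean_f0 = np.nanmean(f0)                          # Hz, voiced frames only

# crude articulation-rate proxy: acoustic onsets per second of audio
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
rate = len(onsets) / (len(y) / sr)

print(f"mean F0: {mean_f0:.1f} Hz, ~{rate:.2f} onsets/s")
```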
Toyomura, Akira; Fujii, Tetsunoshin; Yokosawa, Koichi; Kuriki, Shinya
2018-03-15
Affective states, such as anticipatory anxiety, critically influence speech communication behavior in adults who stutter. However, there is currently little evidence regarding the involvement of the limbic system in speech disfluency during interpersonal communication. We designed this neuroimaging study and experimental procedure to sample neural activity during interpersonal communication between human participants, and to investigate the relationship between amygdala activity and speech disfluency. Participants were required to engage in live communication with a stranger of the opposite sex in the MRI scanner environment. In the gaze condition, the stranger gazed at the participant without speaking, while in the live conversation condition, the stranger asked questions that the participant was required to answer. The stranger continued to gaze silently at the participant while the participant answered. Adults who stutter reported significantly higher discomfort than fluent controls during the experiment. Activity in the right amygdala, a key anatomical region in the limbic system involved in emotion, was significantly correlated with stuttering occurrences in adults who stutter. Right amygdala activity from the pooled data of all participants also showed a significant correlation with discomfort level during the experiment. Activity in the prefrontal cortex, which forms emotion regulation neural circuitry with the amygdala, was lower in adults who stutter than in fluent controls. This is the first study to demonstrate that amygdala activity during interpersonal communication is involved in disfluent speech in adults who stutter. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.
Waaramaa, Teija; Leisiö, Timo
2013-01-01
The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of the emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other with respect to the identified emotions and the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listener age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion between emotions in the listening test seemed to be dependent on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for those listeners who share a similar language or cultural background with the speaker. PMID:23801972
Unpacking the psychological weight of weight stigma: A rejection-expectation pathway
Blodorn, Alison; Major, Brenda; Hunger, Jeffrey; Miller, Carol
2015-01-01
The present research tested the hypothesis that the negative effects of weight stigma among higher body-weight individuals are mediated by expectations of social rejection. Women and men who varied in objective body-weight (body mass index; BMI) gave a speech describing why they would make a good date. Half believed that a potential dating partner would see a videotape of their speech (weight seen) and half believed that a potential dating partner would listen to an audiotape of their speech (weight unseen). Among women, but not men, higher body-weight predicted increased expectations of social rejection, decreased executive control resources, decreased self-esteem, increased self-conscious emotions and behavioral displays of self-consciousness when weight was seen but not when weight was unseen. As predicted, higher body-weight women reported increased expectations of social rejection when weight was seen (versus unseen), which in turn predicted decreased self-esteem, increased self-conscious emotions, and increased stress. In contrast, lower body-weight women reported decreased expectations of social rejection when weight was seen (versus unseen), which in turn predicted increased self-esteem, decreased self-conscious emotions, and decreased stress. Men’s responses were largely unaffected by body-weight or visibility, suggesting that a dating context may not be identity threatening for higher body-weight men. Overall, the present research illuminates a rejection-expectation pathway by which weight stigma undermines higher body-weight women’s health. PMID:26752792
Vivona, Jeanine M
2013-12-01
Like psychoanalysis, poetry is possible because of the nature of verbal language, particularly its potentials to evoke the sensations of lived experience. These potentials are vestiges of the personal relational context in which language is learned, without which there would be no poetry and no psychoanalysis. Such a view of language infuses psychoanalytic writings on poetry, yet has not been fully elaborated. To further that elaboration, a poem by Billy Collins is presented to illustrate the sensorial and imagistic potentials of words, after which the interpersonal processes of language development are explored in an attempt to elucidate the original nature of words as imbued with personal meaning, embodied resonance, and emotion. This view of language and the verbal form allows a fuller understanding of the therapeutic processes of speech and conversation at the heart of psychoanalysis, including the relational potentials of speech between present individuals, which are beyond the reach of poetry. In one sense, the work of the analyst is to create language that mobilizes the experiential, memorial, and relational potentials of words, and in so doing to make a poet out of the patient so that she too can create such language.
Bilingualism and Children's Use of Paralinguistic Cues to Interpret Emotion in Speech
ERIC Educational Resources Information Center
Yow, W. Quin; Markman, Ellen M.
2011-01-01
Preschoolers tend to rely on what speakers say rather than how they sound when interpreting a speaker's emotion while adults rely instead on tone of voice. However, children who have a greater need to attend to speakers' communicative requirements, such as bilingual children, may be more adept in using paralinguistic cues (e.g. tone of voice) when…
ERIC Educational Resources Information Center
Dalton, C. J.
2011-01-01
Mild or moderate hearing loss (MMHL) is a communication disability that impacts speech and language development and academic performance. Students with MMHL also face threats to their social-emotional well-being and self-identity formation, and are at risk for psychosocial deficits related to cognitive fatigue, isolation, and bullying. While the…
ERIC Educational Resources Information Center
Martinez-Castilla, Pastora; Peppe, Susan
2008-01-01
This study aimed to find out what intonation features reliably represent the emotions of "liking" as opposed to "disliking" in the Spanish language, with a view to designing a prosody assessment procedure for use with children with speech and language disorders. 18 intonationally different prosodic realisations (tokens) of one word (limon) were…
Lions, tigers, and bears, oh sh!t: Semantics versus tabooness in speech production.
White, Katherine K; Abrams, Lise; Koehler, Sarah M; Collins, Richard J
2017-04-01
While both semantic and highly emotional (i.e., taboo) words can interfere with speech production, different theoretical mechanisms have been proposed to explain why interference occurs. Two experiments investigated these theoretical approaches by comparing the magnitude of these two types of interference and the stages at which they occur during picture naming. Participants named target pictures superimposed with semantic, taboo, or unrelated distractor words that were presented at three different stimulus-onset asynchronies (SOA): -150 ms, 0 ms, or +150 ms. In addition, the duration of distractor presentation was manipulated across experiments, with distractors appearing for the duration of the picture (Experiment 1) or for 350 ms (Experiment 2). Taboo distractors interfered more than semantic distractors, i.e., slowed target naming times, at all SOAs. While distractor duration had no effect on type of interference at -150 or 0 SOAs, briefly presented distractors eliminated semantic interference but not taboo interference at +150 SOA. Discussion focuses on how existing speech production theories can explain interference from emotional distractors and the unique role that attention may play in taboo interference.
Low-Arousal Speech Noise Improves Performance in N-Back Task: An ERP Study
Zhang, Dandan; Jin, Yi; Luo, Yuejia
2013-01-01
The relationship between noise and human performance is a crucial topic in ergonomic research. However, the brain dynamics of the emotional arousal effects of background noises are still unclear. The current study employed meaningless speech noises in an n-back working memory task to explore the changes in event-related potentials (ERPs) elicited by noises with low vs. high arousal levels. We found that memory performance in the low arousal condition was improved compared with the silent and high arousal conditions; participants responded more quickly and had larger P2 and P3 amplitudes in the low arousal condition, while performance and ERP components showed no significant difference between the high arousal and silent conditions. These findings suggested that the emotional arousal dimension of background noises had a significant influence on human working memory performance, and that this effect was independent of the acoustic characteristics of the noises (e.g., intensity) and the meaning of the speech materials. The current findings improve our understanding of background noise effects on human performance and lay the groundwork for the investigation of patients with attention deficits. PMID:24204607
A Joint Prosodic Origin of Language and Music
Brown, Steven
2017-01-01
Vocal theories of the origin of language rarely make a case for the precursor functions that underlay the evolution of speech. The vocal expression of emotion is unquestionably the best candidate for such a precursor, although most evolutionary models of both language and speech ignore emotion and prosody altogether. I present here a model for a joint prosodic precursor of language and music in which ritualized group-level vocalizations served as the ancestral state. This precursor combined not only affective and intonational aspects of prosody, but also holistic and combinatorial mechanisms of phrase generation. From this common stage, there was a bifurcation to form language and music as separate, though homologous, specializations. This separation of language and music was accompanied by their (re)unification in songs with words. PMID:29163276
Cross-modal metaphorical mapping of spoken emotion words onto vertical space
Montoro, Pedro R.; Contreras, María José; Elosúa, María Rosa; Marmolejo-Ramos, Fernando
2015-01-01
From the field of embodied cognition, previous studies have reported evidence of metaphorical mapping of emotion concepts onto a vertical spatial axis. Most of the work on this topic has used visual words as the typical experimental stimuli. However, to our knowledge, no previous study has examined the association between affect and vertical space using a cross-modal procedure. The current research is a first step toward the study of the metaphorical mapping of emotions onto vertical space by means of an auditory to visual cross-modal paradigm. In the present study, we examined whether auditory words with an emotional valence can interact with the vertical visual space according to a ‘positive-up/negative-down’ embodied metaphor. The general method consisted in the presentation of a spoken word denoting a positive/negative emotion prior to the spatial localization of a visual target in an upper or lower position. In Experiment 1, the spoken words were passively heard by the participants and no reliable interaction between emotion concepts and bodily simulated space was found. In contrast, Experiment 2 required more active listening of the auditory stimuli. A metaphorical mapping of affect and space was evident but limited to the participants engaged in an emotion-focused task. Our results suggest that the association of affective valence and vertical space is not activated automatically during speech processing since an explicit semantic and/or emotional evaluation of the emotionally valenced stimuli was necessary to obtain an embodied effect. The results are discussed within the framework of the embodiment hypothesis. PMID:26322007
A Shared Neural Substrate for Mentalizing and the Affective Component of Sentence Comprehension
Hervé, Pierre-Yves; Razafimandimby, Annick; Jobard, Gaël; Tzourio-Mazoyer, Nathalie
2013-01-01
Using event-related fMRI in a sample of 42 healthy participants, we compared the cerebral activity maps obtained when classifying spoken sentences based on the mental content of the main character (belief, deception or empathy) or on the emotional tonality of the sentence (happiness, anger or sadness). To control for the effects of different syntactic constructions (such as embedded clauses in belief sentences), we subtracted from each map the BOLD activations obtained during plausibility judgments on structurally matching sentences, devoid of emotions or ToM. The obtained theory of mind (ToM) and emotional speech comprehension networks overlapped in the bilateral temporo-parietal junction, posterior cingulate cortex, right anterior temporal lobe, dorsomedial prefrontal cortex and in the left inferior frontal sulcus. These regions form a ToM network, which contributes to the emotional component of spoken sentence comprehension. Compared with the ToM task, in which the sentences were enounced on a neutral tone, the emotional sentence classification task, in which the sentences were play-acted, was associated with a greater activity in the bilateral superior temporal sulcus, in line with the presence of emotional prosody. Besides, the ventromedial prefrontal cortex was more active during emotional than ToM sentence processing. This region may link mental state representations with verbal and prosodic emotional cues. Compared with emotional sentence classification, ToM was associated with greater activity in the caudate nucleus, paracingulate cortex, and superior frontal and parietal regions, in line with behavioral data showing that ToM sentence comprehension was a more demanding task. PMID:23342148
Parry, Elizabeth; Nath, Selina; Kallitsoglou, Angeliki; Russell, Ginny
2017-01-01
This longitudinal study examined whether mothers’ and fathers’ depressive symptoms predict, independently and interactively, children’s emotional and behavioural problems. It also examined bi-directional associations between parents’ expressed emotion constituents (parents’ child-directed positive and critical comments) and children’s emotional and behavioural problems. At time 1, the sample consisted of 160 families in which 50 mothers and 40 fathers had depression according to the Structured Clinical Interview for DSM-IV. Children’s mean age at Time 1 was 3.9 years (SD = 0.8). Families (n = 106) were followed up approximately 16 months later (Time 2). Expressed emotion constituents were assessed using the Preschool Five Minute Speech Sample. In total, 144 mothers and 158 fathers at Time 1 and 93 mothers and 105 fathers at Time 2 provided speech samples. Fathers’ depressive symptoms were concurrently associated with more child emotional problems when mothers had higher levels of depressive symptoms. When controlling for important confounders (children’s gender, baseline problems, mothers’ depressive symptoms and parents’ education and age), fathers’ depressive symptoms independently predicted higher levels of emotional and behavioural problems in their children over time. There was limited evidence for a bi-directional relationship between fathers’ positive comments and change in children’s behavioural problems over time. Unexpectedly, there were no bi-directional associations between parents’ critical comments and children’s outcomes. We conclude that the study provides evidence to support a whole family approach to prevention and intervention strategies for children’s mental health and parental depression. PMID:29045440
Aesthetic and Emotional Effects of Meter and Rhyme in Poetry
Obermeier, Christian; Menninghaus, Winfried; von Koppenfels, Martin; Raettig, Tim; Schmidt-Kassow, Maren; Otterbein, Sascha; Kotz, Sonja A.
2013-01-01
Metrical patterning and rhyme are frequently employed in poetry but also in infant-directed speech, play, rites, and festive events. Drawing on four-line stanzas from nineteenth- and twentieth-century German poetry that feature end rhyme and regular meter, the present study tested the hypothesis that meter and rhyme have an impact on aesthetic liking, emotional involvement, and affective valence attributions. Hypotheses that postulate such effects have been advocated ever since ancient rhetoric and poetics, yet they have barely been empirically tested. More recently, in the field of cognitive poetics, these traditional assumptions have been readopted into a general cognitive framework. In the present experiment, we tested the influence of meter and rhyme, as well as their interaction with lexicality, on the aesthetic and emotional perception of poetry. Participants listened to stanzas that were systematically modified with regard to meter and rhyme and rated them. Both rhyme and regular meter led to enhanced aesthetic appreciation, higher intensity in processing, and more positively perceived and felt emotions, with the latter finding being mediated by lexicality. Together these findings clearly show that both features significantly contribute to the aesthetic and emotional perception of poetry and thus confirm assumptions about their impact put forward by cognitive poetics. The present results are explained within the theoretical framework of cognitive fluency, which links structural features of poetry with aesthetic and emotional appraisal. PMID:23386837
Human neuroethology of emotion.
Ploog, D
1989-01-01
1. Based on ethological theory, the question of how human and nonhuman primate emotionality differ is investigated. 2. The anatomical basis for this difference is the greater number of neurons in the anterior thalamic nuclei in humans than in monkeys and apes. This may represent an increased differentiation of the limbic message being sent to the cortex. 3. Only humans can report about experiences and subjective feelings in certain motivational states. The two most general states are wakefulness and sleep. The subjective aspect of (desynchronized) sleep is dreaming. The causal relationship between dreaming and certain lower brain stem mechanisms is analysed. 4. Whereas the motor system is usually blocked during desynchronized sleep, there are individuals who voice their emotions and speak while sleeping. Just as there are essential differences between human and nonhuman primates in the substrates for voluntary control of the voice, there are essential differences in the voluntary control of emotions. 5. Similar to the motor matching theory of speech perception, a motor matching process of affect perception is suggested. 6. The evolutionary change in the human motivational system is thought to be one of several prerequisites for the evolution of language.
Effect of Parkinson Disease on Emotion Perception Using the Persian Affective Voices Test.
Saffarian, Arezoo; Shavaki, Yunes Amiri; Shahidi, Gholam Ali; Jafari, Zahra
2018-05-04
Emotion perception plays a major role in proper communication with people in different social interactions. Nonverbal affect bursts can be used to evaluate vocal emotion perception. The present study was a preliminary step toward establishing the psychometric properties of the Persian version of the Montreal Affective Voices (MAV) test, as well as investigating the effect of Parkinson disease (PD) on vocal emotion perception. Short emotional sounds made by pronouncing the vowel "a" in Persian were recorded by 22 actors and actresses to develop the Persian version of the MAV, the Persian Affective Voices (PAV), for emotions of happiness, sadness, pleasure, pain, anger, disgust, fear, surprise, and neutrality. The recordings of the five actresses and five actors who obtained the highest scores were used to generate the test. For convergent validity assessment, the correlation between the PAV and a speech prosody comprehension test was examined using a gender- and age-matched control group. To investigate the effect of PD on emotion perception, the PAV test was administered to 28 patients with mild PD between the ages of 50 and 70 years. The PAV showed high internal consistency (Cronbach's α = 0.80). A significant positive correlation was observed between the PAV and the speech prosody comprehension test. The test-retest reliability also showed high repeatability of the PAV (intraclass correlation coefficient = 0.815, P ≤ 0.001). A significant difference was observed between the patients with PD and the controls in all subtests. The PAV test is a useful psychometric tool for examining vocal emotion perception that can be used in both behavioral and neuroimaging studies. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
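The internal-consistency figure reported above (Cronbach's α) is a standard computation; a minimal sketch is shown below, assuming a respondents-by-items score matrix. The random scores are placeholders, not the PAV normative data.

```python
# Hedged sketch: Cronbach's alpha for internal consistency, computed from a
# respondents x items score matrix. The demo data are placeholders.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: 2-D array, rows = respondents, columns = test items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
demo = rng.integers(0, 5, size=(40, 9)).astype(float)   # 40 listeners x 9 emotion subtests
print(f"alpha = {cronbach_alpha(demo):.2f}")
```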
Dual diathesis-stressor model of emotional and linguistic contributions to developmental stuttering.
Walden, Tedra A; Frankel, Carl B; Buhr, Anthony P; Johnson, Kia N; Conture, Edward G; Karrass, Jan M
2012-05-01
This study assessed emotional and speech-language contributions to childhood stuttering. A dual diathesis-stressor framework guided this study, in which both linguistic requirements and skills, and emotion and its regulation, are hypothesized to contribute to stuttering. The language diathesis consists of expressive and receptive language skills. The emotion diathesis consists of proclivities to emotional reactivity and regulation of emotion, and the emotion stressor consists of experimentally manipulated emotional inductions prior to narrative speaking tasks. Preschool-age children who do and do not stutter were exposed to three emotion-producing overheard conversations-neutral, positive, and angry. Emotion and emotion-regulatory behaviors were coded while participants listened to each conversation and while telling a story after each overheard conversation. Instances of stuttering during each story were counted. Although there was no main effect of conversation type, results indicated that stuttering in preschool-age children is influenced by emotion and language diatheses, as well as coping strategies and situational emotional stressors. Findings support the dual diathesis-stressor model of stuttering.
Loss of regional accent after damage to the speech production network
Berthier, Marcelo L.; Dávila, Guadalupe; Moreno-Torres, Ignacio; Beltrán-Corbellini, Álvaro; Santana-Moreno, Daniel; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María José; Massone, María Ignacia; Ruiz-Cruces, Rafael
2015-01-01
Lesion-symptom mapping studies reveal that selective damage to one or more components of the speech production network can be associated with foreign accent syndrome, changes in regional accent (e.g., from Parisian accent to Alsatian accent), stronger regional accent, or re-emergence of a previously learned and dormant regional accent. Here, we report loss of regional accent after rapidly regressive Broca's aphasia in three Argentinean patients who had suffered unilateral or bilateral focal lesions in components of the speech production network. All patients were monolingual speakers with three different native Spanish accents (Cordobés or central, Guaranítico or northeast, and Bonaerense). Samples of speech production from the patient with the native Córdoba accent were compared with previous recordings of his voice, whereas data from the patient with the native Guaranítico accent were compared with speech samples from one healthy control matched for age, gender, and native accent. Speech samples from the patient with the native Buenos Aires accent were compared with data obtained from four healthy control subjects with the same accent. Analysis of speech production revealed discrete slowing in speech rate, inappropriately long pauses, and monotonous intonation. Phonemic production remained similar to that of healthy Spanish speakers, but phonetic variants peculiar to each accent (e.g., intervocalic aspiration of /s/ in the Córdoba accent) were absent. While basic features of Spanish prosody were preserved, features intrinsic to the melody of certain geographical areas (e.g., rising end F0 excursion in declarative sentences intoned with the Córdoba accent) were absent. All patients were also unable to produce sentences with different emotional prosody. Brain imaging disclosed focal left hemisphere lesions involving the middle part of the motor cortex, the post-central cortex, the posterior inferior and/or middle frontal cortices, the insula, the anterior putamen, and the supplementary motor area. Our findings suggest that lesions affecting the middle part of the left motor cortex and other components of the speech production network disrupt neural processes involved in the production of regional accent features. PMID:26594161
ERIC Educational Resources Information Center
Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.
2016-01-01
Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Understanding the abstract role of speech in communication at 12 months.
Martin, Alia; Onishi, Kristine H; Vouloumanos, Athena
2012-04-01
Adult humans recognize that even unfamiliar speech can communicate information between third parties, demonstrating an ability to separate communicative function from linguistic content. We examined whether 12-month-old infants understand that speech can communicate before they understand the meanings of specific words. Specifically, we test the understanding that speech permits the transfer of information about a Communicator's target object to a Recipient. Initially, the Communicator selectively grasped one of two objects. In test, the Communicator could no longer reach the objects. She then turned to the Recipient and produced speech (a nonsense word) or non-speech (coughing). Infants looked longer when the Recipient selected the non-target than the target object when the Communicator had produced speech but not coughing (Experiment 1). Looking time patterns differed from the speech condition when the Recipient rather than the Communicator produced the speech (Experiment 2), and when the Communicator produced a positive emotional vocalization (Experiment 3), but did not differ when the Recipient had previously received information about the target by watching the Communicator's selective grasping (Experiment 4). Thus infants understand the information-transferring properties of speech and recognize some of the conditions under which others' information states can be updated. These results suggest that infants possess an abstract understanding of the communicative function of speech, providing an important potential mechanism for language and knowledge acquisition. Copyright © 2011 Elsevier B.V. All rights reserved.
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
A speech processing method is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. The method for learning the mapping between static speech sounds and pseudo-articulator positions uses a set of training data composed only of speech sounds. This speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
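The core idea above, finding articulator-like trajectories that both fit the observed sounds and vary smoothly, can be illustrated with a toy dynamic-programming sketch. This is only an illustration of the continuity constraint, not the patented maximum likelihood continuity mapping algorithm; the one-dimensional grid of candidate positions, the jump penalty, and the random log-likelihoods are all assumptions.

```python
# Toy sketch: choose, for each speech frame, a pseudo-articulator position on a
# 1-D grid so that the path maximizes per-frame likelihood while staying smooth
# (large jumps between adjacent frames are penalized).
import numpy as np

def smooth_path(log_lik, jump_penalty=5.0):
    """log_lik: (T, K) array of log p(frame_t | position_k); returns length-T index path."""
    T, K = log_lik.shape
    positions = np.arange(K)
    # quadratic cost of moving between grid positions, scaled by a smoothness penalty
    move_cost = jump_penalty * (positions[:, None] - positions[None, :]) ** 2
    score = log_lik[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = score[:, None] - move_cost + log_lik[t][None, :]
        back[t] = np.argmax(total, axis=0)    # best previous position for each current one
        score = np.max(total, axis=0)
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):             # backtrace the optimal smooth path
        path[t - 1] = back[t, path[t]]
    return path

# tiny demo with random per-frame log-likelihoods over 20 grid positions
rng = np.random.default_rng(0)
print(smooth_path(rng.normal(size=(30, 20))))
```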
ERIC Educational Resources Information Center
Memphis State Univ., TN. Coll. of Education.
The document contains the proceedings of a 1983 Tennessee conference on "Provision of Services to the Severely and Emotionally Disturbed and Autistic." Areas covered were identified as priority needs by Tennessee educators and emphasize the practical rather than the theoretical aspects of providing services. After the text of the keynote speech,…
Computational Modeling of Emotions and Affect in Social-Cultural Interaction
2013-10-02
acoustic and textual information sources. Second, a cross-lingual study was performed that shed light on how human perception and automatic recognition...speech is produced, a speaker's pitch and intonational pattern, and word usage. Better feature representation and advanced approaches were used to...recognition performance, and improved our understanding of language/cultural impact on human perception of emotion and automatic classification.
Langereis, Margreet; Vermeulen, Anneke
2015-06-01
This study aimed to evaluate the long-term effects of CI on the auditory, language, educational and social-emotional development of deaf children in different educational-communicative settings. The outcomes of 58 children with profound hearing loss and normal non-verbal cognition were analyzed after 60 months of CI use. At testing, the children were enrolled in three different educational settings: mainstream education, where spoken language is used; hard-of-hearing education, where sign-supported spoken language is used; and bilingual deaf education, with Sign Language of the Netherlands and Sign Supported Dutch. Children were assessed on auditory speech perception, receptive language, educational attainment and wellbeing. The auditory speech perception of children with CI in mainstream education enables them to acquire language and educational levels that are comparable to those of their normal-hearing peers. Although the children in mainstream and hard-of-hearing settings show similar speech perception abilities, language development in children in hard-of-hearing settings lags significantly behind. Speech perception, language and educational attainments of children in deaf education remained extremely poor. Furthermore, more children are resilient in mainstream and hard-of-hearing environments than in deaf educational settings. Regression analyses showed an important influence of educational setting. Children with CI who are placed in early intervention environments that facilitate auditory development are able to achieve good auditory speech perception, language and educational levels in the long term. Most parents of these children report no social-emotional concerns. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Hearing Voices and Seeing Things
... are serious and severely interfere with a child's thinking and functioning. Children who are psychotic often appear ... and agitated. They also may have disorganized speech, thinking, emotional reactions, and behavior, sometimes accompanied by hallucinations ...
Augustine, Ann Mary; Chrysolyte, Shipra B; Thenmozhi, K; Rupa, V
2013-04-01
In order to assess psychosocial and auditory handicap in Indian patients with unilateral sensorineural hearing loss (USNHL), a prospective study was conducted on 50 adults with USNHL in the ENT Outpatient clinic of a tertiary care centre. The hearing handicap inventory for adults (HHIA) as well as speech in noise and sound localization tests were administered to patients with USNHL. An equal number of age-matched, normal controls also underwent the speech and sound localization tests. The results showed that HHIA scores ranged from 0 to 60 (mean 20.7). Most patients (84.8 %) had either mild to moderate or no handicap. Emotional subscale scores were higher than social subscale scores (p = 0.01). When the effect of sociodemographic factors on HHIA scores was analysed, educated individuals were found to have higher social subscale scores (p = 0.04). Age, sex, side and duration of hearing loss, occupation and income did not affect HHIA scores. Speech in noise and sound localization were significantly poorer in cases compared to controls (p < 0.001). About 75 % of patients refused a rehabilitative device. We conclude that USNHL in Indian adults does not usually produce severe handicap. When present, the handicap is more emotional than social. USNHL significantly affects sound localization and speech in noise. Yet, affected patients seldom seek a rehabilitative device.
Ortwein, Heiderose; Benz, Alexander; Carl, Petra; Huwendiek, Sören; Pander, Tanja; Kiessling, Claudia
2017-02-01
To investigate whether the Verona Coding Definitions of Emotional Sequences to code health providers' responses (VR-CoDES-P) can be used for assessment of medical students' responses to patients' cues and concerns provided in written case vignettes. Student responses in direct speech to patient cues and concerns were analysed in 21 different case scenarios using VR-CoDES-P. A total of 977 student responses were available for coding, and 857 responses were codable with the VR-CoDES-P. In 74.6% of responses, the students used either a "reducing space" statement only or a "providing space" statement immediately followed by a "reducing space" statement. Overall, the most frequent response was explicit information advice (ERIa), followed by content exploring (EPCEx) and content acknowledgement (EPCAc). VR-CoDES-P were applicable to written responses of medical students when they were phrased in direct speech. The application of VR-CoDES-P is reliable and feasible when using the differentiation of "providing" and "reducing space" responses. Communication strategies described by students in non-direct speech were difficult to code and produced many missing values. VR-CoDES-P are useful for analysis of medical students' written responses when focusing on emotional issues. Students need precise instructions for their response in the given test format. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The Not Face: A grammaticalization of facial expressions of emotion
Benitez-Quiroz, C. Fabian; Wilbur, Ronnie B.; Martinez, Aleix M.
2016-01-01
Facial expressions of emotion are thought to have evolved from the development of facial muscles used in sensory regulation and later adapted to express moral judgment. Negative moral judgment includes the expressions of anger, disgust and contempt. Here, we study the hypothesis that these facial expressions of negative moral judgment have further evolved into a facial expression of negation regularly used as a grammatical marker in human language. Specifically, we show that people from different cultures expressing negation use the same facial muscles as those employed to express negative moral judgment. We then show that this nonverbal signal is used as a co-articulator in speech and that, in American Sign Language, it has been grammaticalized as a non-manual marker. Furthermore, this facial expression of negation exhibits the theta oscillation (3–8 Hz) universally seen in syllable and mouthing production in speech and signing. These results provide evidence for the hypothesis that some components of human language have evolved from facial expressions of emotion, and suggest an evolutionary route for the emergence of grammatical markers. PMID:26872248
Dorjee, Dusana; Lally, Níall; Darrall-Rew, Jonathan; Thierry, Guillaume
2015-08-01
Initial research shows that mindfulness training can enhance attention and modulate the affective response. However, links between mindfulness and language processing remain virtually unexplored despite the prominent role of overt and silent negative ruminative speech in depressive and anxiety-related symptomatology. Here, we measured dispositional mindfulness and recorded participants' event-related brain potential responses to positive and negative target words preceded by words congruent or incongruent with the targets in terms of semantic relatedness and emotional valence. While the low mindfulness group showed a similar N400 effect pattern for positive and negative targets, high dispositional mindfulness was associated with a larger N400 effect to negative targets. This result suggests that negative meanings are less readily accessible in people with high dispositional mindfulness. Furthermore, high dispositional mindfulness was associated with reduced P600 amplitudes to emotional words, suggesting less post-analysis and attentional effort, which possibly relates to a lower inclination to ruminate. Overall, these findings provide initial evidence on associations between modifications in language systems and mindfulness. Copyright © 2015 Elsevier Ireland Ltd and the Japan Neuroscience Society. All rights reserved.
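As a hedged sketch of how such ERP effects are usually quantified (the epoch arrays are random stand-ins and the 300-500 ms and 500-800 ms windows are conventional assumptions, not taken from the study), the N400 and P600 effects can be computed as mean amplitude differences between incongruent and congruent targets:

```python
# Sketch: mean ERP amplitude in a time window, and the incongruent-minus-congruent
# difference as the N400 / P600 effect (synthetic single-trial epochs in microvolts).
import numpy as np

def mean_amplitude(epochs, times, window):
    """epochs: (n_trials, n_samples); times: (n_samples,) in seconds; window: (start, end)."""
    mask = (times >= window[0]) & (times < window[1])
    return epochs[:, mask].mean()

fs = 250
t = np.arange(-0.2, 1.0, 1 / fs)              # epochs from -200 to 1000 ms
congruent = np.random.randn(60, len(t))       # stand-in trials, one electrode
incongruent = np.random.randn(60, len(t))

n400 = mean_amplitude(incongruent, t, (0.3, 0.5)) - mean_amplitude(congruent, t, (0.3, 0.5))
p600 = mean_amplitude(incongruent, t, (0.5, 0.8)) - mean_amplitude(congruent, t, (0.5, 0.8))
print(f"N400 effect: {n400:.2f} µV, P600 effect: {p600:.2f} µV")
```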
Audiovisual speech perception development at varying levels of perceptual processing
Lalonde, Kaylah; Holt, Rachael Frush
2016-01-01
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318
Weeks, Justin W; Zoccola, Peggy M
2015-12-01
Accumulating evidence supports fear of evaluation in general as important in social anxiety, including fear of positive evaluation (FPE) and fear of negative evaluation (FNE). The present study examined state responses to an impromptu speech task with a sample of 81 undergraduates. This study is the first to compare and contrast physiological responses associated with FPE and FNE, and to examine both FPE- and FNE-related changes in state anxiety/affect in response to perceived social evaluation during a speech. FPE uniquely predicted (relative to FNE/depression) increases in mean heart rate during the speech; in contrast, neither FNE nor depression related to changes in heart rate. Both FPE and FNE related uniquely to increases in negative affect and state anxiety during the speech. Furthermore, pre-speech state anxiety mediated the relationship between trait FPE and diminished positive affect during the speech. Implications for the theoretical conceptualization and treatment of social anxiety are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Behaviorally-based couple therapies reduce emotional arousal during couple conflict.
Baucom, Brian R; Sheng, Elisa; Christensen, Andrew; Georgiou, Panayiotis G; Narayanan, Shrikanth S; Atkins, David C
2015-09-01
Emotional arousal during relationship conflict is a major target for intervention in couple therapies. The current study examines changes in conflict-related emotional arousal in 104 couples who participated in a randomized clinical trial of two behaviorally-based couple therapies, traditional behavioral couple therapy (TBCT) and integrative behavioral couple therapy (IBCT). Emotional arousal is measured using the mean fundamental frequency of spouses' speech, and changes in emotional arousal from pre- to post-therapy are examined using multilevel models. Overall emotional arousal, the rate of increase in emotional arousal at the beginning of conflict, and the duration of emotional arousal declined for all couples. Reductions in overall arousal were stronger for TBCT wives than for IBCT wives but not significantly different for IBCT and TBCT husbands. Reductions in the rate of initial arousal were larger for TBCT couples than IBCT couples. Reductions in duration were larger for IBCT couples than TBCT couples. These findings suggest that both therapies can reduce emotional arousal, but that the two therapies create different kinds of change in emotional arousal. Copyright © 2015 Elsevier Ltd. All rights reserved.
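A minimal sketch of the vocal arousal measure described above, mean fundamental frequency (F0) of speech, is given below. It assumes librosa's pyin pitch tracker, and the synthetic 220 Hz tone merely stands in for a recorded conflict utterance; in the study itself, pre- and post-therapy means would be compared with multilevel models.

```python
# Sketch: mean F0 of a speech signal as a coarse vocal-arousal measure.
import numpy as np
import librosa

def mean_f0(y, sr, fmin=75.0, fmax=400.0):
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return float(np.nanmean(f0))              # unvoiced frames come back as NaN

sr = 16000
t = np.linspace(0, 2.0, int(2.0 * sr), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)       # stand-in "utterance" at 220 Hz
print(f"mean F0 ≈ {mean_f0(y, sr):.1f} Hz")
```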
Laban movement analysis to classify emotions from motion
NASA Astrophysics Data System (ADS)
Dewan, Swati; Agarwal, Shubham; Singh, Navjyoti
2018-04-01
In this paper, we present a study of Laban Movement Analysis (LMA) to understand basic human emotions from nonverbal human behaviors. While there are many studies on understanding behavioral patterns based on natural language processing and speech processing applications, understanding emotions or behavior from non-verbal human motion is still a challenging and largely unexplored field. LMA provides a rich overview of the scope of movement possibilities. These basic elements can be used for generating movement or for describing movement. They provide an inroad to understanding movement and to developing movement efficiency and expressiveness. Each human being combines these movement factors in his/her own unique way and organizes them to create phrases and relationships which reveal personal, artistic, or cultural style. In this work, we build a motion descriptor based on a deep understanding of Laban theory. The proposed descriptor builds on previous work and encodes experiential features by using temporal windows. We present a more conceptually elaborate formulation of Laban theory and test it in a relatively new domain of behavioral research with applications in human-machine interaction. The recognition of affective human communication may be used to provide developers with a rich source of information for creating systems that are capable of interacting well with humans. We test our algorithm on the UCLIC dataset, which consists of body motions of 13 non-professional actors portraying anger, fear, happiness and sadness. We achieve an accuracy of 87.30% on this dataset.
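A sketch of the general idea, window-based dynamic features computed from joint trajectories and passed to a standard classifier, is given below. This is only an illustration: the speed, acceleration and jerk statistics, the window sizes, and the synthetic motion data are assumptions, not the descriptor proposed in the paper.

```python
# Illustrative sketch: Laban-inspired dynamic features pooled over temporal
# windows from 3-D joint trajectories, then fed to an off-the-shelf classifier.
import numpy as np
from sklearn.svm import SVC

def window_features(joints, win=30, hop=15):
    """joints: (T, J, 3) joint positions over T frames; returns (n_windows, 3*J)."""
    vel = np.diff(joints, axis=0)             # frame-to-frame velocity
    acc = np.diff(vel, axis=0)
    jerk = np.diff(acc, axis=0)
    feats = []
    for start in range(0, len(jerk) - win + 1, hop):
        sl = slice(start, start + win)
        feats.append(np.concatenate([
            np.linalg.norm(vel[sl], axis=-1).mean(axis=0),   # mean speed per joint
            np.linalg.norm(acc[sl], axis=-1).mean(axis=0),   # mean acceleration per joint
            np.linalg.norm(jerk[sl], axis=-1).mean(axis=0),  # mean jerk per joint
        ]))
    return np.asarray(feats)

# toy demo: random "energetic" vs "subdued" motions stand in for real mocap clips
rng = np.random.default_rng(1)
X = np.vstack([window_features(rng.normal(scale=s, size=(120, 15, 3)))
               for s in (1.0, 0.2)])
y = np.array([0] * (len(X) // 2) + [1] * (len(X) - len(X) // 2))
print(SVC().fit(X, y).score(X, y))
```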
NASA Astrophysics Data System (ADS)
Maarif, H. A.; Akmeliawati, R.; Gunawan, T. S.; Shafie, A. A.
2013-12-01
A sign language synthesizer is a method to visualize sign language movement from spoken language. Sign language (SL) is one of the means used by HSI people to communicate with normal people. Unfortunately, the number of people, including HSI people, who are familiar with sign language is very limited, which causes difficulties in communication between normal people and HSI people. Sign language involves not only hand movement but also facial expression, and these two elements complement each other: the hand movements convey the meaning of each sign, while the facial expression conveys the signer's emotion. Generally, a sign language synthesizer recognizes the spoken language using speech recognition, handles the grammatical process with a context-free grammar, and produces the signing with a recorded 3D avatar. This paper analyzes and compares the existing techniques for developing a sign language synthesizer, leading to the IIUM Sign Language Synthesizer.
Childhood Stuttering: Where Are We and Where Are We Going?
Smith, Anne; Weber, Christine
2016-11-01
Remarkable progress has been made over the past two decades in expanding our understanding of the behavioral, peripheral physiologic, and central neurophysiologic bases of stuttering in early childhood. It is clear that stuttering is a neurodevelopmental disorder characterized by atypical development of speech motor planning and execution networks. The speech motor system must interact in complex ways with neural systems mediating language and other cognitive and emotional processes. During the time when stuttering typically appears and follows its path to either recovery or persistence, all of these neurobehavioral systems are undergoing rapid and dramatic developmental changes. We summarize our current understanding of the various developmental trajectories relevant for the understanding of stuttering in early childhood. We also present theoretical and experimental approaches that we believe will be optimal for even more rapid progress toward developing better and more targeted treatment for stuttering in the preschool children who are more likely to persist in stuttering.
45 CFR 2490.103 - Definitions.
Code of Federal Regulations, 2014 CFR
2014-10-01
..., such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic or asymptomatic), and drug addiction and alcoholism. (2...
45 CFR 2490.103 - Definitions.
Code of Federal Regulations, 2012 CFR
2012-10-01
..., such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic or asymptomatic), and drug addiction and alcoholism. (2...
[Swallowing and Voice Disorders in Cancer Patients].
Tanuma, Akira
2015-07-01
Dysphagia sometimes occurs in patients with head and neck cancer, particularly in those undergoing surgery and radiotherapy for lingual, pharyngeal, and laryngeal cancer. It also occurs in patients with esophageal cancer and brain tumor. Patients who undergo glossectomy usually show impairment of the oral phase of swallowing, whereas those with pharyngeal, laryngeal, and esophageal cancer show impairment of the pharyngeal phase of swallowing. Videofluoroscopic examination of swallowing provides important information necessary for rehabilitation of swallowing in these patients. Appropriate swallowing exercises and compensatory strategies can be decided based on the findings of the evaluation. Palatal augmentation prostheses are sometimes used for rehabilitation in patients undergoing glossectomy. Patients who undergo total laryngectomy or total pharyngolaryngoesophagectomy should receive speech therapy to enable them to use alaryngeal speech methods, including electrolarynx, esophageal speech, or speech via tracheoesophageal puncture. Regaining swallowing function and speech can improve a patient's emotional health and quality of life. Therefore, it is important to manage swallowing and voice disorders appropriately.
Hemodynamics of speech production: An fNIRS investigation of children who stutter.
Walsh, B; Tian, F; Tourville, J A; Yücel, M A; Kuczek, T; Bostian, A J
2017-06-22
Stuttering affects nearly 1% of the population worldwide and often has life-altering negative consequences, including poorer mental health and emotional well-being, and reduced educational and employment achievements. Over two decades of neuroimaging research reveals clear anatomical and physiological differences in the speech neural networks of adults who stutter. However, there have been few neurophysiological investigations of speech production in children who stutter. Using functional near-infrared spectroscopy (fNIRS), we examined hemodynamic responses over neural regions integral to fluent speech production including inferior frontal gyrus, premotor cortex, and superior temporal gyrus during a picture description task. Thirty-two children (16 stuttering and 16 controls) aged 7-11 years participated in the study. We found distinctly different speech-related hemodynamic responses in the group of children who stutter compared to the control group. Whereas controls showed significant activation over left dorsal inferior frontal gyrus and left premotor cortex, children who stutter exhibited deactivation over these left hemisphere regions. This investigation of neural activation during natural, connected speech production in children who stutter demonstrates that in childhood stuttering, atypical functional organization for speech production is present and suggests promise for the use of fNIRS during natural speech production in future research with typical and atypical child populations.
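As a hedged illustration of how such speech-related hemodynamic responses are typically summarized (the sampling rate, trial onsets and signal below are made up, and real fNIRS pipelines also include filtering and motion/systemic artifact correction), one can block-average the oxygenated-hemoglobin (HbO) signal of a channel around task onsets:

```python
# Sketch: block-average an fNIRS HbO time series around picture-description
# onsets to obtain a baseline-corrected hemodynamic response for one channel.
import numpy as np

def block_average(hbo, onsets, fs, pre_s=2.0, post_s=15.0):
    """hbo: 1-D HbO series; onsets: event times in seconds; fs: sampling rate (Hz)."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    epochs = []
    for t in onsets:
        i = int(t * fs)
        if i - pre < 0 or i + post > len(hbo):
            continue                              # skip events without a full window
        seg = hbo[i - pre:i + post]
        epochs.append(seg - seg[:pre].mean())     # subtract the pre-stimulus baseline
    return np.mean(epochs, axis=0)                # averaged response, (pre + post,) samples

fs = 10.0                                         # typical fNIRS sampling rate
hbo = np.random.randn(int(300 * fs)) * 0.1        # stand-in 5-minute recording
onsets = np.arange(20, 280, 30)                   # hypothetical trial onsets (s)
print(block_average(hbo, onsets, fs).shape)
```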
Is talking to an automated teller machine natural and fun?
Chan, F Y; Khalid, H M
Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun talking to a speech ATM can be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. An ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.
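As a rough sketch of the word-spotting step such a system might use (the keyword lists and intent names below are hypothetical, not from the study), a recognized utterance can be mapped to an ATM task by scanning for task keywords in either language:

```python
# Illustrative word-spotting sketch: map a recognized utterance to an ATM intent
# by looking for task keywords in English or Bahasa Melayu (lists are made up).
KEYWORDS = {
    "withdraw": {"en": ["withdraw", "cash", "take out"], "ms": ["keluarkan", "pengeluaran"]},
    "balance":  {"en": ["balance", "check balance"],     "ms": ["baki", "semak baki"]},
}

def spot_intent(utterance):
    text = utterance.lower()
    for intent, by_lang in KEYWORDS.items():
        for words in by_lang.values():
            if any(w in text for w in words):
                return intent
    return None

print(spot_intent("I want to withdraw two hundred ringgit"))   # -> "withdraw"
print(spot_intent("semak baki akaun saya"))                    # -> "balance"
```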
Speech in 10-Year-Olds Born With Cleft Lip and Palate: What Do Peers Say?
Nyberg, Jill; Havstam, Christina
2016-09-01
The aim of this study was to explore how 10-year-olds describe speech and communicative participation in children born with unilateral cleft lip and palate in their own words, whether they perceive signs of velopharyngeal insufficiency (VPI) and articulation errors of different degrees, and if so, which terminology they use. Nineteen 10-year-olds participated in three focus group interviews where they listened to 10 to 12 speech samples with different types of cleft speech characteristics assessed by speech and language pathologists (SLPs) and described what they heard. The interviews were transcribed and analyzed with qualitative content analysis. The analysis resulted in three interlinked categories encompassing different aspects of speech, personality, and social implications: descriptions of speech, thoughts on causes and consequences, and emotional reactions and associations. Each category contains four subcategories exemplified with quotes from the children's statements. More pronounced signs of VPI were perceived but referred to in terms relevant to 10-year-olds. Articulatory difficulties, even minor ones, were noted. Peers reflected on the risk of teasing and bullying and on how children with impaired speech might experience their situation. The SLPs and peers did not agree on minor signs of VPI, but they were unanimous in their analysis of clinically normal and more severely impaired speech. Articulatory impairments may be more important to treat than minor signs of VPI, based on what peers say.
Yang, Jie; Andric, Michael; Mathew, Mili M
2015-10-01
Gestures play an important role in face-to-face communication and have been increasingly studied via functional magnetic resonance imaging. Although a large amount of data has been provided to describe the neural substrates of gesture comprehension, these findings have never been quantitatively summarized and the conclusion is still unclear. This activation likelihood estimation meta-analysis investigated the brain networks underpinning gesture comprehension while considering the impact of gesture type (co-speech gestures vs. speech-independent gestures) and task demand (implicit vs. explicit) on the brain activation of gesture comprehension. The meta-analysis of 31 papers showed that as hand actions, gestures involve a perceptual-motor network important for action recognition. As meaningful symbols, gestures involve a semantic network for conceptual processing. Finally, during face-to-face interactions, gestures involve a network for social emotive processes. Our finding also indicated that gesture type and task demand influence the involvement of the brain networks during gesture comprehension. The results highlight the complexity of gesture comprehension, and suggest that future research is necessary to clarify the dynamic interactions among these networks. Copyright © 2015 Elsevier Ltd. All rights reserved.
Biologically inspired emotion recognition from speech
NASA Astrophysics Data System (ADS)
Caponetti, Laura; Buscicchio, Cosimo Alessandro; Castellano, Giovanna
2011-12-01
Emotion recognition has become a fundamental task in human-computer interaction systems. In this article, we propose an emotion recognition approach based on biologically inspired methods. Specifically, emotion classification is performed using a long short-term memory (LSTM) recurrent neural network which is able to recognize long-range dependencies between successive temporal patterns. We propose to represent data using features derived from two different models: mel-frequency cepstral coefficients (MFCC) and the Lyon cochlear model. In the experimental phase, results obtained from the LSTM network and the two different feature sets are compared, showing that features derived from the Lyon cochlear model give better recognition results in comparison with those obtained with the traditional MFCC representation.
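As a rough sketch of the kind of pipeline described above (the shapes, hyperparameters and four-emotion output are assumptions, and the Lyon cochlear model front end is omitted), MFCC sequences extracted from an utterance can be fed to a small LSTM classifier:

```python
# Minimal sketch: MFCC sequences classified with a long short-term memory network.
import torch
import torch.nn as nn
import numpy as np
import librosa

class EmotionLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_emotions)
    def forward(self, x):                  # x: (batch, frames, n_mfcc)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])            # emotion logits from the final hidden state

def mfcc_sequence(y, sr, n_mfcc=13):
    """Frame-wise MFCC features of a waveform, shaped (frames, n_mfcc)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

model = EmotionLSTM()
x = torch.randn(8, 200, 13)                # stand-in batch of MFCC sequences
print(model(x).shape)                      # torch.Size([8, 4])
```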
Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition
NASA Astrophysics Data System (ADS)
Kim, Jonghwa; André, Elisabeth
This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of a user's emotional state. For emotion recognition, little attention has been paid so far to physiological signals compared to audio-visual emotion channels such as facial expression or speech. All essential stages of an automatic recognition system using biosignals are discussed, from recording a physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to search for the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by emotion recognition results.
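A rough sketch of this feature-extraction stage is given below, under assumed channel names (EMG, ECG, skin conductance, respiration) and with a much smaller feature set than the paper's: a few time-domain statistics and a spectral entropy per channel, concatenated into one vector per analysis window.

```python
# Sketch: simple per-window features from four biosignal channels, concatenated
# into one feature vector per window for emotion classification.
import numpy as np

def channel_features(x):
    """x: 1-D window of one biosignal channel."""
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    p = spec / spec.sum() if spec.sum() > 0 else np.ones_like(spec) / len(spec)
    spectral_entropy = -np.sum(p * np.log2(p + 1e-12))
    return np.array([x.mean(), x.std(), np.sqrt(np.mean(x ** 2)),   # mean, SD, RMS
                     np.ptp(x), spectral_entropy])                  # range, entropy

def window_vector(channels):
    """channels: dict of 1-D arrays for 'emg', 'ecg', 'sc', 'rsp' over one window."""
    return np.concatenate([channel_features(channels[name])
                           for name in ("emg", "ecg", "sc", "rsp")])

fs = 256
t = np.arange(0, 10, 1 / fs)
demo = {"emg": np.random.randn(len(t)),
        "ecg": np.sin(2 * np.pi * 1.2 * t),
        "sc": np.cumsum(np.random.randn(len(t))) * 1e-3,
        "rsp": np.sin(2 * np.pi * 0.3 * t)}
print(window_vector(demo).shape)            # (20,) = 5 features x 4 channels
```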
Goswami, Usha
2004-03-01
Neuroscience is a relatively new discipline encompassing neurology, psychology and biology. It has made great strides in the last 100 years, during which many aspects of the physiology, biochemistry, pharmacology and structure of the vertebrate brain have been understood. Understanding of some of the basic perceptual, cognitive, attentional, emotional and mnemonic functions is also making progress, particularly since the advent of the cognitive neurosciences, which focus specifically on understanding higher level processes of cognition via imaging technology. Neuroimaging has enabled scientists to study the human brain at work in vivo, deepening our understanding of the very complex processes underpinning speech and language, thinking and reasoning, reading and mathematics. It seems timely, therefore, to consider how we might implement our increased understanding of brain development and brain function to explore educational questions.
[Family characteristics of stuttering children].
Simić-Ruzić, Budimirka; Jovanović, Aleksandar A
2008-01-01
Stuttering is a functional impairment of speech, manifested by conscious but non-intentionally interrupted, disharmonic and disrhythmic fluctuation of sound varying in frequency and intensity. The aetiology of this disorder has been conceived within the frame of theoretical models that tend to connect genetic and epigenetic factors. The goal of the paper was to study the characteristics of family functioning of stuttering children in comparison to the family functioning of children without speech disorder; this comparison confirmed that the introduction of family-oriented therapeutic interventions into the therapy spectrum of child stuttering is justified. Seventy-nine nuclear families of 3 to 6-year-old children were examined; of these, 39 families had stuttering children and 40 had children without speech disorder. The assessment of family characteristics was made using the Family Health Scale, an observer-rating scale which, based on a semistructured interview and operational criteria, measures six basic dimensions of family functioning: Emotional State, Communication, Borders, Alliances, Adaptability & Stability, and Family Skills. A total score, calculated from the basic dimensions, is considered a global index of family health. Families with stuttering children, compared to families with children without speech disorder, showed significantly lower scores in all basic dimensions of family functioning, as well as in the total score on the Family Health Scale. Our results show that stuttering children, in comparison with children without speech disorder, live in families with an unfavourable emotional atmosphere, impaired communication and worse control over situational and developmental difficulties, which affect children's development and well-being. In the light of previous research, the application of family therapy modified according to the child's needs is now considered indispensable in the therapeutic approach to stuttering children. The assessment of family characteristics, with special reference to the ability of parents to recognize the specific needs of children with speech disorder and to interact adequately, as well as the readiness of parents for therapeutic collaboration, are necessary elements in legal custody evaluations.
Beattie, Geoffrey; Shovelton, Heather
2005-02-01
The design of effective communications depends upon an adequate model of the communication process. The traditional model is that speech conveys semantic information and bodily movement conveys information about emotion and interpersonal attitudes. But McNeill (2000) argues that this model is fundamentally wrong and that some bodily movements, namely spontaneous hand movements generated during talk (iconic gestures), are integral to semantic communication. But can we increase the effectiveness of communication using this new theory? Focusing on advertising we found that advertisements in which the message was split between speech and iconic gesture (possible on TV) were significantly more effective than advertisements in which meaning resided purely in speech or language (radio/newspaper). We also found that the significant differences in communicative effectiveness were maintained across five consecutive trials. We compared the communicative power of professionally made TV advertisements in which a spoken message was accompanied either by iconic gestures or by pictorial images, and found the iconic gestures to be more effective. We hypothesized that iconic gestures are so effective because they illustrate and isolate just the core semantic properties of a product. This research suggests that TV advertisements can be made more effective by incorporating iconic gestures with exactly the right temporal and semantic properties.
Neuromagnetic Vistas into Typical and Atypical Development of Frontal Lobe Functions
Taylor, Margot J.; Doesburg, Sam M.; Pang, Elizabeth W.
2014-01-01
The frontal lobes are involved in many higher-order cognitive functions such as social cognition, executive functions, and language and speech. These functions are complex and follow a prolonged developmental course from childhood through to early adulthood. Magnetoencephalography (MEG) is ideal for the study of the development of these functions, due to its combination of temporal and spatial resolution, which allows the determination of age-related changes in both neural timing and location. There are several challenges for MEG developmental studies: to design tasks appropriate to capture the neurodevelopmental trajectory of these cognitive functions, and to develop appropriate analysis strategies to capture various aspects of neuromagnetic frontal lobe activity. Here, we review our MEG research on social and executive functions, and speech in typically developing children and in two clinical groups: children with autism spectrum disorder and children born very preterm. The studies include facial emotional processing, inhibition, visual short-term memory, speech production, and resting-state networks. We present data from event-related analyses as well as on oscillations and connectivity analyses and review their contributions to understanding frontal lobe cognitive development. We also discuss the challenges of testing young children in the MEG and the development of age-appropriate technologies and paradigms. PMID:24994980
Oba, Sandra I.; Galvin, John J.; Fu, Qian-Jie
2014-01-01
Auditory training has been shown to significantly improve cochlear implant (CI) users’ speech and music perception. However, it is unclear whether post-training gains in performance were due to improved auditory perception or to generally improved attention, memory and/or cognitive processing. In this study, speech and music perception, as well as auditory and visual memory were assessed in ten CI users before, during, and after training with a non-auditory task. A visual digit span (VDS) task was used for training, in which subjects recalled sequences of digits presented visually. After the VDS training, VDS performance significantly improved. However, there were no significant improvements for most auditory outcome measures (auditory digit span, phoneme recognition, sentence recognition in noise, digit recognition in noise), except for small (but significant) improvements in vocal emotion recognition and melodic contour identification. Post-training gains were much smaller with the non-auditory VDS training than observed in previous auditory training studies with CI users. The results suggest that post-training gains observed in previous studies were not solely attributable to improved attention or memory, and were more likely due to improved auditory perception. The results also suggest that CI users may require targeted auditory training to improve speech and music perception. PMID:23516087
45 CFR 2301.103 - Definitions.
Code of Federal Regulations, 2010 CFR
2010-10-01
...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic...
Code of Federal Regulations, 2010 CFR
2010-01-01
...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic...
Code of Federal Regulations, 2010 CFR
2010-10-01
...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism...
Code of Federal Regulations, 2010 CFR
2010-10-01
...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical, mental or sensory impairment” includes, but is not limited to, such diseases and conditions as... sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...
45 CFR 1214.103 - Definitions.
Code of Federal Regulations, 2014 CFR
2014-10-01
... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...
45 CFR 1214.103 - Definitions.
Code of Federal Regulations, 2012 CFR
2012-10-01
... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...
Vocal contagion of emotions in non-human animals
2018-01-01
Communicating emotions to conspecifics (emotion expression) allows the regulation of social interactions (e.g. approach and avoidance). Moreover, when emotions are transmitted from one individual to the next, leading to state matching (emotional contagion), information transfer and coordination between group members are facilitated. Despite the high potential for vocalizations to influence the affective state of surrounding individuals, vocal contagion of emotions has been largely unexplored in non-human animals. In this paper, I review the evidence for discrimination of vocal expression of emotions, which is a necessary step for emotional contagion to occur. I then describe possible proximate mechanisms underlying vocal contagion of emotions, propose criteria to assess this phenomenon and review the existing evidence. The literature so far shows that non-human animals are able to discriminate and be affected by conspecific and also potentially heterospecific (e.g. human) vocal expression of emotions. Since humans heavily rely on vocalizations to communicate (speech), I suggest that studying vocal contagion of emotions in non-human animals can lead to a better understanding of the evolution of emotional contagion and empathy. PMID:29491174
Minimalistic toy robot to analyze a scenery of speaker-listener condition in autism.
Giannopulu, Irini; Montreynaud, Valérie; Watanabe, Tomio
2016-05-01
An atypical neural architecture causes impairment in communication capabilities and reduces the ability to represent the referential statements of other people in children with autism. During a "speaker-listener" communication scenario, we analyzed verbal and emotional expressions in neurotypical children (n = 20) and in children with autism (n = 20). The speaker was always a child, and the listener was a human or a minimalistic robot which reacts to speech expression by nodding only. Although both groups performed the task, everything happens as if the robot could allow children with autism to elaborate a multivariate equation, encoding and conceptualizing within his/her brain, and externalizing into unconscious emotion (heart rate) and conscious verbal speech (words). Such behavior would indicate that minimalistic artificial environments such as toy robots could be considered as the root of neuronal organization and reorganization, with the potential to improve brain activity.
Motivation and appraisal in perception of poorly specified speech.
Lidestam, Björn; Beskow, Jonas
2006-04-01
Normal-hearing students (n = 72) performed sentence, consonant, and word identification in either A (auditory), V (visual), or AV (audiovisual) modality. The auditory signal had difficult speech-to-noise ratios. Talker (human vs. synthetic), topic (no cue vs. cue-words), and emotion (no cue vs. facially displayed vs. cue-words) were varied within groups. After the first block, effects of modality, face, topic, and emotion on initial appraisal and motivation were assessed. After the entire session, effects of modality on longer-term appraisal and motivation were assessed. The results from both assessments showed that V identification was more positively appraised than A identification. Correlations were tentatively interpreted such that evaluation of self-rated performance possibly depends on a subjective standard and is reflected in motivation (if below the subjective standard, AV group) or in appraisal (if above the subjective standard, A group). Suggestions for further research are presented.
Prosodic alignment in human-computer interaction
NASA Astrophysics Data System (ADS)
Suzuki, N.; Katagiri, Y.
2007-06-01
Androids that replicate humans in form also need to replicate them in behaviour to achieve a high level of believability or lifelikeness. We explore the minimal social cues that can induce in people the human tendency for social acceptance, or ethopoeia, toward artifacts, including androids. It has been observed that people exhibit a strong tendency to adjust to each other, through a number of speech and language features in human-human conversational interactions, to obtain communication efficiency and emotional engagement. We investigate in this paper the phenomena related to prosodic alignment in human-computer interactions, with particular focus on human-computer alignment of speech characteristics. We found that people exhibit unidirectional and spontaneous short-term alignment of loudness and response latency in their speech in response to computer-generated speech. We believe this phenomenon of prosodic alignment provides one of the key components for building social acceptance of androids.
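As a rough illustration of how short-term loudness alignment could be quantified (the data below are synthetic and the analysis is a simplification of what such studies report), one can correlate the loudness of each system prompt with the loudness of the user's reply across trials:

```python
# Illustrative sketch: loudness alignment as the correlation between system
# prompt loudness and user response loudness across trials (RMS level in dB).
import numpy as np

def rms_db(signal):
    """Per-utterance loudness as RMS level in dB (relative, not calibrated SPL)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(signal))) + 1e-12)

rng = np.random.default_rng(2)
trials = 40
# system prompt loudness per trial, measured from stand-in noise "utterances"
system_db = np.array([rms_db(rng.normal(scale=s, size=8000))
                      for s in rng.uniform(0.05, 0.5, trials)])
# simulated user replies that partially converge toward the system level (made up)
user_db = 0.4 * system_db + rng.normal(0, 1.0, size=trials)
print(f"loudness alignment r = {np.corrcoef(system_db, user_db)[0, 1]:.2f}")
```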
ERIC Educational Resources Information Center
Turney, Michael T.; And Others
This report on speech research contains papers describing experiments involving both information processing and speech production. The papers concerned with information processing cover such topics as peripheral and central processes in vision, separate speech and nonspeech processing in dichotic listening, and dichotic fusion along an acoustic…
Neurophysiological Influence of Musical Training on Speech Perception
Shahin, Antoine J.
2011-01-01
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
Rith-Najarian, Leslie R.; McLaughlin, Katie A.; Sheridan, Margaret A.; Nock, Matthew K.
2014-01-01
Extensive research among adults supports the biopsychosocial (BPS) model of challenge and threat, which describes relationships among stress appraisals, physiological stress reactivity, and performance; however, no previous studies have examined these relationships in adolescents. Perceptions of stressors as well as physiological reactivity to stress increase during adolescence, highlighting the importance of understanding the relationships among stress appraisals, physiological reactivity, and performance during this developmental period. In this study, 79 adolescent participants reported on stress appraisals before and after a Trier Social Stress Test in which they performed a speech task. Physiological stress reactivity was defined by changes in cardiac output and total peripheral resistance from a baseline rest period to the speech task, and performance on the speech was coded using an objective rating system. We observed in adolescents only two relationships found in past adult research on the BPS model variables: (1) pre-task stress appraisal predicted post-task stress appraisal and (2) performance predicted post-task stress appraisal. Physiological reactivity during the speech was unrelated to pre- and post-task stress appraisals and to performance. We conclude that the lack of association between post-task stress appraisal and physiological stress reactivity suggests that adolescents might have low self-awareness of physiological emotional arousal. Our findings further suggest that adolescent stress appraisals are based largely on their performance during stressful situations. Developmental implications of this potential lack of awareness of one’s physiological and emotional state during adolescence are discussed. PMID:24491123
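As a small worked example of the reactivity scores defined above (all values below are hypothetical), reactivity is simply the level during the speech task minus the level at baseline, computed separately for cardiac output (CO) and total peripheral resistance (TPR):

```python
# Sketch: cardiovascular reactivity = speech-task mean minus baseline mean,
# for cardiac output (CO) and total peripheral resistance (TPR); values made up.
import numpy as np

baseline_co  = np.array([5.1, 4.8, 5.3])         # L/min, rest period
speech_co    = np.array([6.0, 5.2, 5.9])         # L/min, speech task
baseline_tpr = np.array([1100., 1250., 1180.])   # dyn·s/cm^5
speech_tpr   = np.array([1050., 1400., 1230.])

co_reactivity = speech_co.mean() - baseline_co.mean()
tpr_reactivity = speech_tpr.mean() - baseline_tpr.mean()
print(f"CO reactivity: {co_reactivity:+.2f} L/min, "
      f"TPR reactivity: {tpr_reactivity:+.1f} dyn·s/cm^5")
```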
Berthier, Marcelo L.; Roé-Vellvé, Núria; Moreno-Torres, Ignacio; Falcon, Carles; Thurnhofer-Hemsi, Karl; Paredes-Pacheco, José; Torres-Prioris, María J.; De-Torres, Irene; Alfaro, Francisco; Gutiérrez-Cardo, Antonio L.; Baquero, Miquel; Ruiz-Cruces, Rafael; Dávila, Guadalupe
2016-01-01
Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, and schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with structural brain anomalies. We suggest that the simultaneous involvement of speech and emotion regulation networks might result from disrupted neural organization during development, or compensatory or maladaptive plasticity. Future studies are required to examine whether the interplay between biological trait-like diathesis (shyness, neuroticism) and the stressful experience of living with mild DFAS lead to the development of internalizing psychiatric disorders. PMID:27555813
Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope.
Vanthornhout, Jonas; Decruy, Lien; Wouters, Jan; Simon, Jonathan Z; Francart, Tom
2018-04-01
Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation, and memory. Very often, electrophysiological measures of hearing give insight into the neural processing of sound. However, in most methods, non-speech stimuli are used, making it hard to relate the results to behavioral measures of speech intelligibility. The use of natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift that makes it possible to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram, and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way towards an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will allow the development of closed-loop auditory prostheses that automatically adapt to individual users.
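The decoding approach described above can be illustrated with a minimal sketch: reconstruct the speech envelope from multichannel EEG with a lagged ridge-regression decoder and correlate the reconstruction with the stimulus envelope. This is not the authors' pipeline; the sampling rate, lag window, and synthetic data are illustrative assumptions, and in practice the EEG and envelope would be filtered and time-aligned first.

```python
# Sketch of envelope reconstruction from EEG (backward model) on synthetic data;
# in practice eeg would be band-passed, downsampled EEG and envelope the
# low-pass speech envelope at the same sampling rate.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

fs = 64                                     # assumed common sampling rate (Hz)
n_samples, n_channels = 5 * 60 * fs, 32
rng = np.random.default_rng(0)
eeg = rng.standard_normal((n_samples, n_channels))
envelope = rng.standard_normal(n_samples)   # placeholder stimulus envelope

def lagged(x, max_lag):
    """Stack time-lagged copies of each EEG channel (0..max_lag samples).
    Circular shifts are used; edge effects are ignored in this sketch."""
    return np.concatenate([np.roll(x, lag, axis=0) for lag in range(max_lag + 1)], axis=1)

max_lag = int(0.25 * fs)                    # decoder integration window ~250 ms
X = lagged(eeg, max_lag)

split = int(0.8 * n_samples)                # simple train/test split
decoder = Ridge(alpha=1e3).fit(X[:split], envelope[:split])
reconstruction = decoder.predict(X[split:])

r, _ = pearsonr(reconstruction, envelope[split:])
print(f"envelope reconstruction accuracy (Pearson r): {r:.3f}")
```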
Asad, Areej Nimer; Purdy, Suzanne C; Ballard, Elaine; Fairgray, Liz; Bowen, Caroline
2018-04-27
In this descriptive study, phonological processes were examined in the speech of children aged 5;0-7;6 (years; months) with mild to profound hearing loss using hearing aids (HAs) and cochlear implants (CIs), in comparison to their peers. A second aim was to compare phonological processes of HA and CI users. Children with hearing loss (CWHL, N = 25) were compared to children with normal hearing (CWNH, N = 30) with similar age, gender, linguistic, and socioeconomic backgrounds. Speech samples obtained from a list of 88 words, derived from three standardized speech tests, were analyzed using the CASALA (Computer Aided Speech and Language Analysis) program to evaluate participants' phonological systems, based on lax (a process appeared at least twice in the speech of at least two children) and strict (a process appeared at least five times in the speech of at least two children) counting criteria. Developmental phonological processes were eliminated in the speech of younger and older CWNH while eleven developmental phonological processes persisted in the speech of both age groups of CWHL. CWHL showed a similar trend of age of elimination to CWNH, but at a slower rate. Children with HAs and CIs produced similar phonological processes. Final consonant deletion, weak syllable deletion, backing, and glottal replacement were present in the speech of HA users, affecting their overall speech intelligibility. Developmental and non-developmental phonological processes persist in the speech of children with mild to profound hearing loss compared to their peers with typical hearing. The findings indicate that it is important for clinicians to consider phonological assessment in pre-school CWHL and the use of evidence-based speech therapy in order to reduce non-developmental and non-age-appropriate developmental processes, thereby enhancing their speech intelligibility. Copyright © 2018 Elsevier Inc. All rights reserved.
Shriberg, Lawrence D; Strand, Edythe A; Fourakis, Marios; Jakielski, Kathy J; Hall, Sheryl D; Karlsson, Heather B; Mabie, Heather L; McSweeny, Jane L; Tilkens, Christie M; Wilson, David L
2017-04-14
Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. PM and other scores were obtained for 264 participants in 6 groups: CAS in idiopathic, neurogenetic, and complex neurodevelopmental disorders; adult-onset apraxia of speech (AAS) consequent to stroke and primary progressive apraxia of speech; and idiopathic speech delay. Participants with CAS and AAS had significantly lower scores than typically speaking reference participants and speech delay controls on measures posited to assess representational and transcoding processes. Representational deficits differed between CAS and AAS groups, with support for both underspecified linguistic representations and memory/access deficits in CAS, but for only the latter in AAS. CAS-AAS similarities in the age-sex standardized percentages of occurrence of the most frequent type of inappropriate pauses (abrupt) and significant differences in the standardized occurrence of appropriate pauses were consistent with speech processing findings. Results support the hypotheses of core representational and transcoding speech processing deficits in CAS and theoretical coherence of the PM's pause-speech elements with these deficits.
Architectural Considerations for Classrooms for Exceptional Children.
ERIC Educational Resources Information Center
Texas Education Agency, Austin.
Definitions are provided of the following exceptionalities: blind, partially sighted, physically handicapped, minimally brain injured, deaf, educable mentally retarded (primary, junior, and senior high levels), trainable mentally retarded, speech handicapped, and emotionally disturbed. Architectural guidelines specify classroom location, size,…
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise
Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath
2017-01-01
Purpose Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.
2015-01-01
In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments with speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e. labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggest that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749
Yoder, Paul J.; Molfese, Dennis; Murray, Micah M.; Key, Alexandra P. F.
2013-01-01
Typically developing (TD) preschoolers and age-matched preschoolers with specific language impairment (SLI) received event-related potentials (ERPs) to four monosyllabic speech sounds prior to treatment and, in the SLI group, after 6 months of grammatical treatment. Before treatment, the TD group processed speech sounds faster than the SLI group. The SLI group increased the speed of their speech processing after treatment. Post-treatment speed of speech processing predicted later impairment in comprehending phrase elaboration in the SLI group. During the treatment phase, change in speed of speech processing predicted growth rate of grammar in the SLI group. PMID:24219693
Social Anxiety, Affect, Cortisol Response and Performance on a Speech Task.
Losiak, Wladyslaw; Blaut, Agata; Klosowska, Joanna; Slowik, Natalia
2016-01-01
Social anxiety is characterized by increased emotional reactivity to social stimuli, but results of studies focusing on affective reactions of socially anxious subjects in the situation of social exposition are inconclusive, especially in the case of endocrinological measures of affect. This study was designed to examine individual differences in endocrinological and affective reactions to social exposure as well as in performance on a speech task in a group of students (n = 44) comprising subjects with either high or low levels of social anxiety. Measures of salivary cortisol and positive and negative affect were taken before and after an impromptu speech. Self-ratings and observer ratings of performance were also obtained. Cortisol levels and negative affect increased in both groups after the speech task, and positive affect decreased; however, group × affect interactions were not significant. Assessments conducted after the speech task revealed that highly socially anxious participants had lower observer ratings of performance while cortisol increase and changes in self-reported affect were not related to performance. Socially anxious individuals do not differ from nonanxious individuals in affective reactions to social exposition, but reveal worse performance at a speech task. © 2015 S. Karger AG, Basel.
Self-awareness deficits following loss of inner speech: Dr. Jill Bolte Taylor's case study.
Morin, Alain
2009-06-01
In her 2006 book "My Stroke of Insight" Dr. Jill Bolte Taylor relates her experience of suffering from a left hemispheric stroke caused by a congenital arteriovenous malformation which led to a loss of inner speech. Her phenomenological account strongly suggests that this impairment produced a global self-awareness deficit as well as more specific dysfunctions related to corporeal awareness, sense of individuality, retrieval of autobiographical memories, and self-conscious emotions. These are examined in details and corroborated by numerous excerpts from Taylor's book.
Reaction times of normal listeners to laryngeal, alaryngeal, and synthetic speech.
Evitts, Paul M; Searl, Jeff
2006-12-01
The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheosophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS). Cognitive processing load was indexed by listener reaction time (RT). To account for significant durational differences among the modes of speech, an RT ratio was calculated (stimulus duration divided by RT). Results indicated that the cognitive processing load was greater for ES and EL relative to normal speech. TE and normal speech did not differ in terms of RT ratio, suggesting fairly comparable cognitive demands placed on the listener. SS required greater cognitive processing load than normal and alaryngeal speech. The results are discussed relative to alaryngeal speech intelligibility and the role of the listener. Potential clinical applications and directions for future research are also presented.
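The RT-ratio normalization described above (stimulus duration divided by reaction time) is simple arithmetic; the sketch below uses invented durations and reaction times purely for illustration.

```python
# Toy illustration of the RT ratio (stimulus duration / reaction time);
# all values below are invented for illustration only.
stimulus_ms = {"normal": 620, "TE": 680, "ES": 710, "EL": 700, "synthetic": 650}
reaction_ms = {"normal": 890, "TE": 900, "ES": 1050, "EL": 1080, "synthetic": 1150}

for mode in stimulus_ms:
    ratio = stimulus_ms[mode] / reaction_ms[mode]
    # a lower ratio means a relatively longer RT, i.e., higher processing load
    print(f"{mode:10s} RT ratio = {ratio:.2f}")
```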
Preliminary Support for a Generalized Arousal Model of Political Conservatism
Tritt, Shona M.; Inzlicht, Michael; Peterson, Jordan B.
2013-01-01
It is widely held that negative emotions such as threat, anxiety, and disgust represent the core psychological factors that enhance conservative political beliefs. We put forward an alternative hypothesis: that conservatism is fundamentally motivated by arousal, and that, in this context, the effect of negative emotion is due to engaging intensely arousing states. Here we show that study participants agreed more with right but not left-wing political speeches after being exposed to positive as well as negative emotion-inducing film-clips. No such effect emerged for neutral-content videos. A follow-up study replicated and extended this effect. These results are consistent with the idea that emotional arousal, in general, and not negative valence, specifically, may underlie political conservatism. PMID:24376687
ERIC Educational Resources Information Center
Hickok, Gregory
2012-01-01
Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
The Timing and Effort of Lexical Access in Natural and Degraded Speech
Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz
2016-01-01
Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has its consequences on lexical processing as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901
Narayan, Angela J.; Sapienza, Julianna K.; Monn, Amy R.; Lingras, Katherine A.; Masten, Ann S.
2014-01-01
Objective This study examined risk, vulnerability, and protective processes of parental expressed emotion for children's peer relationships in families living in emergency shelters with high rates of exposure to parental violence (EPV). Parental criticism and negativity were hypothesized to exacerbate the association between EPV and poorer peer relations, while parental warmth was expected to buffer this association. Method Participants included 138 homeless parents (M = 30.77 years, SD = 6.33, range = 20.51-57.32 years; 64% African-American, 12% Caucasian, 24% other) and their 4-6-year-old children (43.5% male; M = 4.83, SD = .58, range = 4.83-6.92 years; 67% African-American, 2% Caucasian, 31% other). Families were assessed during the summer at three urban shelters, with parents completing the Five-Minute Speech Sample (FMSS), later scored for criticism, negativity, and warmth, and interview items about EPV. Teachers were subsequently contacted in the fall about children's classroom behavior, and they provided ratings of peer relations. Demographic factors, parental internalizing symptoms, and observed parental harshness were examined as covariates. Results Regression analyses indicated an interaction of EPV and warmth, consistent with a moderating effect of expressed emotion for EPV and peer relations, although no interactions were found for criticism or negativity. Observed harshness also directly predicted worse peer relations. Conclusions Parental warmth may be protective for positive peer relations among impoverished families with high levels of EPV. The FMSS is discussed as an efficient tool with potential for both basic clinical research and preventative interventions designed to target or assess change in parental expressed emotion. PMID:24635645
Gilboa-Schechtman, Eva; Shachar-Lavie, Iris
2013-01-01
Processing of nonverbal social cues (NVSCs) is essential to interpersonal functioning and is particularly relevant to models of social anxiety. This article provides a review of the literature on NVSC processing from the perspective of social rank and affiliation biobehavioral systems (ABSs), based on functional analysis of human sociality. We examine the potential of this framework for integrating cognitive, interpersonal, and evolutionary accounts of social anxiety. We argue that NVSCs are uniquely suited to rapid and effective conveyance of emotional, motivational, and trait information and that various channels are differentially effective in transmitting such information. First, we review studies on perception of NVSCs through face, voice, and body. We begin with studies that utilized information processing or imaging paradigms to assess NVSC perception. This research demonstrated that social anxiety is associated with biased attention to, and interpretation of, emotional facial expressions (EFEs) and emotional prosody. Findings regarding body and posture remain scarce. Next, we review studies on NVSC expression, which pinpointed links between social anxiety and disturbances in eye gaze, facial expressivity, and vocal properties of spontaneous and planned speech. Again, links between social anxiety and posture were understudied. Although cognitive, interpersonal, and evolutionary theories have described different pathways to social anxiety, all three models focus on interrelations among cognition, subjective experience, and social behavior. NVSC processing and production comprise the juncture where these theories intersect. In light of the conceptualizations emerging from the review, we highlight several directions for future research including focus on NVSCs as indexing reactions to changes in belongingness and social rank, the moderating role of gender, and the therapeutic opportunities offered by embodied cognition to treat social anxiety. PMID:24427129
Speech systems research at Texas Instruments
NASA Technical Reports Server (NTRS)
Doddington, George R.
1977-01-01
An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined and a technology forecast for speech systems is presented.
Speech perception as an active cognitive process
Heald, Shannon L. M.; Nusbaum, Howard C.
2014-01-01
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID:24672438
A Proposed Process for Managing the First Amendment Aspects of Campus Hate Speech.
ERIC Educational Resources Information Center
Kaplan, William A.
1992-01-01
A carefully structured process for campus administrative decision making concerning hate speech is proposed and suggestions for implementation are offered. In addition, criteria for evaluating hate speech processes are outlined, and First Amendment principles circumscribing the institution's discretion to regulate hate speech are discussed.…
Left Lateralized Enhancement of Orofacial Somatosensory Processing Due to Speech Sounds
ERIC Educational Resources Information Center
Ito, Takayuki; Johns, Alexis R.; Ostry, David J.
2013-01-01
Purpose: Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory…
NASA Astrophysics Data System (ADS)
Poock, G. K.; Martin, B. J.
1984-02-01
This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.
Laukka, Petri; Neiberg, Daniel; Elfenbein, Hillary Anger
2014-06-01
The possibility of cultural differences in the fundamental acoustic patterns used to express emotion through the voice is an unanswered question central to the larger debate about the universality versus cultural specificity of emotion. This study used emotionally inflected standard-content speech segments expressing 11 emotions produced by 100 professional actors from 5 English-speaking cultures. Machine learning simulations were employed to classify expressions based on their acoustic features, using conditions where training and testing were conducted on stimuli coming from either the same or different cultures. A wide range of emotions were classified with above-chance accuracy in cross-cultural conditions, suggesting vocal expressions share important characteristics across cultures. However, classification showed an in-group advantage with higher accuracy in within- versus cross-cultural conditions. This finding demonstrates cultural differences in expressive vocal style, and supports the dialect theory of emotions according to which greater recognition of expressions from in-group members results from greater familiarity with culturally specific expressive styles.
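A hedged sketch of the within- versus cross-cultural classification logic described above, using a generic SVM on placeholder acoustic features; the culture codes, feature set, and data are assumptions for illustration, not the study's materials or simulation code.

```python
# Sketch of within- vs cross-cultural emotion classification from acoustic
# features; the feature matrix and labels are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_per_culture, n_features = 200, 24
cultures = ["C1", "C2", "C3", "C4", "C5"]                # assumed culture codes
X = rng.standard_normal((n_per_culture * len(cultures), n_features))
y = rng.integers(0, 11, size=X.shape[0])                 # 11 emotion categories
culture = np.repeat(cultures, n_per_culture)

clf = make_pipeline(StandardScaler(), SVC())

for held_out in cultures:
    train, test = culture != held_out, culture == held_out
    # cross-cultural condition: train on four cultures, test on the fifth
    cross = accuracy_score(y[test], clf.fit(X[train], y[train]).predict(X[test]))
    # within-cultural condition: train/test split inside the held-out culture
    idx = np.flatnonzero(test)
    half = len(idx) // 2
    within = accuracy_score(
        y[idx[half:]], clf.fit(X[idx[:half]], y[idx[:half]]).predict(X[idx[half:]])
    )
    print(f"{held_out}: within={within:.2f} cross={cross:.2f}")
```

With real acoustic features, the in-group advantage reported above would appear as consistently higher within-culture than cross-culture accuracy.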
Speech Processing and Recognition (SPaRe)
2011-01-01
Reported results span automatic speech recognition (ASR) for English, Spanish, and Arabic; speech processing; machine translation (MT); natural language processing (NLP); and information retrieval (IR), with the IOC expected to provide document submission and search.
Parental Reactions to Cleft Palate Children.
ERIC Educational Resources Information Center
Vanpoelvoorde, Leah; Shaughnessy, Michael F.
1991-01-01
This paper reviews parents' emotional reactions following the birth of a cleft lip/palate child. It examines when parents were told of the deformity and discusses the duties of the speech-language pathologist and the psychologist in counseling the parents and the child. (Author/JDD)
Books Can Break Attitudinal Barriers Toward the Handicapped.
ERIC Educational Resources Information Center
Bauer, Carolyn J.
1985-01-01
Lists books dealing with the more prevalent handicaps of mainstreamed children: visual handicaps, speech handicaps, emotional disturbances, learning disabilities, auditory handicaps, intellectual impairments, and orthopedic handicaps. Recommends books for use from preschool to level three to expose children early and influence their attitudes…
Affective Aprosodia from a Medial Frontal Stroke
ERIC Educational Resources Information Center
Heilman, Kenneth M.; Leon, Susan A.; Rosenbek, John C.
2004-01-01
Background and objectives: Whereas injury to the left hemisphere induces aphasia, injury to the right hemisphere's perisylvian region induces an impairment of emotional speech prosody (affective aprosodia). Left-sided medial frontal lesions are associated with reduced verbal fluency with relatively intact comprehension and repetition…
Speech endpoint detection with non-language speech sounds for generic speech processing applications
NASA Astrophysics Data System (ADS)
McClain, Matthew; Romanowski, Brian
2009-05-01
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
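One way to realize the HMM-based LSS/NLSS classification described above is to train one Gaussian HMM per class and compare segment log-likelihoods. The sketch below assumes the hmmlearn package and uses placeholder features, so it illustrates the general technique rather than the authors' system.

```python
# Sketch of segment-level LSS vs. NLSS classification with per-class HMMs,
# assuming the hmmlearn package; features and model sizes are illustrative.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(2)

def fake_segments(n_segments, n_dims=13):
    """Placeholder acoustic feature segments (e.g., MFCC-like frames)."""
    return [rng.standard_normal((rng.integers(20, 60), n_dims)) for _ in range(n_segments)]

def fit_class_hmm(segments, n_states=3):
    X = np.concatenate(segments)
    lengths = [len(s) for s in segments]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

lss_model = fit_class_hmm(fake_segments(40))   # language speech sounds
nlss_model = fit_class_hmm(fake_segments(40))  # breaths, clicks, filled pauses, ...

def classify(segment):
    # Compare total log-likelihood under each class model
    # (the same segment is scored by both, so lengths are equal).
    return "NLSS" if nlss_model.score(segment) > lss_model.score(segment) else "LSS"

print(classify(fake_segments(1)[0]))
```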
Stress improves selective attention towards emotionally neutral left ear stimuli.
Hoskin, Robert; Hunter, M D; Woodruff, P W R
2014-09-01
Research concerning the impact of psychological stress on visual selective attention has produced mixed results. The current paper describes two experiments which utilise a novel auditory oddball paradigm to test the impact of psychological stress on auditory selective attention. Participants had to report the location of emotionally-neutral auditory stimuli, while ignoring task-irrelevant changes in their content. The results of the first experiment, in which speech stimuli were presented, suggested that stress improves the ability to selectively attend to left, but not right ear stimuli. When this experiment was repeated using tonal stimuli the same result was evident, but only for female participants. Females were also found to experience greater levels of distraction in general across the two experiments. These findings support the goal-shielding theory which suggests that stress improves selective attention by reducing the attentional resources available to process task-irrelevant information. The study also demonstrates, for the first time, that this goal-shielding effect extends to auditory perception. Copyright © 2014 Elsevier B.V. All rights reserved.
The influence of speech rate and accent on access and use of semantic information.
Sajin, Stanislav M; Connine, Cynthia M
2017-04-01
Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.
Borrie, Stephanie A
2015-03-01
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Learning from human cadaveric prosections: Examining anxiety in speech therapy students.
Criado-Álvarez, Juan Jose; González González, Jaime; Romo Barrientos, Carmen; Ubeda-Bañon, Isabel; Saiz-Sanchez, Daniel; Flores-Cuadrado, Alicia; Albertos-Marco, Juan Carlos; Martinez-Marcos, Alino; Mohedano-Moriano, Alicia
2017-09-01
Human anatomy education often utilizes the essential practices of cadaver dissection and examination of prosected specimens. However, these exposures to human cadavers and confronting death can be stressful and anxiety-inducing for students. This study aims to understand the attitudes, reactions, fears, and states of anxiety that speech therapy students experience in the dissection room. To that end, a before-and-after cross-sectional analysis was conducted with speech therapy students undertaking a dissection course for the first time. An anonymous questionnaire was administered before and after the exercise to understand students' feelings and emotions. State-Trait Anxiety Inventory questionnaires (STAI-S and STAI-T) were used to evaluate anxiety levels. The results of the study revealed that baseline anxiety levels measured using the STAI-T remained stable and unchanged during the dissection room experience (P > 0.05). Levels of emotional anxiety measured using the STAI-S decreased, from 15.3 to 11.1 points (P < 0.05). In the initial phase of the study, before any contact with the dissection room environment, 17% of students experienced anxiety, and this rate remained unchanged by end of the session (P > 0.05). A total of 63.4% of students described having thoughts about life and death. After the session, 100% of students recommended the dissection exercise, giving it a mean score of 9.1/10 points. Anatomy is an important subject for students in the health sciences, and dissection and prosection exercises frequently involve a series of uncomfortable and stressful experiences. Experiences in the dissection room may challenge some students' emotional equilibria. However, students consider the exercise to be very useful in their education and recommend it. Anat Sci Educ 10: 487-494. © 2017 American Association of Anatomists.
Rhythmic Priming Enhances the Phonological Processing of Speech
ERIC Educational Resources Information Center
Cason, Nia; Schon, Daniele
2012-01-01
While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…
Visual and Auditory Input in Second-Language Speech Processing
ERIC Educational Resources Information Center
Hardison, Debra M.
2010-01-01
The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
NASA Astrophysics Data System (ADS)
Li, Ji; Ren, Fuji
Weblogs have greatly changed the ways people communicate. Affective analysis of blog posts is found valuable for many applications such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification cannot satisfy the higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. An ensemble algorithm, RAKEL, is used to recognize dominant emotions from the writer's perspective. Our emotion feature using detailed intensity representation for word emotions outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuation, disjunctive connectives, modification relations, and negation into the features. This achieves 13.51% and 12.49% increases in Micro-averaged F1 and Macro-averaged F1, respectively, compared to the traditional lexicon-based feature. The results show that multi-dimensional emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label setting.
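RAkEL trains label-powerset classifiers on random k-subsets of the label space and aggregates their votes into per-label decisions. The simplified sketch below illustrates that idea together with the micro-/macro-averaged F1 evaluation quoted above; it runs on synthetic data and is not the Ren_CECps pipeline or the exact RAKEL implementation used in the paper.

```python
# Simplified RAkEL-style sketch: each ensemble member is a label-powerset
# classifier trained on a random k-subset of the emotion labels; member
# votes are averaged into per-label decisions. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(3)
n_sentences, n_features, n_labels = 600, 50, 8            # 8 emotion dimensions
X = rng.standard_normal((n_sentences, n_features))
Y = (rng.random((n_sentences, n_labels)) < 0.25).astype(int)

def train_rakel(X, Y, k=3, n_members=10):
    members = []
    for _ in range(n_members):
        subset = rng.choice(Y.shape[1], size=k, replace=False)
        # label powerset: encode each k-label combination as one class string
        classes = np.array(["".join(map(str, row)) for row in Y[:, subset]])
        clf = LogisticRegression(max_iter=1000).fit(X, classes)
        members.append((subset, clf))
    return members

def predict_rakel(members, X, n_labels, threshold=0.5):
    votes = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for subset, clf in members:
        bits = np.array([[int(c) for c in p] for p in clf.predict(X)])
        votes[:, subset] += bits
        counts[subset] += 1
    return (votes / np.maximum(counts, 1) >= threshold).astype(int)

split = 450
members = train_rakel(X[:split], Y[:split])
pred = predict_rakel(members, X[split:], n_labels)
print("micro-F1:", f1_score(Y[split:], pred, average="micro"))
print("macro-F1:", f1_score(Y[split:], pred, average="macro"))
```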
Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal
2013-05-01
Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed up by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.
Are precues effective in proactively controlling taboo interference during speech production?
White, Katherine K; Abrams, Lise; Hsi, Lisa R; Watkins, Emily C
2018-02-07
This research investigated whether precues engage proactive control to reduce emotional interference during speech production. A picture-word interference task required participants to name target pictures accompanied by taboo, negative, or neutral distractors. Proactive control was manipulated by presenting precues that signalled the type of distractor that would appear on the next trial. Experiment 1 included one block of trials with precues and one without, whereas Experiment 2 mixed precued and uncued trials. Consistent with previous research, picture naming was slowed in both experiments when distractors were taboo or negative compared to neutral, with the greatest slowing effect when distractors were taboo. Evidence that precues engaged proactive control to reduce interference from taboo (but not negative) distractors was found in Experiment 1. In contrast, mixing precued trials in Experiment 2 resulted in no taboo cueing benefit. These results suggest that item-level proactive control can be engaged under certain conditions to reduce taboo interference during speech production, findings that help to refine a role for cognitive control of distraction during speech production.
BP reactivity to public speaking in stage 1 hypertension: influence of different task scenarios.
Palatini, Paolo; Bratti, Paolo; Palomba, Daniela; Bonso, Elisa; Saladini, Francesca; Benetti, Elisabetta; Casiglia, Edoardo
2011-10-01
To investigate the blood pressure (BP) reaction to public speaking performed according to different emotionally distressing scenarios in stage 1 hypertension. We assessed 64 hypertensive and 30 normotensive subjects. They performed three speech tasks with neutral, anger and anxiety scenarios. BP was assessed with the Finometer beat-to-beat non-invasive recording system throughout the test procedure. For all types of speech, the systolic BP response was greater in the hypertensive than the normotensive subjects (all p < 0.001). At repeated-measures analysis of covariance (R-M ANCOVA), a significant group-by-time interaction was found for all scenarios (p ≤ 0.001). For the diastolic BP response, the between-group difference was significant for the task with anxiety scenario (p < 0.05). At R-M ANCOVA, a group-by-time interaction of borderline statistical significance was found for the speech with anxiety content (p = 0.053) but not for the speeches with neutral or anger content. Within the hypertensive group, the diastolic BP increments during the speeches with anxiety and anger scenarios were greater than those during the speech with neutral scenario (both p < 0.001). These data indicate that reactivity to public speaking is increased in stage 1 hypertension. A speech with anxiety or anger scenario elicits a greater diastolic BP reaction than tasks with neutral content.
Mapping and Manipulating Facial Expression
ERIC Educational Resources Information Center
Theobald, Barry-John; Matthews, Iain; Mangini, Michael; Spies, Jeffrey R.; Brick, Timothy R.; Cohn, Jeffrey F.; Boker, Steven M.
2009-01-01
Nonverbal visual cues accompany speech to supplement the meaning of spoken words, signify emotional state, indicate position in discourse, and provide back-channel feedback. This visual information includes head movements, facial expressions and body gestures. In this article we describe techniques for manipulating both verbal and nonverbal facial…
Exceptional Pupils. Special Education Bulletin Number 1.
ERIC Educational Resources Information Center
Indiana State Dept. of Public Instruction, Indianapolis. Div. of Special Education.
An introduction to exceptional children precedes a discussion of each of the following areas of exceptionality; giftedness, mental retardation, physical handicaps and special health problems, blindness and partial vision, aural handicaps, speech handicaps, emotional disturbance, and learning disabilities. Each chapter is followed by a bibliography…
PRISE Reporter. Volume 12, 1980-81.
ERIC Educational Resources Information Center
PRISE Reporter, 1981
1981-01-01
The document consists of six issues of the "PRISE (Pennsylvania Resources and Information Center for Special Education) Reporter" which cover issues and happenings in the education of the mentally retarded, learning disabled, emotionally disturbed, physically handicapped, visually handicapped, and speech/hearing impaired. Lead articles include the…
Neuroscience and the fallacies of functionalism.
Reddy, William M
2010-01-01
Smail's "On Deep History and the Brain" is rightly critical of the functionalist fallacies that have plagued evolutionary theory, sociobiology, and evolutionary psychology. However, his attempt to improve on these efforts relies on functional explanations that themselves oversimplify the lessons of neuroscience. In addition, like explanations in evolutionary psychology, they are highly speculative and cannot be confirmed or disproved by evidence. Neuroscience research is too diverse to yield a single picture of brain functioning. Some recent developments in neuroscience research, however, do suggest that cognitive processing provides a kind of “operating system” that can support a great diversity of cultural material. These developments include evidence of “top-down” processing in motor control, in visual processing, in speech recognition, and in “emotion regulation.” The constraints that such a system may place on cultural learning and transmission are worth investigating. At the same time, historians are well advised to remain wary of the pitfalls of functionalism.
Prosody production networks are modulated by sensory cues and social context.
Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus
2018-03-05
The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and - in case of visual stimuli - visual cortex. Responses were larger in pSTG at the right hemisphere and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions - in particular for interpersonal evaluation of vocal emotions.
Cortical oscillations and entrainment in speech processing during working memory load.
Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten
2018-02-02
Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (< 13 Hz). We found that increases in both types of WM load (background noise and n-back level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
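The entrainment measure described above (a linear mapping between the speech envelope and low-frequency cortical activity) is commonly estimated with a lagged linear forward model; the sketch below uses synthetic data, and the lags, sampling rate, and regularization are assumptions rather than the study's settings.

```python
# Sketch of a forward (encoding) model: predict low-frequency EEG from the
# lagged speech envelope and quantify entrainment as the correlation between
# predicted and measured EEG. Synthetic data only.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

fs = 64
n_samples, n_channels = 4 * 60 * fs, 32
rng = np.random.default_rng(4)
envelope = rng.standard_normal(n_samples)                # placeholder speech envelope
eeg = rng.standard_normal((n_samples, n_channels))       # would be filtered < 13 Hz in practice

lags = range(0, int(0.4 * fs))                           # 0-400 ms stimulus-to-response lags
X = np.column_stack([np.roll(envelope, lag) for lag in lags])

split = int(0.8 * n_samples)
entrainment = []
for ch in range(n_channels):
    trf = Ridge(alpha=1e2).fit(X[:split], eeg[:split, ch])
    r, _ = pearsonr(trf.predict(X[split:]), eeg[split:, ch])
    entrainment.append(r)

print("mean envelope entrainment across channels:", np.mean(entrainment))
```

Comparing this per-channel entrainment score across noise levels and n-back conditions would mirror the working-memory-load analysis described above.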
The influence of (central) auditory processing disorder in speech sound disorders.
Barrozo, Tatiane Faria; Pagan-Neves, Luciana de Oliveira; Vilela, Nadia; Carvallo, Renata Mota Mamede; Wertzner, Haydée Fiszbein
2016-01-01
Considering the importance of auditory information for the acquisition and organization of phonological rules, the assessment of (central) auditory processing contributes to both the diagnosis and targeting of speech therapy in children with speech sound disorders. To study phonological measures and (central) auditory processing of children with speech sound disorder. Clinical and experimental study, with 21 subjects with speech sound disorder aged between 7.0 and 9.11 years, divided into two groups according to their (central) auditory processing disorder. The assessment comprised tests of phonology, speech inconsistency, and metalinguistic abilities. The group with (central) auditory processing disorder demonstrated greater severity of speech sound disorder. The cutoff value obtained for the process density index was the one that best characterized the occurrence of phonological processes for children above 7 years of age. The comparison among the tests evaluated between the two groups showed differences in some phonological and metalinguistic abilities. Children with an index value above 0.54 demonstrated strong tendencies towards presenting a (central) auditory processing disorder, and this measure was effective to indicate the need for evaluation in children with speech sound disorder. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Perceiving emotion: towards a realistic understanding of the task.
Cowie, Roddy
2009-12-12
A decade ago, perceiving emotion was generally equated with taking a sample (a still photograph or a few seconds of speech) that unquestionably signified an archetypal emotional state, and attaching the appropriate label. Computational research has shifted that paradigm in multiple ways. Concern with realism is key. Emotion generally colours ongoing action and interaction: describing that colouring is a different problem from categorizing brief episodes of relatively pure emotion. Multiple challenges flow from that. Describing emotional colouring is a challenge in itself. One approach is to use everyday categories describing states that are partly emotional and partly cognitive. Another approach is to use dimensions. Both approaches need ways to deal with gradual changes over time and mixed emotions. Attaching target descriptions to a sample poses problems of both procedure and validation. Cues are likely to be distributed both in time and across modalities, and key decisions may depend heavily on context. The usefulness of acted data is limited because it tends not to reproduce these features. By engaging with these challenging issues, research is not only achieving impressive results, but also offering a much deeper understanding of the problem.
Studies in automatic speech recognition and its application in aerospace
NASA Astrophysics Data System (ADS)
Taylor, Michael Robinson
Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
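As background to the Dynamic Time Warping recognisers mentioned above, the following sketch shows a generic DTW template matcher over acoustic feature sequences; the frame-level Euclidean cost and the fake digit templates are illustrative assumptions, not the system described in the report.

```python
# Generic dynamic time warping (DTW) distance, the template-matching core of
# classic DTW-based digit recognisers. Illustrative sketch only.
import numpy as np

def dtw_distance(a, b):
    """a, b: (n_frames, n_features) arrays of acoustic feature vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalised path cost

# Recognition = nearest stored template under DTW distance.
rng = np.random.default_rng(1)
templates = {d: rng.standard_normal((30 + d, 12)) for d in range(10)}  # fake digit templates
utterance = templates[7] + 0.1 * rng.standard_normal(templates[7].shape)
best = min(templates, key=lambda d: dtw_distance(utterance, templates[d]))
print("recognised digit:", best)
```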
Gergely, Anna; Faragó, Tamás; Galambos, Ágoston; Topál, József
2017-10-23
There is growing evidence that dog-directed and infant-directed speech have similar acoustic characteristics, like high overall pitch, wide pitch range, and attention-getting devices. However, it is still unclear whether dog- and infant-directed speech have gender- or context-dependent acoustic features. In the present study, we collected comparable infant-, dog-, and adult-directed speech samples (IDS, DDS, and ADS) in four different speech situations (Storytelling, Task solving, Teaching, and Fixed sentences situations); we obtained the samples from parents whose infants were younger than 30 months of age and who also had a pet dog at home. We found that ADS was different from IDS and DDS, independently of the speakers' gender and the given situation. We also found a higher overall pitch in DDS than in IDS during the free situations. Our results show that both parents hyperarticulate their vowels when talking to children but not when addressing dogs: this result is consistent with the goal of hyperspeech in language tutoring. Mothers, however, exaggerate their vowels for their infants under 18 months more than fathers do. Our findings suggest that IDS and DDS have context-dependent features and support the notion that people adapt their prosodic features to the acoustic preferences and emotional needs of their audience.
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
2017-01-01
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Auditory-Motor Processing of Speech Sounds
Möttönen, Riikka; Dutton, Rebekah; Watkins, Kate E.
2013-01-01
The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds. PMID:22581846
Kirsch, Julie A; Lehman, Barbara J
2015-12-01
Previous research suggests that in contrast to invisible social support, visible social support produces exaggerated negative emotional responses. Drawing on work by Bolger and colleagues, this study disentangled social support visibility from negative social evaluation in an examination of the effects of social support on negative emotions and cardiovascular responses. As part of an anticipatory speech task, 73 female participants were randomly assigned to receive no social support, invisible social support, non-confounded visible social support, or visible social support as delivered in a 2007 study by Bolger and Amarel. Twelve readings each of systolic blood pressure, diastolic blood pressure, and heart rate were taken at 5-min intervals throughout the baseline, reactivity, and recovery periods. Cardiovascular outcomes were tested by incorporating a series of theoretically driven planned contrasts into tests of stress reactivity conducted through piecewise growth curve modelling. Linear and quadratic trends established cardiovascular reactivity to the task. Further, in comparison to the control and replication conditions, the non-confounded visible and invisible social support conditions attenuated cardiovascular reactivity over time. Pre- and post-speech negative emotional responses were not affected by the social support manipulations. These results suggest that appropriately delivered visible social support may be as beneficial as invisible social support. Copyright © 2014 John Wiley & Sons, Ltd.
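The piecewise growth-curve approach can be sketched briefly: time is split into separate reactivity and recovery slopes that enter a mixed model together with condition terms. The code below is a schematic with simulated readings and hypothetical variable names, not the study's analysis.

```python
# Schematic piecewise growth-curve model: separate slopes for the reactivity
# and recovery phases, random intercepts per participant, condition terms as
# fixed effects. Simulated data; variable names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for pid in range(40):
    support = rng.choice(["control", "invisible", "visible"])
    base = 110 + rng.normal(0, 5)
    for reading in range(12):                      # 12 readings at 5-min intervals
        react = min(reading, 6)                    # slope active during reactivity
        recover = max(reading - 6, 0)              # slope active during recovery
        bump = 8 if support == "control" else 4    # attenuated reactivity with support
        sbp = base + bump * (react / 6) - 5 * (recover / 6) + rng.normal(0, 2)
        rows.append(dict(pid=pid, support=support, react=react, recover=recover, sbp=sbp))
data = pd.DataFrame(rows)

model = smf.mixedlm("sbp ~ react * support + recover * support", data, groups=data["pid"])
print(model.fit().summary())
```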
NASA Astrophysics Data System (ADS)
Jelinek, H. J.
1986-01-01
This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real-time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer-based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self-contained real-time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were carried out: (1) a signal processing scheme for converting helium speech to normal-sounding speech was generated; (2) the signal processing scheme was simulated on a general-purpose (LAMBDA) computer, with actual helium speech supplied to the simulation and converted speech generated; (3) an IBM PC-based 14-bit data input/output board was designed and built; and (4) a bibliography of references on speech processing was generated.
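One classic signal-processing idea behind helium speech correction is to compress the short-time spectral envelope in frequency to undo the upward formant shift. The sketch below illustrates that idea with an STFT-domain frequency warp; the warp factor, frame length and reuse of the original phase are simplifying assumptions, and the report's actual algorithm is not specified here.

```python
# Crude, illustrative helium-speech correction: compress each short-time
# magnitude spectrum along the frequency axis to undo the upward formant
# shift. Not the algorithm from the report; parameters are assumptions.
import numpy as np
from scipy.signal import stft, istft

def correct_helium_speech(x, fs, warp=0.6, nperseg=512):
    """Move spectral energy from frequency f to warp * f, frame by frame."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    warped = np.empty_like(mag)
    for k in range(mag.shape[1]):
        # the magnitude originally at frequency f0 appears at warp * f0
        warped[:, k] = np.interp(f, warp * f, mag[:, k], left=0.0, right=0.0)
    _, y = istft(warped * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y

# toy usage with a synthetic "helium" vowel (formant-like peaks shifted up)
fs = 16000
tt = np.arange(0, 1.0, 1 / fs)
x = sum(np.sin(2 * np.pi * f0 * tt) for f0 in (1200.0, 2900.0))  # stand-in formants
y = correct_helium_speech(x, fs)
print(x.shape, y.shape)
```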
Processing changes when listening to foreign-accented speech
Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert
2015-01-01
This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209
The Hierarchical Cortical Organization of Human Speech Processing
de Heer, Wendy A.; Huth, Alexander G.; Griffiths, Thomas L.
2017-01-01
Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively involved in speech processing in large and equal amounts. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report more laterality in speech processing and associated semantic processing to higher levels of cortex than reported here. PMID:28588065
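The variance-partitioning logic can be made concrete with a small sketch: fit regularised models on each feature space and on their unions, then read off each space's unique contribution as the drop in held-out R-squared when that space is removed from the full model. The example below uses synthetic data and is not the authors' voxelwise pipeline.

```python
# Minimal variance-partitioning sketch: unique variance of a feature space =
# R^2(full model) - R^2(full model minus that space). Synthetic data only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_train, n_test = 600, 200
spaces = {
    "spectral": rng.standard_normal((n_train + n_test, 20)),
    "articulatory": rng.standard_normal((n_train + n_test, 15)),
    "semantic": rng.standard_normal((n_train + n_test, 30)),
}
# a fake voxel response driven mostly by the "semantic" space
y = spaces["semantic"][:, :5].sum(axis=1) + 0.5 * rng.standard_normal(n_train + n_test)

def r2_for(names):
    X = np.hstack([spaces[n] for n in names])
    model = Ridge(alpha=10.0).fit(X[:n_train], y[:n_train])
    return r2_score(y[n_train:], model.predict(X[n_train:]))

all_names = list(spaces)
r2_full = r2_for(all_names)
for name in all_names:
    unique = r2_full - r2_for([n for n in all_names if n != name])
    print(f"variance uniquely explained by {name}: {unique:.3f}")
```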
Barista: A Framework for Concurrent Speech Processing by USC-SAIL
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.
2016-01-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista allows demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
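Barista itself is implemented in C++ on top of Kaldi and libcppa, but the actor idea it builds on, independent workers that communicate only by message passing through mailboxes, can be illustrated in a few lines. The sketch below is a hypothetical toy pipeline, not Barista's API.

```python
# Toy actor-style pipeline: each actor has a mailbox, reacts to messages and
# forwards results downstream. Illustrates the pattern only; not Barista.
import threading
import queue

class Actor(threading.Thread):
    """A worker with a mailbox; it handles each message and forwards the result."""
    def __init__(self, handle, downstream=None):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.handle = handle
        self.downstream = downstream

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:            # poison pill: stop and propagate shutdown
                if self.downstream:
                    self.downstream.mailbox.put(None)
                break
            out = self.handle(msg)
            if self.downstream:
                self.downstream.mailbox.put(out)

# a toy two-stage "speech" pipeline: feature extraction -> decoding -> printing
printer = Actor(lambda msg: print("decoded:", msg))
decoder = Actor(lambda feats: f"hyp({feats})", downstream=printer)
frontend = Actor(lambda audio: f"feats({audio})", downstream=decoder)
for a in (printer, decoder, frontend):
    a.start()
for chunk in ("chunk-0", "chunk-1"):
    frontend.mailbox.put(chunk)
frontend.mailbox.put(None)             # shut the pipeline down
for a in (frontend, decoder, printer):
    a.join()
```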
Beyond Early Intervention: Providing Support to Public School Personnel
ERIC Educational Resources Information Center
Wilson, Kathryn
2006-01-01
At age 3, children with hearing loss transition from Part C early intervention to Part B public school services. These children represent a heterogeneous population when considering factors such as communication approaches; speech, language, auditory and cognitive skills; social-emotional and motor development; parental involvement; hearing…
Coaching Athletes with Hidden Disabilities: Recommendations and Strategies for Coaching Education
ERIC Educational Resources Information Center
Vargas, Tiffanye; Flores, Margaret; Beyer, Robbi
2012-01-01
Hidden disabilities (HD) are those disabilities not readily apparent to the naked eye including specific learning disabilities, attention deficit hyperactivity disorder, emotional behavioral disorders, mild intellectual disabilities, and speech or language disabilities. Young athletes with HD may have difficulty listening to and following…
NASA Astrophysics Data System (ADS)
Campo, D.; Quintero, O. L.; Bastidas, M.
2016-04-01
We propose a study of the mathematical properties of voice as an audio signal, including signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed using the Daubechies wavelet family (Db1/Haar, Db6, Db8, Db10), decomposing the initial audio signal into sets of coefficients from which features were extracted and analyzed statistically in order to differentiate emotional states. Artificial neural networks (ANNs) proved to allow an appropriate classification of such states. This study shows that the features extracted through wavelet decomposition are sufficient to analyze and extract emotional content in audio signals, achieving a high accuracy rate in the classification of emotional states without the need for classical time-frequency features. Accordingly, this paper seeks to characterize mathematically six basic emotions in humans (boredom, disgust, happiness, anxiety, anger, and sadness) plus neutrality, for a total of seven states to identify.
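The wavelet feature-extraction step can be sketched briefly: decompose the signal with a Daubechies wavelet and summarise each coefficient set with simple statistics that could feed a neural-network classifier. The snippet below assumes the PyWavelets package and an illustrative feature set, not the authors' exact configuration.

```python
# Hedged sketch of wavelet-based feature extraction: Daubechies decomposition
# (here db6) plus simple statistics per coefficient set. Requires PyWavelets.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db6", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA_n, cD_n, ..., cD_1]
    feats = []
    for c in coeffs:
        feats.extend([np.mean(c), np.std(c), np.sum(c ** 2), np.max(np.abs(c))])
    return np.array(feats)

rng = np.random.default_rng(0)
fake_utterance = rng.standard_normal(16000)    # 1 s of audio at 16 kHz (stand-in)
print(wavelet_features(fake_utterance).shape)  # (level + 1) * 4 features
```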
Neural pathways for visual speech perception
Bernstein, Lynne E.; Liebenthal, Einat
2014-01-01
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611
Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı
2015-01-01
Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857
Terband, H; Maassen, B; Guenther, F H; Brumberg, J
2014-01-01
Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
Characterizing resonant component in speech: A different view of tracking fundamental frequency
NASA Astrophysics Data System (ADS)
Dong, Bin
2017-05-01
Motivated by the nonlinearity, nonstationarity, and modulations present in speech, the Hilbert-Huang Transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is not applied directly to the target vowel, but to its intrinsic mode functions one by one. Thanks to the equivalence between the fundamental frequency in speech and the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide rich information for estimating the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here to replace the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra and the modulation intensity distributions of the intrinsic mode functions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency in the presence of nonlinear components. Two vowels, [a] and [i], taken from the FAU Aibo Emotion Corpus speech database, are used to validate these findings.
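The core idea, decompose the vowel into intrinsic mode functions and then read the fundamental frequency from the mode acting as the fundamental component, can be sketched as follows. The code assumes the PyEMD package for empirical mode decomposition and uses a toy two-tone "vowel" plus a crude IMF-selection rule; it illustrates the general approach, not the paper's pseudo-Hilbert spectrum method.

```python
# Rough sketch: empirical mode decomposition of a vowel into IMFs, then read
# the instantaneous frequency of the fundamental-like IMF via the Hilbert
# transform. Requires the PyEMD package (pip install EMD-signal); the
# IMF-selection rule (lowest-frequency IMF above 50 Hz) is an assumption.
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
vowel = np.sin(2 * np.pi * 150 * t) + 0.4 * np.sin(2 * np.pi * 750 * t)  # toy stand-in for [a]

imfs = EMD().emd(vowel, t)
candidate = float("nan")
for imf in imfs:                         # IMFs run from high to low frequency
    phase = np.unwrap(np.angle(hilbert(imf)))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    median_freq = float(np.median(inst_freq))
    if median_freq > 50:                 # ignore slow drift / residual IMFs
        candidate = median_freq          # last qualifying IMF ~ fundamental component
print(f"estimated fundamental frequency: {candidate:.1f} Hz")
```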
Origins of the stuttering stereotype: stereotype formation through anchoring-adjustment.
MacKinnon, Sean P; Hall, Shera; Macintyre, Peter D
2007-01-01
The stereotype of people who stutter is predominantly negative, holding that stutterers are excessively nervous, anxious, and reserved. The anchoring-adjustment hypothesis suggests that the stereotype of stuttering arises from a process of first anchoring the stereotype in personal feelings during times of normal speech disfluency, and then adjusting based on a rapid heuristic judgment. The current research sought to test this hypothesis, elaborating on previous research by [White, P. A., & Collins, S. R. (1984). Stereotype formation by inference: A possible explanation for the "stutterer" stereotype. Journal of Speech and Hearing Research, 27, 567-570]. Participants provided ratings of a hypothetical typical person who stutters, a person suffering from normal speech disfluency and a typical male on a 25-item semantic differential scale. Results showed a stereotype of people who stutter similar to that found in previous research. The pattern of results is consistent with the anchoring-adjustment hypothesis. Ratings of a male stutterer are very similar to a male experiencing temporary disfluency, both of which differ from ratings of a typical male. As expected, ratings of a stutterer show a small but statistically significant adjustment on several traits that makes the stereotype of stutterers less negative and less emotionally extreme than the temporarily disfluent male. Based on the results of this research, it appears that stereotype formation is a result of generalization and adjustment from personal experience during normal speech disfluency. The reader will be able to: (1) explain how the negative stereotype of people who stutter arises; (2) discuss the negative implications of stereotypes in the lives of people who stutter; and (3) summarize why the stereotype of people who stutter is so consistent and resistant to change.
Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge
2012-01-01
This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of children aged 6 years 3 months to 6 years 8 months attending grade 1: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. The two normal-reading groups did not differ in terms of speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and had a unique predictive contribution to reading growth in grade 3, even after controlling for reading level, phonological ability, auditory processing, and oral language skills in grade 1. These findings indicated that speech perception also had a unique, direct impact on reading development, not only through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability. Copyright © 2011 Elsevier Ltd. All rights reserved.
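The "unique predictive contribution" reported above corresponds to a hierarchical-regression step: regress the grade-3 outcome on the grade-1 control variables, add speech perception, and inspect the change in R-squared. The sketch below uses simulated data and hypothetical variable names, not the study's measures.

```python
# Hierarchical-regression sketch: unique contribution of speech perception =
# R^2(controls + speech perception) - R^2(controls only). Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 32
controls = rng.standard_normal((n, 4))      # reading level, phonological ability,
                                            # auditory processing, oral language (grade 1)
speech_perception = rng.standard_normal(n)
reading_g3 = (controls @ np.array([0.4, 0.3, 0.1, 0.2])
              + 0.5 * speech_perception
              + rng.standard_normal(n))

base = sm.OLS(reading_g3, sm.add_constant(controls)).fit()
full = sm.OLS(reading_g3, sm.add_constant(np.column_stack([controls, speech_perception]))).fit()
print(f"R^2 change when adding speech perception: {full.rsquared - base.rsquared:.3f}")
```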
Expressed emotion displayed by the mothers of inhibited and uninhibited preschool-aged children.
Raishevich, Natoshia; Kennedy, Susan J; Rapee, Ronald M
2010-01-01
In the current study, the Five Minute Speech Sample was used to assess the association between parent attitudes and children's behavioral inhibition in mothers of 120 behaviorally inhibited (BI) and 37 behaviorally uninhibited preschool-aged children. Mothers of BI children demonstrated significantly higher levels of emotional over-involvement (EOI) and self-sacrificing/overprotective behavior (SS/OP). However, there was no significant relationship between inhibition status and maternal criticism. Multiple regression also indicated that child temperament, but not maternal anxiety, was a significant predictor of both EOI and SS/OP.
Labby, Alex; Mace, Jess C; Buncke, Michelle; MacArthur, Carol J
2016-09-01
To evaluate quality-of-life changes after bilateral pressure equalization tube placement with or without adenoidectomy for the treatment of chronic otitis media with effusion or recurrent acute otitis media in a pediatric Down syndrome population compared to controls. Prospective case-control observational study. The OM Outcome Survey (OMO-22) was administered to both patients with Down syndrome and controls before bilateral tube placement with or without adenoidectomy and at an average of 6-7 months postoperatively. Thirty-one patients with Down syndrome and 34 controls were recruited. Both pre-operative and post-operative between-group and within-group score comparisons were conducted for the Physical, Hearing/Balance, Speech, Emotional, and Social domains of the OMO-22. Both groups experienced improvement of mean symptom scores post-operatively. Patients with Down syndrome reported significant post-operative improvement in mean Physical and Hearing domain item scores while control patients reported significant improvement in Physical, Hearing, and Emotional domain item scores. All four symptom scores in the Speech domain, both pre-operatively and post-operatively, were significantly worse for Down syndrome patients compared to controls (p ≤ 0.008). Surgical placement of pressure equalizing tubes results in significant quality of life improvements in patients with Down syndrome and controls. Problems related to speech and balance are reported at a higher rate and persist despite intervention in the Down syndrome population. It is possible that longer follow up periods and/or more sensitive tools are required to measure speech improvements in the Down syndrome population after pressure equalizing tube placement ± adenoidectomy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.