Sample records for emotional speech database

  1. Recognizing emotional speech in Persian: a validated database of Persian emotional speech (Persian ESD).

    PubMed

    Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela

    2015-03-01

    Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.
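
    The "five times chance performance" criterion quoted above can be made concrete in a few lines of Python. The sketch below is illustrative only: the recognition rates are invented, and the seven-alternative response format is inferred from the quoted 71.4% threshold (5 × 1/7 ≈ 0.714), not taken from the article itself.

```python
# Minimal sketch of the "five times chance" validity criterion.
# The per-utterance recognition rates are hypothetical; 5 * (1/7) ~= 0.714
# reproduces the 71.4% threshold quoted in the abstract, which implies a
# seven-alternative response format.

def validity_threshold(n_alternatives: int, multiple: float = 5.0) -> float:
    """Recognition rate an utterance must exceed to count as a valid portrayal."""
    return multiple * (1.0 / n_alternatives)

def select_valid(recognition_rates: dict[str, float], n_alternatives: int = 7) -> list[str]:
    """Return IDs of utterances recognized better than five times chance."""
    threshold = validity_threshold(n_alternatives)
    return [utt for utt, rate in recognition_rates.items() if rate > threshold]

if __name__ == "__main__":
    rates = {"utt_001": 0.91, "utt_002": 0.68, "utt_003": 0.74}   # hypothetical rates
    print(f"threshold = {validity_threshold(7):.3f}")             # 0.714
    print("valid utterances:", select_valid(rates))               # ['utt_001', 'utt_003']
```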

  2. One approach to design of speech emotion database

    NASA Astrophysics Data System (ADS)

    Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav

    2016-05-01

This article describes a system for evaluating the credibility of recordings with emotional content. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems, which are designed to detect human emotions from the voice. Information about a speaker's emotional state is useful to the security forces and to emergency call services. Personnel in action (soldiers, police officers, and firefighters) are often exposed to stress, and information about their emotional state, carried in the voice, can help a dispatcher adapt control commands during an intervention. Likewise, call agents of an emergency call service must recognize the mental state of the caller to adjust the tone of the conversation; in this case, evaluating the caller's psychological state is a key factor for a successful intervention. A high-quality database of sound recordings is essential for building such systems. Quality databases such as the Berlin Database of Emotional Speech or Humaine exist, but they were created by actors in an audio studio, which means the recordings contain simulated rather than genuine emotions. Our research aims at creating a database of Czech recordings of real human emotional speech. Collecting sound samples for the database is only one of the tasks; another, no less important, is to evaluate the significance of the recordings with respect to emotional states. This article describes the design of a methodology for evaluating the credibility of emotional recordings, and the results describe the advantages and applicability of the developed method.

  3. Emotion recognition from speech: tools and challenges

    NASA Astrophysics Data System (ADS)

    Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.

    2015-05-01

Human emotion recognition from speech is studied frequently because of its importance in many applications, e.g., human-computer interaction. There is wide diversity and little agreement about the basic emotions or emotion-related states on one hand, and about where the emotion-related information lies in the speech signal on the other. These diversities motivate our investigation into extracting meta-features using a PCA approach or a non-adaptive random projection (RP), which significantly reduce the large-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts a score-based LDC classifier. We demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted databases or acted databases (i.e., when subjects act specific emotions while uttering a sentence). However, the large gap between the accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we argue that emotion recognition from speech should not be treated purely as a classification problem. We demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more "natural" than the acted datasets in which subjects attempt to suppress all but one emotion.
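
    The dimensionality-reduction step described above (PCA or a non-adaptive random projection feeding a linear discriminant classifier) can be sketched with scikit-learn as follows. This is a toy illustration on synthetic data, not the authors' pipeline; the 1582-dimensional input size is only an assumption about a typical utterance-level feature vector.

```python
# Sketch: compress high-dimensional utterance-level feature vectors with PCA
# or a Gaussian random projection (RP), then train a linear discriminant
# classifier (LDC). Data and dimensions are synthetic/illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1582))   # assumed 1582-dim utterance-level vectors
y = rng.integers(0, 6, size=200)   # six emotion labels

for name, reducer in [("PCA", PCA(n_components=50)),
                      ("RP", GaussianRandomProjection(n_components=50, random_state=0))]:
    Z = reducer.fit_transform(X)
    scores = cross_val_score(LinearDiscriminantAnalysis(), Z, y, cv=5)
    print(f"{name:>3}: mean CV accuracy = {scores.mean():.2f}")   # ~chance on random data
```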

  4. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

    PubMed Central

    Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

    2017-01-01

    Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
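
    The hand-crafted ("artificial") feature side of the pipeline above can be approximated with librosa, as in the hedged sketch below. Formant tracking and the DBN branch are omitted for brevity, the file paths and labels are placeholders, and the feature statistics and SVM settings are illustrative rather than the paper's exact configuration.

```python
# Hedged sketch: utterance-level statistics of MFCC, pitch (YIN), zero-crossing
# rate and short-term energy feed an SVM. Formants and the DBN are omitted;
# paths/labels are placeholders, not data from the paper.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # (13, T)
    zcr = librosa.feature.zero_crossing_rate(y)              # (1, T)
    energy = librosa.feature.rms(y=y)                        # (1, T)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)            # (T',)
    T = min(mfcc.shape[1], zcr.shape[1], energy.shape[1], len(f0))
    frames = np.vstack([mfcc[:, :T], zcr[:, :T], energy[:, :T], f0[np.newaxis, :T]])
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])  # 32-dim vector

paths, labels = ["happy_01.wav", "angry_01.wav"], ["happy", "angry"]  # placeholders
X = np.array([utterance_features(p) for p in paths])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)).fit(X, labels)
```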

  5. Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

    PubMed

    Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by low recognition accuracies in real applications owing to their limited representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is fed to a DBN to yield higher-level features, which are input to a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition.
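
    The random-subspace-plus-voting structure described above can be sketched as follows. Because scikit-learn provides no DBN, each per-subspace learner is approximated here by a small MLP; the point is the ensemble structure (random feature subspaces, one deep learner per subspace, majority voting over labels), and all data are synthetic.

```python
# Sketch of the random-subspace ensemble: many random feature subspaces, one
# learner per subspace (an MLP stands in for the DBN), majority vote at the end.
from collections import Counter
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 120))      # low-level features (synthetic)
y_train = rng.integers(0, 6, size=300)     # six emotion labels
X_test = rng.normal(size=(20, 120))

n_learners, subspace_dim = 15, 40
subspaces, learners = [], []
for i in range(n_learners):
    idx = rng.choice(X_train.shape[1], size=subspace_dim, replace=False)
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=200, random_state=i)
    clf.fit(X_train[:, idx], y_train)      # stand-in for a per-subspace DBN
    subspaces.append(idx)
    learners.append(clf)

votes = np.stack([clf.predict(X_test[:, idx]) for clf, idx in zip(learners, subspaces)])
final = [Counter(col).most_common(1)[0][0] for col in votes.T]   # majority vote per test item
print(final[:5])
```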

  6. Random Deep Belief Networks for Recognizing Emotions from Speech Signals

    PubMed Central

    Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

    2017-01-01

Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by low recognition accuracies in real applications owing to their limited representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is fed to a DBN to yield higher-level features, which are input to a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition. PMID:28356908

  7. Emotion to emotion speech conversion in phoneme level

    NASA Astrophysics Data System (ADS)

    Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth

    2004-10-01

Having the ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification at the phoneme level for emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that modification of prosody only or of spectrum only is not sufficient to elicit the targeted emotions. Simultaneous modification of both prosody and spectrum results in higher acceptance rates of the target emotions, suggesting that modeling the spectral patterns that reflect the underlying speech articulation is just as important as modeling speech prosody for synthesizing emotional speech of good quality. We are investigating suprasegmental-level modifications for further improvement in speech quality and expressiveness.

  8. Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

    PubMed Central

    Caballero-Morales, Santiago-Omar

    2013-01-01

An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of acoustic modelling of emotion-specific vowels at the phoneme level. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where separate phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). The emotional state of a spoken sentence is then estimated by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for recognition of the emotional state of Mexican Spanish speech. PMID:23935410
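
    The sentence-level decision rule (count emotion-specific vowels in the ASR output and pick the most frequent emotion) reduces to a few lines of Python. The phone labels below, such as "a_ang" for an "angry /a/", are invented for illustration and are not the authors' symbol set.

```python
# Toy illustration of the decision rule: the ASR output contains emotion-tagged
# vowel symbols, and the sentence-level emotion is the one whose vowels occur
# most often. The tag suffixes are hypothetical labels, not the paper's set.
from collections import Counter

SUFFIX = {"ang": "anger", "hap": "happiness", "neu": "neutral", "sad": "sadness"}

def emotion_from_phones(phone_sequence: list[str]) -> str:
    counts = Counter()
    for phone in phone_sequence:
        if "_" in phone:                           # emotion-specific vowel
            counts[SUFFIX[phone.split("_")[1]]] += 1
    return counts.most_common(1)[0][0] if counts else "neutral"

# Hypothetical ASR output for one spoken sentence.
decoded = ["k", "a_ang", "s", "a_ang", "t", "e_hap", "o_ang"]
print(emotion_from_phones(decoded))                # anger
```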

  9. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

    PubMed Central

    Russo, Frank A.

    2018-01-01

The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced, consisting of 24 professional actors vocalizing lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
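
    A small helper for working with the RAVDESS release linked above is sketched below. The seven-field filename convention (modality, vocal channel, emotion, intensity, statement, repetition, actor) and the emotion codes follow the documentation accompanying the Zenodo record, but should be verified against the version actually downloaded.

```python
# Parse a RAVDESS filename into its metadata fields. The field order and codes
# follow the database's documented naming convention; verify against the
# downloaded release before relying on them.
EMOTION_CODES = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
                 "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def parse_ravdess_name(filename: str) -> dict:
    """e.g. '03-01-06-01-02-01-12.wav' -> audio-only speech, fearful, actor 12."""
    modality, channel, emotion, intensity, statement, repetition, actor = (
        filename.removesuffix(".wav").split("-"))
    return {
        "vocal_channel": "speech" if channel == "01" else "song",
        "emotion": EMOTION_CODES[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "actor": int(actor),
        "actor_sex": "female" if int(actor) % 2 == 0 else "male",
    }

print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
```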

  10. Speaker emotion recognition: from classical classifiers to deep neural networks

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2018-04-01

Speaker emotion recognition has been considered one of the most challenging tasks in recent years. Automatic systems for security, medicine, or education can be improved by taking the affective state of speech into account. In this paper, a twofold approach to speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classical methods as well as deep learning, are evaluated experimentally. Experimental results indicate that deep architectures can improve classification performance on two affective databases, the Berlin Database of Emotional Speech and the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset.

  11. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

In recent years, many studies have used speech-related features for speech emotion recognition; however, recent work shows that there is also a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discriminating ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were used to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  12. Memristive Computational Architecture of an Echo State Network for Real-Time Speech Emotion Recognition

    DTIC Science & Technology

    2015-05-28

[Only fragmentary indexing text is available for this DTIC record. The recoverable fragments note that speech-based emotion recognition is simpler and requires fewer computational resources than other inputs such as facial expressions, reference the Berlin Database of Emotional Speech, and otherwise consist of partial bibliographic citations and report-documentation fields (contract and program element numbers).]

  13. Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

    NASA Astrophysics Data System (ADS)

    He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

    2013-07-01

Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area focuses on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates the energy entropy of values within selected wavelet packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set, annotated with three different emotions (angry, neutral and soft), and the Berlin Emotional Speech (BES) database, annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
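
    The proposed feature, the energy entropy of values within selected wavelet packet frequency bands, can be sketched with PyWavelets as below. The wavelet, decomposition depth, and frame length are illustrative rather than the paper's exact settings; in the study, a GMM is then fit per emotion on vectors of such features.

```python
# Hedged sketch: energy entropy per wavelet-packet band. Wavelet, level and
# frame length are illustrative; the study's exact configuration is not reproduced.
import numpy as np
import pywt

def wp_energy_entropy(frame: np.ndarray, wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """One entropy value per wavelet-packet band at the given level."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric", maxlevel=level)
    features = []
    for node in wp.get_level(level, order="freq"):
        energy = node.data ** 2
        p = energy / (energy.sum() + 1e-12)                   # energy distribution in the band
        features.append(-np.sum(p * np.log2(p + 1e-12)))      # Shannon entropy of that distribution
    return np.array(features)

frame = np.random.default_rng(0).normal(size=4096)            # one speech frame (synthetic)
print(wp_energy_entropy(frame).shape)                         # (8,) bands at level 3
```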

  14. Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.

    PubMed

    Wang, Kun-Ching

    2015-01-14

The classification of emotional speech is widely considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction scheme based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the emotional speech spectrogram at multiple resolutions should form a good feature set for emotion classification in speech, and multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech.
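
    The spectrogram-as-texture idea can be approximated with grey-level co-occurrence matrix (GLCM) statistics over a quantized log-mel spectrogram, as in the hedged sketch below. The full multi-resolution scheme and the AAD front end of MRTII are not reproduced; the signal, mel resolution, and GLCM parameters are illustrative.

```python
# Hedged sketch: treat the log-mel spectrogram as a grey-level image and
# extract GLCM texture statistics. Parameters are illustrative, not MRTII's.
import numpy as np
import librosa
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in skimage < 0.19

def spectrogram_texture(y: np.ndarray, sr: int) -> np.ndarray:
    S = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    img = np.uint8(255 * (S - S.min()) / (np.ptp(S) + 1e-12))   # quantize to 256 grey levels
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

rng = np.random.default_rng(0)
y, sr = rng.normal(size=16000 * 2), 16000     # 2 s of noise as a stand-in signal
print(spectrogram_texture(y, sr).shape)        # (16,) = 4 properties x 2 distances x 2 angles
```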

  15. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed

    Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado

    2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speech are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).

  16. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

The classification of emotional speech is widely considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction scheme based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the emotional speech spectrogram at multiple resolutions should form a good feature set for emotion classification in speech, and multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech. PMID:25594590

  17. Evaluating deep learning architectures for Speech Emotion Recognition.

    PubMed

    Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence

    2017-08-01

    Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. The development of the Athens Emotional States Inventory (AESI): collection, validation and automatic processing of emotionally loaded sentences.

    PubMed

    Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros

    2015-01-01

The development of ecologically valid procedures for collecting reliable and unbiased emotional data is essential for computer interfaces with social and affective intelligence targeting patients with mental disorders. To this end, the Athens Emotional States Inventory (AESI) proposes the design, recording and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness and neutral. The items of the AESI consist of sentences, each with content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following a Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of the AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 utterances recorded in the Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy of up to 75.15% for automatically recognizing the emotions in the AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes and with reduced environmental variability.

  19. Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?

    PubMed Central

    Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado

    2016-01-01

Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speech are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724

  20. Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database

    NASA Astrophysics Data System (ADS)

    Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia

An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested as to whether they are sufficient to discriminate between seven emotions. Multilayer perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provides information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results are presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent framework. Nevertheless, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
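
    The basic set-up, a multilayer perceptron fed a 24-element vector of sentence-level prosodic statistics, is easy to sketch with scikit-learn. The feature values below are synthetic stand-ins; only the input dimensionality and the seven-class output follow the description above.

```python
# Minimal sketch: an MLP over 24 utterance-level prosodic statistics.
# The Berlin database contains 535 utterances; the feature values here are random.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(535, 24))        # 24 sentence-level statistics per utterance (synthetic)
y = rng.integers(0, 7, size=535)      # seven emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print(f"held-out accuracy: {mlp.score(X_te, y_te):.2f}")   # near chance on random data
```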

  1. The Atlanta Motor Speech Disorders Corpus: Motivation, Development, and Utility.

    PubMed

    Laures-Gore, Jacqueline; Russell, Scott; Patel, Rupal; Frankel, Michael

    2016-01-01

    This paper describes the design and collection of a comprehensive spoken language dataset from speakers with motor speech disorders in Atlanta, Ga., USA. This collaborative project aimed to gather a spoken database consisting of nonmainstream American English speakers residing in the Southeastern US in order to provide a more diverse perspective of motor speech disorders. Ninety-nine adults with an acquired neurogenic disorder resulting in a motor speech disorder were recruited. Stimuli include isolated vowels, single words, sentences with contrastive focus, sentences with emotional content and prosody, sentences with acoustic and perceptual sensitivity to motor speech disorders, as well as 'The Caterpillar' and 'The Grandfather' passages. Utility of this data in understanding the potential interplay of dialect and dysarthria was demonstrated with a subset of the speech samples existing in the database. The Atlanta Motor Speech Disorders Corpus will enrich our understanding of motor speech disorders through the examination of speech from a diverse group of speakers. © 2016 S. Karger AG, Basel.

  2. Speech Processing in Realistic Battlefield Environments (Le Traitement de la Parole en Environnement de Combat Realiste)

    DTIC Science & Technology

    2009-04-01

[Only fragmentary indexing text is available for this DTIC report. The recoverable fragments list a section on available military speech databases, including the FELIN database (overview, technical specifications, limitations), and note battlefield conditions affecting speech processing such as emotion, confusion due to conflicting information, psychological tension, and pain, as well as the rapid scaling of possible language combinations in this field of research.]

  3. Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.

    PubMed

    Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn

    2011-09-01

Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database, a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
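
    Histogram equalization of feature components, mapping each dimension through its empirical distribution onto a common reference so that clean and noisy features share the same marginals, can be illustrated with scikit-learn's QuantileTransformer as a stand-in for the reference-histogram mapping used in the article. The data below are synthetic.

```python
# Hedged illustration of histogram equalization (HEQ): each feature dimension
# is mapped through its empirical CDF onto a reference Gaussian. The
# QuantileTransformer is a convenient stand-in, not the authors' implementation.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(5000, 13))                   # clean MFCC-like features
noisy = 0.6 * clean + rng.gumbel(1.0, 0.8, size=clean.shape)    # distorted/noisy version

heq = QuantileTransformer(output_distribution="normal", n_quantiles=1000, random_state=0)
noisy_eq = heq.fit_transform(noisy)                             # per-dimension equalization

print("noisy mean/std :", noisy.mean().round(2), noisy.std().round(2))
print("after HEQ      :", noisy_eq.mean().round(2), noisy_eq.std().round(2))  # ~0 / ~1
```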

  4. Emotional Prosody Measurement (EPM): a voice-based evaluation method for psychological therapy effectiveness.

    PubMed

    van den Broek, Egon L

    2004-01-01

The voice embodies three sources of information: speech, the identity of the speaker, and the emotional state of the speaker (i.e., emotional prosody). The latter feature is reflected in the variability of F0 (the fundamental frequency, or pitch), denoted SD F0. To extract this feature, Emotional Prosody Measurement (EPM) was developed, which consists of 1) speech recording, 2) removal of speckle noise, 3) a Fourier transform to extract the F0 signal, and 4) determination of SD F0. After a pilot study in which six participants mimicked emotions with their voice, the core experiment was conducted to see whether EPM is successful. Twenty-five patients suffering from a panic disorder with agoraphobia participated. Two methods (story-telling and reliving) were used to trigger anxiety and were compared with comparable but more relaxed conditions. This resulted in a unique database of speech samples that was used to compare the EPM with the Subjective Unit of Distress to validate it as a measure of anxiety/stress. The experimental manipulation of anxiety proved to be successful, and EPM proved to be a successful evaluation method for psychological therapy effectiveness.
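
    The SD F0 measure at the core of EPM can be sketched as below, using librosa's probabilistic YIN tracker in place of the article's Fourier-based F0 extraction; the file path is a placeholder, and unvoiced frames are discarded before the standard deviation is taken.

```python
# Minimal sketch of SD F0: track F0 over the recording and take the standard
# deviation over voiced frames. pYIN is a stand-in for the article's method.
import numpy as np
import librosa

def sd_f0(path: str) -> float:
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]
    return float(np.std(voiced_f0)) if voiced_f0.size else float("nan")

print(sd_f0("speech_sample.wav"))   # placeholder path; larger SD F0 ~ more variable prosody
```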

  5. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  6. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    PubMed

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  7. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified and non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of short vowel segments shows that for relaxed speech the two spectra are similar, whereas for stressed speech they differ in the high-frequency range due to increased pitch modulation.
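
    Comparable F0 and formant measurements can be scripted with parselmouth, a Python interface to Praat; the study itself used PRAAT directly, so the sketch below is an approximation with a placeholder file path and default analysis settings.

```python
# Hedged sketch of PRAAT-style measurements (mean F0 and mean F1-F4) via
# parselmouth. Placeholder path; default pitch/formant analysis settings.
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("fm_clip_stressed.wav")       # placeholder path

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
mean_f0 = f0[f0 > 0].mean()                           # ignore unvoiced frames

formants = snd.to_formant_burg()
mean_formants = [call(formants, "Get mean", i, 0, 0, "hertz") for i in range(1, 5)]

print(f"mean F0 = {mean_f0:.1f} Hz; mean F1-F4 = {np.round(mean_formants, 1)}")
```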

  8. Recognizing vocal emotions in Mandarin Chinese: a validated database of Chinese vocal emotional stimuli.

    PubMed

    Liu, Pan; Pell, Marc D

    2012-12-01

    To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) were produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as a valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes. To access the database, please contact pan.liu@mail.mcgill.ca.

  9. Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG

    PubMed Central

    Hagan, Cindy C.; Woods, Will; Johnson, Sam; Green, Gary G. R.; Young, Andrew W.

    2013-01-01

    Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals. PMID:23950977

  10. Involvement of right STS in audio-visual integration for affective speech demonstrated using MEG.

    PubMed

    Hagan, Cindy C; Woods, Will; Johnson, Sam; Green, Gary G R; Young, Andrew W

    2013-01-01

    Speech and emotion perception are dynamic processes in which it may be optimal to integrate synchronous signals emitted from different sources. Studies of audio-visual (AV) perception of neutrally expressed speech demonstrate supra-additive (i.e., where AV>[unimodal auditory+unimodal visual]) responses in left STS to crossmodal speech stimuli. However, emotions are often conveyed simultaneously with speech; through the voice in the form of speech prosody and through the face in the form of facial expression. Previous studies of AV nonverbal emotion integration showed a role for right (rather than left) STS. The current study therefore examined whether the integration of facial and prosodic signals of emotional speech is associated with supra-additive responses in left (cf. results for speech integration) or right (due to emotional content) STS. As emotional displays are sometimes difficult to interpret, we also examined whether supra-additive responses were affected by emotional incongruence (i.e., ambiguity). Using magnetoencephalography, we continuously recorded eighteen participants as they viewed and heard AV congruent emotional and AV incongruent emotional speech stimuli. Significant supra-additive responses were observed in right STS within the first 250 ms for emotionally incongruent and emotionally congruent AV speech stimuli, which further underscores the role of right STS in processing crossmodal emotive signals.

  11. Generating and Describing Affective Eye Behaviors

    NASA Astrophysics Data System (ADS)

    Mao, Xia; Li, Zheng

The manner of a person's eye movement conveys much nonverbal information and emotional intent beyond speech. This paper describes work on expressing emotion through eye behaviors in virtual agents, based on parameters selected from the AU-coded facial expression database and real-time eye movement data (pupil size, blink rate and saccade). A rule-based approach to generating primary (joyful, sad, angry, afraid, disgusted and surprised) and intermediate emotions (emotions that can be represented as the mixture of two primary emotions), utilizing the MPEG-4 FAPs (facial animation parameters), is introduced. In addition, a scripting tool named EEMML (Emotional Eye Movement Markup Language) that enables authors to describe and generate emotional eye movement of virtual agents is proposed.

  12. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech.

    PubMed

    Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M

    2016-02-01

    Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.

  13. Sound frequency affects speech emotion perception: results from congenital amusia

    PubMed Central

    Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
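
    The low-pass filtering manipulation used to isolate pitch-related cues can be sketched with a zero-phase Butterworth filter, as below. The 500 Hz cutoff and the toy two-tone signal are illustrative; the study's exact filter settings are not reproduced here.

```python
# Minimal sketch of the low-pass manipulation: remove high-frequency cues while
# keeping low-frequency prosodic information. Cutoff and signal are illustrative.
import numpy as np
from scipy.signal import butter, filtfilt

def low_pass(y: np.ndarray, sr: int, cutoff_hz: float = 500.0, order: int = 4) -> np.ndarray:
    b, a = butter(order, cutoff_hz / (sr / 2), btype="low")
    return filtfilt(b, a, y)                      # zero-phase filtering

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)  # toy two-tone signal
filtered = low_pass(y, sr)
print(f"rms before: {np.sqrt(np.mean(y ** 2)):.3f}, after: {np.sqrt(np.mean(filtered ** 2)):.3f}")
```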

  14. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    PubMed Central

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
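
    The stacked-generalization baseline that CSS stacking improves on, several heterogeneous base classifiers feeding a meta-classifier, can be sketched with scikit-learn as below. The EDA-driven selection of the base-classifier subset (the paper's contribution) is not reproduced, and the data are synthetic stand-ins for the 123-dimensional acoustic feature vectors.

```python
# Sketch of plain stacked generalization: heterogeneous base classifiers feed a
# meta-classifier. The EDA-based subset selection of CSS stacking is omitted.
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 123))       # stand-in for 123 spectral/quality/prosodic features
y = rng.integers(0, 7, size=400)      # six primary emotions plus neutral

base = [("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier())]
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(max_iter=1000))
print(f"mean CV accuracy: {cross_val_score(stack, X, y, cv=3).mean():.2f}")  # ~chance on random data
```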

  15. Speech emotion recognition methods: A literature review

    NASA Astrophysics Data System (ADS)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

Recently, research attention to emotional speech signals in human-machine interfaces has increased owing to the availability of high computational capability. Many systems have been proposed in the literature to identify the emotional state through speech. Selecting suitable feature sets, designing proper classification methods, and preparing an appropriate dataset are the main issues in speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition with respect to three evaluation parameters (feature set, feature classification, and accuracy). In addition, this paper evaluates the performance and limitations of available methods, and highlights promising directions for the improvement of speech emotion recognition systems.

  16. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

    PubMed

    Weninger, Felix; Eyben, Florian; Schuller, Björn W; Mortillaro, Marcello; Scherer, Klaus R

    2013-01-01

Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.

  17. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech

    ERIC Educational Resources Information Center

    Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.

    2016-01-01

    Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…

  18. Effects of emotion on different phoneme classes

    NASA Astrophysics Data System (ADS)

    Lee, Chul Min; Yildirim, Serdar; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Abe; Lee, Sungbok; Narayanan, Shrikanth

    2004-10-01

This study investigates the effects of emotion on different phoneme classes using short-term spectral features. In the research on emotion in speech, most studies have focused on prosodic features of speech. In this study, based on the hypothesis that different emotions have varying effects on the properties of the different speech sounds, we investigate the usefulness of phoneme-class level acoustic modeling for automatic emotion classification. Hidden Markov models (HMM) based on short-term spectral features for five broad phonetic classes are used for this purpose using data obtained from recordings of two actresses. Each speaker produces 211 sentences with four different emotions (neutral, sad, angry, happy). Using the speech material we trained and compared the performances of two sets of HMM classifiers: a generic set of "emotional speech" HMMs (one for each emotion) and a set of broad phonetic-class based HMMs (vowel, glide, nasal, stop, fricative) for each emotion type considered. Comparison of classification results indicates that different phoneme classes were affected differently by emotional change and that the vowel sounds are the most important indicator of emotions in speech. Detailed results and their implications on the underlying speech articulation will be discussed.

  19. Common cues to emotion in the dynamic facial expressions of speech and song.

    PubMed

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.

  20. Study of acoustic correlates associated with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
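
    The following sketch computes the kind of per-utterance measurements discussed above (duration, fundamental frequency and rms-energy statistics); formant tracking is omitted, and the file paths are hypothetical. It assumes librosa is available.

```python
# Sketch of per-utterance acoustic correlates: duration, F0 statistics
# (via pYIN) and rms-energy statistics. Paths are hypothetical.
import numpy as np
import librosa

def utterance_stats(path):
    y, sr = librosa.load(path, sr=None)
    duration = len(y) / sr
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = f0[~np.isnan(f0)]                       # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]
    return {
        "duration_s": duration,
        "f0_mean": float(np.mean(f0)) if f0.size else np.nan,
        "f0_range": float(np.ptp(f0)) if f0.size else np.nan,
        "rms_mean": float(np.mean(rms)),
        "rms_range": float(np.ptp(rms)),
    }

# e.g. compare an "angry" and a "neutral" recording of the same sentence:
# print(utterance_stats("angry_001.wav"), utterance_stats("neutral_001.wav"))
```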

  1. Some articulatory details of emotional speech

    NASA Astrophysics Data System (ADS)

    Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

    2005-09-01

    Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected with an electromagnetic articulography (EMA) system while the subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech; that is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) than in the other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail together with the associated acoustics and perceived emotional qualities. [Work supported by NIH.]

  2. Psychoacoustic cues to emotion in speech prosody and music.

    PubMed

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
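
    A rough sketch of frame-level descriptors in the spirit of the feature set above follows. True loudness, sharpness and roughness require psychoacoustic models, so rms energy and simple spectral statistics stand in for them here; the function and parameters are illustrative assumptions, not the authors' implementation.

```python
# Frame-level spectral descriptors commonly used as psychoacoustic proxies:
# spectral centroid, spectral flux, and rms energy as a crude loudness stand-in.
import numpy as np
import librosa

def frame_features(path, n_fft=1024, hop=512):
    y, sr = librosa.load(path, sr=None)
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))   # spectral flux
    loudness_proxy = librosa.feature.rms(S=S)[0]             # not true loudness
    return centroid, flux, loudness_proxy
```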

  3. Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia

    PubMed Central

    Paulmann, Silke; Ott, Derek V. M.; Kotz, Sonja A.

    2011-01-01

    The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and the possibility that (ii) the BG may engage in one rather than the other of these processing steps. In the present study we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in a qualitatively different manner. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment followed by an explicit behavioral emotional recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or neutral sentences. In line with previous evidence patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages. PMID:21437277

  4. Valence-specific conflict moderation in the dorso-medial PFC and the caudate head in emotional speech.

    PubMed

    Kotz, Sonja A; Dengler, Reinhard; Wittfoth, Matthias

    2015-02-01

    Emotional speech comprises complex multimodal verbal and non-verbal information that allows us to deduce others' emotional states or thoughts in social interactions. While the neural correlates of verbal and non-verbal aspects and their interaction in emotional speech have been identified, there is very little evidence on how we perceive and resolve incongruity in emotional speech, and whether such incongruity extends to current concepts of task-specific prediction errors as a consequence of unexpected action outcomes ('negative surprise'). Here, we explored this possibility while participants listened to congruent and incongruent angry, happy or neutral utterances and categorized the expressed emotions by their verbal (semantic) content. Results reveal valence-specific incongruity effects: negative verbal content expressed in a happy tone of voice increased activation in the dorso-medial prefrontal cortex (dmPFC), extending its role from conflict moderation to appraisal of valence-specific conflict in emotional speech. Conversely, the caudate head bilaterally responded selectively to positive verbal content expressed in an angry tone of voice, broadening previous accounts of the caudate head in linguistic control to moderating valence-specific control in emotional speech. Together, these results suggest that control structures of the human brain (the dmPFC and subcompartments of the basal ganglia) impact emotional speech differentially when conflict arises.

  5. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

    PubMed Central

    Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144
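
    The cross-domain evaluation idea (train a regressor on one audio domain, test on another, report the correlation with observer annotations) can be sketched as below. The feature matrices and labels are placeholders, not the four databases used in the article, and ridge regression is only one plausible choice of model.

```python
# Train-on-one-domain, test-on-another regression for a continuous affect
# dimension (e.g. arousal); reports the Pearson correlation with annotations.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

def cross_domain_correlation(X_train, y_train, X_test, y_test):
    """X_*: acoustic feature matrices, y_*: observer annotations (placeholders)."""
    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    model.fit(X_train, y_train)          # e.g. fit on sound-event features
    pred = model.predict(X_test)         # e.g. predict on enacted speech
    return np.corrcoef(pred, y_test)[0, 1]
```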

  6. Improving Understanding of Emotional Speech Acoustic Content

    NASA Astrophysics Data System (ADS)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying the emotional intent of utterances without facial or body language cues. A known limitation of cochlear implants is their inability to accurately portray the fundamental frequency contour of speech, which carries the majority of the information needed to identify emotional intent. Without reliable access to the fundamental frequency, other cues to vocal emotion, if they can be identified, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and an adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification and that may be enhanced in child-directed speech but are also present in adult-directed speech. Results showed significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.

  7. Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition

    PubMed Central

    Rigoulot, Simon; Wassiliwizky, Eugen; Pell, Marc D.

    2013-01-01

    Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech. PMID:23805115

  8. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses how microintonation and spectral properties are reflected in male and female acted emotional speech. The microintonation component of speech melody is analyzed with respect to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise; we control this amount by spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at a statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger) and a neutral state for comparison. The calculated histograms of the spectral flatness distribution are visually compared and modelled by a Gamma probability distribution. Histograms of the cepstral coefficient distributions are evaluated and compared using skewness and kurtosis. The statistical results show good correlation between male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.
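
    A minimal sketch of the statistical treatment described above: spectral flatness per frame, a Gamma fit to its distribution, and skewness/kurtosis of cepstral coefficients. MFCCs stand in for the cepstral parameterization used by the authors, and the path is hypothetical; librosa and scipy are assumed.

```python
# Spectral flatness distribution modelled by a Gamma fit, plus shape
# statistics (skewness, kurtosis) of cepstral coefficient distributions.
import numpy as np
import librosa
from scipy import stats

def flatness_gamma_and_cepstral_shape(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    flatness = librosa.feature.spectral_flatness(y=y)[0]
    shape, loc, scale = stats.gamma.fit(flatness, floc=0)   # Gamma model of the histogram
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # stand-in cepstral coefficients
    return {
        "gamma_shape": shape,
        "gamma_scale": scale,
        "cc_skewness": stats.skew(mfcc, axis=1),
        "cc_kurtosis": stats.kurtosis(mfcc, axis=1),
    }
```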

  9. Second Language Ability and Emotional Prosody Perception

    PubMed Central

    Bhatara, Anjali; Laukka, Petri; Boll-Avetisyan, Natalie; Granjon, Lionel; Anger Elfenbein, Hillary; Bänziger, Tanja

    2016-01-01

    The present study examines the effect of language experience on vocal emotion perception in a second language. Native speakers of French with varying levels of self-reported English ability were asked to identify emotions from vocal expressions produced by American actors in a forced-choice task, and to rate their pleasantness, power, alertness and intensity on continuous scales. Stimuli included emotionally expressive English speech (emotional prosody) and non-linguistic vocalizations (affect bursts), and a baseline condition with Swiss-French pseudo-speech. Results revealed effects of English ability on the recognition of emotions in English speech but not in non-linguistic vocalizations. Specifically, higher English ability was associated with less accurate identification of positive emotions, but not with the interpretation of negative emotions. Moreover, higher English ability was associated with lower ratings of pleasantness and power, again only for emotional prosody. This suggests that second language skills may sometimes interfere with emotion recognition from speech prosody, particularly for positive emotions. PMID:27253326

  10. Common cues to emotion in the dynamic facial expressions of speech and song

    PubMed Central

    Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production. PMID:25424388

  11. Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians.

    PubMed

    Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan

    2014-01-01

    Musical training has been shown to have positive effects on several aspects of speech processing; however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are yet to be better understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes.

  12. Human emotions track changes in the acoustic environment.

    PubMed

    Ma, Weiyi; Thompson, William Forde

    2015-11-24

    Emotional responses to biologically significant events are essential for human survival. Do human emotions lawfully track changes in the acoustic environment? Here we report that changes in acoustic attributes that are well known to interact with human emotions in speech and music also trigger systematic emotional responses when they occur in environmental sounds, including sounds of human actions, animal calls, machinery, or natural phenomena, such as wind and rain. Three changes in acoustic attributes known to signal emotional states in speech and music were imposed upon 24 environmental sounds. Evaluations of stimuli indicated that human emotions track such changes in environmental sounds just as they do for speech and music. Such changes not only influenced evaluations of the sounds themselves, they also affected the way accompanying facial expressions were interpreted emotionally. The findings illustrate that human emotions are highly attuned to changes in the acoustic environment, and reignite a discussion of Charles Darwin's hypothesis that speech and music originated from a common emotional signal system based on the imitation and modification of environmental sounds.

  13. Head movements encode emotions during speech and song.

    PubMed

    Livingstone, Steven R; Palmer, Caroline

    2016-04-01

    When speaking or singing, vocalists often move their heads in an expressive fashion, yet the influence of emotion on vocalists' head motion is unknown. Using a comparative speech/song task, we examined whether vocalists' intended emotions influence head movements and whether those movements influence the perceived emotion. In Experiment 1, vocalists were recorded with motion capture while speaking and singing each statement with different emotional intentions (very happy, happy, neutral, sad, very sad). Functional data analyses showed that head movements differed in translational and rotational displacement across emotional intentions, yet were similar across speech and song, transcending differences in F0 (varied freely in speech, fixed in song) and lexical variability. Head motion specific to emotional state occurred before and after vocalizations, as well as during sound production, confirming that some aspects of movement were not simply a by-product of sound production. In Experiment 2, observers accurately identified vocalists' intended emotion on the basis of silent, face-occluded videos of head movements during speech and song. These results provide the first evidence that head movements encode a vocalist's emotional intent and that observers decode emotional information from these movements. We discuss implications for models of head motion during vocalizations and applied outcomes in social robotics and automated emotion recognition.

  14. Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians

    PubMed Central

    Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan

    2015-01-01

    Musical training has been shown to have positive effects on several aspects of speech processing; however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are yet to be better understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes. PMID:25688196

  15. [Perception of emotional intonation of noisy speech signal with different acoustic parameters by adults of different age and gender].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia

    2011-01-01

    The listener-distinctive features of recognition of different emotional intonations (positive, negative and neutral) of male and female speakers in the presence or absence of background noise were studied in 49 adults aged 20-79 years. In all the listeners noise produced the most pronounced decrease in recognition accuracy for positive emotional intonation ("joy") as compared to other intonations, whereas it did not influence the recognition accuracy of "anger" in 65-79-year-old listeners. The higher emotion recognition rates of a noisy signal were observed for speech emotional intonations expressed by female speakers. Acoustic characteristics of noisy and clear speech signals underlying perception of speech emotional prosody were found for adult listeners of different age and gender.

  16. The minor third communicates sadness in speech, mirroring its use in music.

    PubMed

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
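
    For readers unfamiliar with interval arithmetic, the check below shows how a pitch ratio maps to semitones (12·log2(f2/f1)); a descending minor third is about three semitones. The frequencies are illustrative, not values from the study.

```python
# Illustrative check (not from the paper): interval between two pitches
# in semitones; a minor third is roughly 3 semitones.
import math

def interval_in_semitones(f1_hz, f2_hz):
    return 12 * math.log2(f2_hz / f1_hz)

print(interval_in_semitones(220.0, 185.0))   # ~ -3.0: a minor third downward
```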

  17. [Perception features of emotional intonation of short pseudowords].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2012-01-01

    Reaction time and recognition accuracy for emotional speech intonations in short meaningless words differing in only one phoneme, with and without background noise, were studied in 49 adults aged 20-79 years. The results were compared with the same parameters for emotional intonations in meaningful speech utterances under similar conditions. Perception of emotional intonations at different linguistic levels (phonological and lexico-semantic) was found to have both common features and certain peculiarities. Recognition characteristics of emotional intonations as a function of listener gender and age appeared to be invariant with regard to the linguistic level of the speech stimuli. The phonemic composition of the pseudowords was found to influence emotional perception, especially against background noise. The acoustic characteristic of the stimuli most responsible for the perception of emotional speech prosody in short meaningless words under the two experimental conditions, i.e. with and without background noise, was the variation of the fundamental frequency.

  18. Multi-function robots with speech interaction and emotion feedback

    NASA Astrophysics Data System (ADS)

    Wang, Hongyu; Lou, Guanting; Ma, Mengchao

    2018-03-01

    Nowadays, service robots are used in many public settings; however, most of them still lack speech interaction, and in particular speech-emotion interaction feedback. To make the robot more humanoid, an Arduino microcontroller was used in this study for the speech recognition module and the servo motor control module, providing the robot with speech interaction and emotion feedback. In addition, a W5100 module was adopted for the network connection to enable information transmission via the Internet, offering broad application prospects for the robot in the Internet of Things (IoT).

  19. [Influence of human personal features on acoustic correlates of speech emotional intonation characteristics].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2009-01-01

    Comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: sensible speech utterances and short meaningless words. The corpus of speech signals of different emotional intonations (happy, angry, frightened, sad and neutral) was created using the actor's method of simulation of emotions. Native Russian 20-70-year-old speakers (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations of the fundamental frequency and frequencies of the first formant. The direction of these deviations was independent of the semantic content of speech utterance and its duration, age, gender, and being actor or non-actor, though the personal features of the speakers affected the absolute values of these frequencies.

  20. Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.

    PubMed

    Coutinho, Eduardo; Schuller, Björn

    2017-01-01

    Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a Machine Learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra-domain experiments (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
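
    A very small sketch of the feature-representation-transfer idea follows: a denoising autoencoder is fitted on acoustic features pooled from both domains, and its encoder then re-represents the features before regression. The dimensions, noise level and training loop are illustrative assumptions, not the configuration used in the article.

```python
# Denoising autoencoder as a shared feature representation for music and
# speech features; sizes and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x, noise_std=0.1):
        x_noisy = x + noise_std * torch.randn_like(x)   # corrupt the input
        return self.decoder(self.encoder(x_noisy))      # reconstruct clean features

def fit_dae(X, n_epochs=50, lr=1e-3):
    """X: (n_frames, n_features) tensor of features pooled from both domains."""
    model = DenoisingAE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), X)
        loss.backward()
        opt.step()
    return model.encoder   # shared representation used before regression
```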

  1. Did you or I say pretty, rude or brief? An ERP study of the effects of speaker's identity on emotional word processing.

    PubMed

    Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret

    2016-02-01

    During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information.

  2. Human emotions track changes in the acoustic environment

    PubMed Central

    Ma, Weiyi; Thompson, William Forde

    2015-01-01

    Emotional responses to biologically significant events are essential for human survival. Do human emotions lawfully track changes in the acoustic environment? Here we report that changes in acoustic attributes that are well known to interact with human emotions in speech and music also trigger systematic emotional responses when they occur in environmental sounds, including sounds of human actions, animal calls, machinery, or natural phenomena, such as wind and rain. Three changes in acoustic attributes known to signal emotional states in speech and music were imposed upon 24 environmental sounds. Evaluations of stimuli indicated that human emotions track such changes in environmental sounds just as they do for speech and music. Such changes not only influenced evaluations of the sounds themselves, they also affected the way accompanying facial expressions were interpreted emotionally. The findings illustrate that human emotions are highly attuned to changes in the acoustic environment, and reignite a discussion of Charles Darwin’s hypothesis that speech and music originated from a common emotional signal system based on the imitation and modification of environmental sounds. PMID:26553987

  3. Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

    NASA Astrophysics Data System (ADS)

    Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng

    It is said that technology comes from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers were able to perceive and respond to human emotion, human-computer interaction would be more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition task and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms the others.
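
    A baseline comparison of the kind described above can be set up as follows, restricted to classifiers available in scikit-learn (the proposed WD-MKNN and the Mandarin corpus are not reproduced here). X is assumed to hold utterance-level feature vectors, e.g. averaged MFCC/LPCC frames, and y the emotion labels.

```python
# Cross-validated comparison of standard classifiers on utterance-level
# emotion features; a rough stand-in for the paper's classifier comparison.
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def compare_classifiers(X, y):
    candidates = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf", C=1.0),
        "LDA": LinearDiscriminantAnalysis(),
    }
    return {
        name: cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5).mean()
        for name, clf in candidates.items()
    }
```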

  4. Speech Situation Checklist-Revised: Investigation With Adults Who Do Not Stutter and Treatment-Seeking Adults Who Stutter.

    PubMed

    Vanryckeghem, Martine; Matthews, Michael; Xu, Peixin

    2017-11-08

    The aim of this study was to evaluate the usefulness of the Speech Situation Checklist for adults who stutter (SSC) in differentiating people who stutter (PWS) from speakers with no stutter based on self-reports of anxiety and speech disruption in communicative settings. The SSC's psychometric properties were examined, norms were established, and suggestions for treatment were formulated. The SSC was administered to 88 PWS seeking treatment and 209 speakers with no stutter between the ages of 18 and 62. The SSC consists of 2 sections investigating negative emotional reaction and speech disruption in 38 speech situations that are identical in both sections. The SSC-Emotional Reaction and SSC-Speech Disruption data show that these self-report tests differentiate PWS from speakers with no stutter to a statistically significant extent and have great discriminative value. The tests have good internal reliability, content, and construct validity. Age and gender do not affect the scores of the PWS. The SSC-Emotional Reaction and SSC-Speech Disruption seem to be powerful measures to investigate negative emotion and speech breakdown in an array of speech situations. The item scores give direction to treatment by suggesting speech situations that need a clinician's attention in terms of generalization and carry-over of within-clinic therapeutic gains into in vivo settings.

  5. Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

    2012-01-01

    Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…

  6. Emotional recognition from the speech signal for a virtual education agent

    NASA Astrophysics Data System (ADS)

    Tickle, A.; Raghu, S.; Elshaw, M.

    2013-06-01

    This paper explores the extraction of features from the speech waveform to perform intelligent emotion recognition. A feature extraction tool (openSMILE) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings captured with a microphone. The initial features were reduced to the most important ones so that emotion recognition using a supervised neural network could be performed. Given that the future of virtual education agents lies in making them more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.
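
    A sketch of that pipeline using the opensmile Python package and a small supervised network is shown below. The ComParE_2016 functionals are used only as a stand-in for the 998-feature baseline set mentioned above, and the file lists and labels are hypothetical.

```python
# openSMILE functionals per recording, followed by a small supervised
# neural network; feature set and data are stand-ins, not the paper's setup.
import numpy as np
import opensmile
from sklearn.neural_network import MLPClassifier

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract(paths):
    """One fixed-length feature vector per recording (paths are hypothetical)."""
    return np.vstack([smile.process_file(p).to_numpy()[0] for p in paths])

# X_train, y_train = extract(train_wavs), train_labels   # hypothetical data
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_train, y_train)
# predictions = clf.predict(extract(test_wavs))
```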

  7. Preschoolers' real-time coordination of vocal and facial emotional information.

    PubMed

    Berman, Jared M J; Chambers, Craig G; Graham, Susan A

    2016-02-01

    An eye-tracking methodology was used to examine the time course of 3- and 5-year-olds' ability to link speech bearing different acoustic cues to emotion (i.e., happy-sounding, neutral, and sad-sounding intonation) to photographs of faces reflecting different emotional expressions. Analyses of saccadic eye movement patterns indicated that, for both 3- and 5-year-olds, sad-sounding speech triggered gaze shifts to a matching (sad-looking) face from the earliest moments of speech processing. However, it was not until approximately 800 ms into a happy-sounding utterance that preschoolers began to use the emotional cues from speech to identify a matching (happy-looking) face. Complementary analyses based on conscious/controlled behaviors (children's explicit points toward the faces) indicated that 5-year-olds, but not 3-year-olds, could successfully match happy-sounding and sad-sounding vocal affect to a corresponding emotional face. Together, the findings clarify developmental patterns in preschoolers' implicit versus explicit ability to coordinate emotional cues across modalities and highlight preschoolers' greater sensitivity to sad-sounding speech as the auditory signal unfolds in time.

  8. Musician effect on perception of spectro-temporally degraded speech, vocal emotion, and music in young adolescents.

    PubMed

    Başkent, Deniz; Fuller, Christina D; Galvin, John J; Schepel, Like; Gaudrain, Etienne; Free, Rolien H

    2018-05-01

    In adult normal-hearing musicians, perception of music, vocal emotion, and speech in noise has previously been shown to be better than in non-musicians, sometimes even with spectro-temporally degraded stimuli. In this study, melodic contour identification, vocal emotion identification, and speech understanding in noise were measured in young adolescent normal-hearing musicians and non-musicians listening to unprocessed or degraded signals. In contrast to the findings in adults, there was no musician effect for vocal emotion identification or speech in noise. Melodic contour identification with degraded signals was significantly better in musicians, suggesting potential benefits from music training for young cochlear-implant users, who experience similar spectro-temporal signal degradations.

  9. From disgust to contempt-speech: The nature of contempt on the map of prejudicial emotions.

    PubMed

    Bilewicz, Michal; Kamińska, Olga Katarzyna; Winiewski, Mikołaj; Soral, Wiktor

    2017-01-01

    Analyzing contempt as an intergroup emotion, we suggest that contempt and anger are not built upon each other, whereas disgust seems to be the most elementary and specific basic-emotional antecedent of contempt. Concurring with Gervais & Fessler, we suggest that many instances of "hate speech" are in fact instances of "contempt speech", being based on disgust-driven contempt rather than hate.

  10. Shared acoustic codes underlie emotional communication in music and speech—Evidence from deep transfer learning

    PubMed Central

    Schuller, Björn

    2017-01-01

    Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a Machine Learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra-domain experiments (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain. PMID:28658285

  11. The Sound of Feelings: Electrophysiological Responses to Emotional Speech in Alexithymia

    PubMed Central

    Goerlich, Katharina Sophia; Aleman, André; Martens, Sander

    2012-01-01

    Background: Alexithymia is a personality trait characterized by difficulties in the cognitive processing of emotions (cognitive dimension) and in the experience of emotions (affective dimension). Previous research focused mainly on visual emotional processing in the cognitive alexithymia dimension. We investigated the impact of both alexithymia dimensions on electrophysiological responses to emotional speech in 60 female subjects. Methodology: During unattended processing, subjects watched a movie while an emotional prosody oddball paradigm was presented in the background. During attended processing, subjects detected deviants in emotional prosody. The cognitive alexithymia dimension was associated with a left-hemisphere bias during early stages of unattended emotional speech processing, and with generally reduced amplitudes of the late P3 component during attended processing. In contrast, the affective dimension did not modulate unattended emotional prosody perception, but was associated with reduced P3 amplitudes during attended processing particularly to emotional prosody spoken in high intensity. Conclusions: Our results provide evidence for a dissociable impact of the two alexithymia dimensions on electrophysiological responses during the attended and unattended processing of emotional prosody. The observed electrophysiological modulations are indicative of a reduced sensitivity to the emotional qualities of speech, which may be a contributing factor to problems in interpersonal communication associated with alexithymia. PMID:22615853

  12. Emotional reactivity and regulation in preschool-age children who stutter.

    PubMed

    Ntourou, Katerina; Conture, Edward G; Walden, Tedra A

    2013-09-01

    This study experimentally investigated behavioral correlates of emotional reactivity and emotion regulation and their relation to speech (dis)fluency in preschool-age children who do (CWS) and do not (CWNS) stutter during emotion-eliciting conditions. Participants (18 CWS, 14 boys; 18 CWNS, 14 boys) completed two experimental tasks: (1) a neutral ("apples and leaves in a transparent box," ALTB) task and (2) a frustrating ("attractive toy in a transparent box," ATTB) task, both of which were followed by a narrative task. Dependent measures were emotional reactivity (positive affect, negative affect) and emotion regulation (self-speech, distraction) exhibited during the ALTB and ATTB tasks, and the percentage of stuttered disfluencies (SDs) and percentage of non-stuttered disfluencies (NSDs) produced during the narratives. Results indicated that preschool-age CWS exhibited significantly more negative emotion and more self-speech than preschool-age CWNS. For CWS only, emotion regulation behaviors (i.e., distraction, self-speech) during the experimental tasks were predictive of stuttered disfluencies during the subsequent narrative tasks. Furthermore, for CWS there was no relation between emotional processes and non-stuttered disfluencies, but CWNS's negative affect was significantly related to non-stuttered disfluencies. In general, present findings support the notion that emotional processes are associated with childhood stuttering. Specifically, findings are consistent with the notion that preschool-age CWS are more emotionally reactive than CWNS and that their self-speech regulatory attempts may be less than effective in modulating their emotions. The reader will be able to: (a) communicate the relevance of studying the role of emotion in developmental stuttering close to the onset of stuttering and (b) describe the main findings of the present study in relation to previous studies that have used different methodologies to investigate the role of emotion in developmental stuttering of young children who stutter.

  13. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The time-frequency representation from the spectrogram is then treated as an image and processed through a two-dimensional Fourier transform in order to perform a spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
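
    The two-stage analysis can be sketched directly: a Gaussian-windowed short-time Fourier transform yields the spectrogram, whose magnitude is then treated as an image and passed through a 2-D Fourier transform. Window length and other parameters below are illustrative, not those of the paper.

```python
# "Double Fourier" sketch: Gaussian-windowed STFT, then a 2-D FFT over the
# magnitude spectrogram treated as an image.
import numpy as np
from scipy import signal

def double_fourier(y, sr, nperseg=512):
    window = signal.windows.gaussian(nperseg, std=nperseg / 6)
    f, t, Z = signal.stft(y, fs=sr, window=window, nperseg=nperseg)
    spectrogram = np.abs(Z)                          # time-frequency "image"
    spectrum_2d = np.fft.fftshift(np.fft.fft2(spectrogram))
    return spectrogram, np.abs(spectrum_2d)          # candidate emotion features
```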

  14. Gender differences in identifying emotions from auditory and visual stimuli.

    PubMed

    Waaramaa, Teija

    2017-12-01

    The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to get a better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. The emotional stimuli were better recognized from visual stimuli than auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.

  15. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist; furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method for selecting speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm of the NN. Cluster analysis is applied to assess the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
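
    The paper's contribution analysis algorithm is specific to its network, so the sketch below uses permutation importance of a small MLP as a loose stand-in for ranking features and keeping the top 24; it should be read as an analogy, not the authors' method.

```python
# Feature selection by ranking a neural network's permutation importance
# and keeping the k most important features (stand-in for contribution analysis).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

def select_top_features(X, y, k=24):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
    imp = permutation_importance(net, X, y, n_repeats=5, random_state=0)
    return np.argsort(imp.importances_mean)[::-1][:k]   # indices of kept features
```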

  16. Characterizing resonant component in speech: A different view of tracking fundamental frequency

    NASA Astrophysics Data System (ADS)

    Dong, Bin

    2017-05-01

    Inspired by the nonlinearity, nonstationarity, and modulations in speech, the Hilbert-Huang transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is not applied directly to the target vowel, but to its intrinsic mode functions one by one. Thanks to the equivalence between the fundamental frequency in speech and the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide much information for the estimation of the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here to replace the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra and the modulation intensity distributions of the intrinsic mode functions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency when nonlinear components are present. Two vowels, [a] and [i], taken from the FAU Aibo Emotion Corpus speech database, are used to validate these findings.
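
    Under the assumption that the PyEMD and scipy packages are available, the sketch below decomposes a voiced segment into intrinsic mode functions and reads an F0 estimate from the instantaneous frequency of one IMF. Selecting the "fundamental" IMF by closeness to a rough F0 guess is a simplification of the pseudo-Hilbert-spectrum procedure described above.

```python
# EMD + Hilbert sketch: decompose the signal into IMFs, compute each IMF's
# instantaneous frequency, and treat the IMF nearest a rough F0 guess as the
# fundamental component (a simplification of the paper's procedure).
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert

def f0_track_from_imf(y, sr, rough_f0=150.0):
    imfs = EMD().emd(y)
    tracks = []
    for imf in imfs:
        phase = np.unwrap(np.angle(hilbert(imf)))
        inst_f = np.diff(phase) * sr / (2 * np.pi)   # instantaneous frequency
        tracks.append(inst_f)
    # pick the IMF whose median frequency is closest to the rough F0 guess
    return min(tracks, key=lambda f: abs(np.median(f) - rough_f0))
```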

  17. The effect of emotion on articulation rate in persistence and recovery of childhood stuttering.

    PubMed

    Erdemir, Aysu; Walden, Tedra A; Jefferson, Caswell M; Choi, Dahye; Jones, Robin M

    2018-06-01

    This study investigated the possible association of emotional processes and articulation rate in pre-school age children who stutter and persist (persisting), children who stutter and recover (recovered) and children who do not stutter (nonstuttering). The participants were ten persisting, ten recovered, and ten nonstuttering children between the ages of 3 and 5 years, who were classified as persisting, recovered, or nonstuttering approximately 2-2.5 years after the experimental testing took place. The children were exposed to three emotionally-arousing video clips (baseline, positive and negative) and produced a narrative based on a text-free storybook following each video clip. From the audio-recordings of these narratives, individual utterances were transcribed and articulation rates were calculated. Results indicated that persisting children exhibited significantly slower articulation rates following the negative emotion condition, unlike recovered and nonstuttering children whose articulation rates were not affected by either of the two emotion-inducing conditions. Moreover, all stuttering children displayed faster rates during fluent compared to stuttered speech; however, the recovered children were significantly faster than the persisting children during fluent speech. Negative emotion plays a detrimental role in the speech-motor control processes of children who persist, whereas children who eventually recover seem to exhibit a relatively more stable and mature speech-motor system. This suggests that complex interactions between speech-motor and emotional processes are at play in stuttering recovery and persistency; and articulation rates following negative emotion or during stuttered versus fluent speech might be considered as potential factors to prospectively predict persistence and recovery from stuttering.

  18. Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants

    PubMed Central

    Gordon, Karen A.; Papsin, Blake C.; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A.

    2017-01-01

    Objectives: Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Design: Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Results: Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Conclusions: Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation. PMID:28085739

  19. Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants.

    PubMed

    Good, Arla; Gordon, Karen A; Papsin, Blake C; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A

    Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation.

  20. Language for Winning Hearts and Minds: Verb Aspect in U.S. Presidential Campaign Speeches for Engaging Emotion.

    PubMed

    Havas, David A; Chapp, Christopher B

    2016-01-01

    How does language influence the emotions and actions of large audiences? Functionally, emotions help address environmental uncertainty by constraining the body to support adaptive responses and social coordination. We propose emotions provide a similar function in language processing by constraining the mental simulation of language content to facilitate comprehension, and to foster alignment of mental states in message recipients. Consequently, we predicted that emotion-inducing language should be found in speeches specifically designed to create audience alignment - stump speeches of United States presidential candidates. We focused on phrases in the past imperfective verb aspect ("a bad economy was burdening us") that leave a mental simulation of the language content open-ended, and thus unconstrained, relative to past perfective sentences ("we were burdened by a bad economy"). As predicted, imperfective phrases appeared more frequently in stump versus comparison speeches, relative to perfective phrases. In a subsequent experiment, participants rated phrases from presidential speeches as more emotionally intense when written in the imperfective aspect compared to the same phrases written in the perfective aspect, particularly for sentences perceived as negative in valence. These findings are consistent with the notion that emotions have a role in constraining the comprehension of language, a role that may be used in communication with large audiences.

  1. Speech Databases of Typical Children and Children with SLI

    PubMed Central

    Grill, Pavel; Tučková, Jana

    2016-01-01

    The extent of research on children’s speech in general, and on disordered speech specifically, is very limited. In this article, we describe the process of creating databases of children’s speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been compiled principally for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one of healthy children’s speech (recorded in kindergarten and in the first level of elementary school) and the other of pathological speech of children with a Specific Language Impairment (recorded at speech and language therapy practices and at a hospital). Both databases were sub-divided according to the specific demands of medical research. Their use is not limited to medicine; they are also suitable for linguistic research and pedagogical use, as well as for studies of speech-signal processing. PMID:26963508

  2. The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.

    NASA Astrophysics Data System (ADS)

    Baldwin, Carol May

    Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.

  3. Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech

    ERIC Educational Resources Information Center

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…

  4. Action Unit Models of Facial Expression of Emotion in the Presence of Speech

    PubMed Central

    Shah, Miraj; Cooper, David G.; Cao, Houwei; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini

    2014-01-01

    Automatic recognition of emotion using facial expressions in the presence of speech poses a unique challenge because talking reveals clues for the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression where speech is either present (talking) or absent (silent). The corpus is uniquely suited for analysis of the interplay between the two conditions. We use a multimodal decision level fusion classifier to combine models of emotion from talking and silent faces as well as from audio to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction in the presence of speech from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves accuracy of prediction in the talking setting. The advantages are most pronounced when silent and talking face models are fused with predictions from audio features. In this multi-modal prediction both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement. PMID:25525561

  5. Private Speech Moderates the Effects of Effortful Control on Emotionality

    ERIC Educational Resources Information Center

    Day, Kimberly L.; Smith, Cynthia L.; Neal, Amy; Dunsmore, Julie C.

    2018-01-01

    Research Findings: In addition to being a regulatory strategy, children's private speech may enhance or interfere with their effortful control used to regulate emotion. The goal of the current study was to investigate whether children's private speech during a selective attention task moderated the relations of their effortful control to their…

  6. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    PubMed Central

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997

  7. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion.

    PubMed

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally "happy") pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers "trade off" cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music, widely recognized for its artistic significance, complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.

  8. Crossmodal and incremental perception of audiovisual cues to emotional speech.

    PubMed

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment 1, but this time vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores rise faster for positive than for negative conditions. Finally, the gating results suggest that incongruent emotions are perceived as more intense than congruent emotions, as the former get more extreme recognition scores than the latter, already after a short period of exposure.

  9. Explicit authenticity and stimulus features interact to modulate BOLD response induced by emotional speech.

    PubMed

    Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia

    2013-06-01

    Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.

  10. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.
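
    The markup-driven approach described in this record can be illustrated with a small sketch. This is not the authors' system; it merely shows, under assumed settings, how per-emotion prosody parameters might be expressed as SSML tags before being handed to a TTS engine. The EMOTION_PRESETS values and the to_ssml helper are hypothetical illustrations.

```python
from xml.sax.saxutils import escape

# Hypothetical presets: rough prosody settings per emotion (illustrative values only).
EMOTION_PRESETS = {
    "neutral": {"rate": "medium", "pitch": "+0%",  "volume": "medium"},
    "happy":   {"rate": "fast",   "pitch": "+15%", "volume": "loud"},
    "sad":     {"rate": "slow",   "pitch": "-10%", "volume": "soft"},
    "angry":   {"rate": "fast",   "pitch": "+5%",  "volume": "x-loud"},
}

def to_ssml(text: str, emotion: str = "neutral") -> str:
    """Wrap plain text in SSML <prosody> tags approximating an emotional style."""
    p = EMOTION_PRESETS[emotion]
    return (
        "<speak>"
        f"<prosody rate=\"{p['rate']}\" pitch=\"{p['pitch']}\" volume=\"{p['volume']}\">"
        f"{escape(text)}"
        "</prosody>"
        "</speak>"
    )

if __name__ == "__main__":
    print(to_ssml("Your appointment has been confirmed.", "happy"))
```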

  11. An experiment with spectral analysis of emotional speech affected by orthodontic appliances

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela

    2012-11-01

    The contribution describes the effect of fixed and removable orthodontic appliances on the spectral properties of emotional speech. Spectral changes were analyzed and evaluated by spectrograms and mean Welch’s periodograms. This alternative approach to the standard listening test enables an objective comparison to be obtained, based on statistical analysis by ANOVA and hypothesis tests. The results of the analysis, performed on short sentences of a female speaker in four emotional states (joyous, sad, angry, and neutral), show that it is primarily the removable orthodontic appliance that affects the spectrograms of the produced speech.
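
    As a rough illustration of the spectral analysis mentioned above (not the authors' exact pipeline), a mean Welch periodogram of a recorded utterance can be computed with standard tools; the file name and the analysis parameters below are placeholder assumptions.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

# Placeholder file name; any mono PCM recording of an utterance would do.
fs, x = wavfile.read("utterance_joyous.wav")
x = x.astype(np.float64)
x /= np.max(np.abs(x)) + 1e-12          # normalize to avoid scale differences

# Mean Welch periodogram: power spectral density averaged over overlapping segments.
freqs, psd = welch(x, fs=fs, window="hann", nperseg=1024, noverlap=512)

# Spectra from different emotional states (or appliance conditions) could then be
# compared, e.g., with ANOVA over band-wise log-power values.
log_psd = 10.0 * np.log10(psd + 1e-12)
print(freqs[:5], log_psd[:5])
```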

  12. Effect of Acting Experience on Emotion Expression and Recognition in Voice: Non-Actors Provide Better Stimuli than Expected.

    PubMed

    Jürgens, Rebecca; Grass, Annika; Drolet, Matthis; Fischer, Julia

    Both in the performative arts and in emotion research, professional actors are assumed to be capable of delivering emotions comparable to spontaneous emotional expressions. This study examines the effects of acting training on vocal emotion depiction and recognition. We predicted that professional actors express emotions in a more realistic fashion than non-professional actors. However, professional acting training may lead to a particular speech pattern; this might account for vocal expressions by actors that are less comparable to authentic samples than the ones by non-professional actors. We compared 80 emotional speech tokens from radio interviews with 80 re-enactments by professional and inexperienced actors, respectively. We analyzed recognition accuracies for emotion and authenticity ratings and compared the acoustic structure of the speech tokens. Both play-acted conditions yielded similar recognition accuracies and possessed more variable pitch contours than the spontaneous recordings. However, professional actors exhibited signs of different articulation patterns compared to non-trained speakers. Our results indicate that for emotion research, emotional expressions by professional actors are not better suited than those from non-actors.

  13. Not all sounds sound the same: Parkinson's disease affects differently emotion processing in music and in speech prosody.

    PubMed

    Lima, César F; Garrett, Carolina; Castro, São Luís

    2013-01-01

    Does emotion processing in music and speech prosody recruit common neurocognitive mechanisms? To examine this question, we implemented a cross-domain comparative design in Parkinson's disease (PD). Twenty-four patients and 25 controls performed emotion recognition tasks for music and spoken sentences. In music, patients had impaired recognition of happiness and peacefulness, and intact recognition of sadness and fear; this pattern was independent of general cognitive and perceptual abilities. In speech, patients had a small global impairment, which was significantly mediated by executive dysfunction. Hence, PD affected differently musical and prosodic emotions. This dissociation indicates that the mechanisms underlying the two domains are partly independent.

  14. Postcategorical auditory distraction in short-term memory: Insights from increased task load and task type.

    PubMed

    Marsh, John E; Yang, Jingqi; Qualter, Pamela; Richardson, Cassandra; Perham, Nick; Vachon, François; Hughes, Robert W

    2018-06-01

    Task-irrelevant speech impairs short-term serial recall appreciably. On the interference-by-process account, the processing of physical (i.e., precategorical) changes in speech yields order cues that conflict with the serial-ordering process deployed to perform the serial recall task. In this view, the postcategorical properties (e.g., phonology, meaning) of speech play no role. The present study reassessed the implications of recent demonstrations of auditory postcategorical distraction in serial recall that have been taken as support for an alternative, attentional-diversion, account of the irrelevant speech effect. Focusing on the disruptive effect of emotionally valent compared with neutral words on serial recall, we show that the distracter-valence effect is eliminated under conditions (high task-encoding load) thought to shield against attentional diversion, whereas the general effect of speech (neutral words compared with quiet) remains unaffected (Experiment 1). Furthermore, the distracter-valence effect generalizes to a task that does not require the processing of serial order (the missing-item task), whereas the effect of speech per se is attenuated in this task (Experiment 2). We conclude that postcategorical auditory distraction phenomena in serial short-term memory (STM) are incidental: they are observable in such a setting but, unlike the acoustically driven irrelevant speech effect, are not integral to it. As such, the findings support a duplex-mechanism account over a unitary view of auditory distraction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  15. An ERP study of vocal emotion processing in asymmetric Parkinson’s disease

    PubMed Central

    Garrido-Vásquez, Patricia; Pell, Marc D.; Paulmann, Silke; Strecker, Karl; Schwarz, Johannes; Kotz, Sonja A.

    2013-01-01

    Parkinson’s disease (PD) has been related to impaired processing of emotional speech intonation (emotional prosody). One distinctive feature of idiopathic PD is motor symptom asymmetry, with striatal dysfunction being strongest in the hemisphere contralateral to the most affected body side. It is still unclear whether this asymmetry may affect vocal emotion perception. Here, we tested 22 PD patients (10 with predominantly left-sided [LPD] and 12 with predominantly right-sided motor symptoms) and 22 healthy controls in an event-related potential study. Sentences conveying different emotional intonations were presented in lexical and pseudo-speech versions. Task varied between an explicit and an implicit instruction. Of specific interest was emotional salience detection from prosody, reflected in the P200 component. We predicted that patients with predominantly right-striatal dysfunction (LPD) would exhibit P200 alterations. Our results support this assumption. LPD patients showed enhanced P200 amplitudes, and specific deficits were observed for disgust prosody, explicit anger processing and implicit processing of happy prosody. Lexical speech was predominantly affected while the processing of pseudo-speech was largely intact. P200 amplitude in patients correlated significantly with left motor scores and asymmetry indices. The data suggest that emotional salience detection from prosody is affected by asymmetric neuronal degeneration in PD. PMID:22956665

  16. An audiovisual emotion recognition system

    NASA Astrophysics Data System (ADS)

    Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

    2007-12-01

    Human emotions can be expressed through many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information, which plays an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system was developed and is presented in this paper. The system is designed for real-time practice and is supported by several integrated modules. These modules include speech enhancement for eliminating noise, rapid face detection for locating the face in the background image, example-based shape learning for facial feature alignment, and an optical-flow-based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of a classifier. Rough-set-based feature selection is a good method for dimension reduction. Accordingly, 13 of 37 speech features and 10 of 33 facial features are selected to represent emotional information, and 52 audiovisual features are selected owing to the synchronization when speech and video are fused together. The experimental results demonstrate that this system performs well in real-time practice and has a high recognition rate. Our results also suggest that multi-module fused recognition will become the trend in emotion recognition in the future.
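
    A minimal, generic sketch of decision-level fusion in the spirit of the system above, not its actual implementation: two unimodal classifiers produce class-probability vectors that are combined by a weighted average. The toy feature arrays, the fusion weight, and the use of logistic regression are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

EMOTIONS = ["anger", "disgust", "fear", "happy", "sad"]  # labels used in the record above

def fuse_decisions(p_audio: np.ndarray, p_video: np.ndarray, w_audio: float = 0.5) -> np.ndarray:
    """Weighted average of per-class probabilities from two unimodal classifiers."""
    return w_audio * p_audio + (1.0 - w_audio) * p_video

# Toy data standing in for selected speech and facial feature vectors.
rng = np.random.default_rng(0)
X_audio, X_video = rng.normal(size=(200, 13)), rng.normal(size=(200, 10))
y = rng.integers(0, len(EMOTIONS), size=200)

clf_audio = LogisticRegression(max_iter=1000).fit(X_audio, y)
clf_video = LogisticRegression(max_iter=1000).fit(X_video, y)

p_fused = fuse_decisions(clf_audio.predict_proba(X_audio), clf_video.predict_proba(X_video))
predicted = [EMOTIONS[i] for i in p_fused.argmax(axis=1)]
print(predicted[:5])
```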

  17. Linguistic Correlates of Social Anxiety Disorder

    PubMed Central

    Hofmann, Stefan G.; Moore, Philippa M.; Gutner, Cassidy; Weeks, Justin W.

    2012-01-01

    The goal of this study was to examine the linguistic correlates of social anxiety disorder (SAD). Twenty-four individuals with SAD (8 of them with a generalized subtype) and 21 nonanxious controls were asked to give speeches in front of an audience. The transcribed speeches were examined for the frequency of negations, I-statements, we-statements, negative emotion words, and positive emotion words. During their speech, individuals with either SAD subtype used positive emotion words more often than controls. No significant differences were observed in the other linguistic categories. These results are discussed in the context of evolutionary and cognitive perspectives of SAD. PMID:21851248

  18. [The role of sex in voice restoration and emotional functioning after laryngectomy].

    PubMed

    Keszte, J; Wollbrück, D; Meyer, A; Fuchs, M; Meister, E; Pabst, F; Oeken, J; Schock, J; Wulke, C; Singer, S

    2012-04-01

    Data on psychosocial factors in laryngectomized women are rare. All means of alaryngeal voice production sound male due to low fundamental frequency and roughness, which makes post-laryngectomy voice rehabilitation especially challenging for women. The aim of this study was to investigate whether women use alaryngeal speech less often and are therefore more emotionally distressed. In a cross-sectional multi-centred study, 12 female and 138 male laryngectomees were interviewed. To identify risk factors for infrequent use of alaryngeal speech and for reduced emotional functioning, logistic regression was used, and odds ratios were adjusted for age, time since laryngectomy, physical functioning, social activity, and feelings of stigmatization. Esophageal speech was used by 83% of the female and 57% of the male patients, prosthetic speech by 17% of the female and 20% of the male patients, and electrolaryngeal speech by 17% of the female and 29% of the male patients. Laryngectomees were at higher risk of being more emotionally distressed when feeling physically unwell (OR = 2.48; p = 0.02) or when reporting feelings of stigmatization (OR = 3.94; p ≤ 0.00). In addition, women tended to be more socially active than men (83% vs. 54%; p = 0.05). Sex influenced neither the use of alaryngeal speech nor emotional functioning. Since there is evidence for different psychosocial adjustment in laryngectomized men and women, further investigation with larger sample sizes is needed on this issue. © Georg Thieme Verlag KG Stuttgart · New York.

  19. On the recognition of emotional vocal expressions: motivations for a holistic approach.

    PubMed

    Esposito, Anna; Esposito, Antonietta M

    2012-10-01

    Human beings seem to be able to recognize emotions from speech very well and information communication technology aims to implement machines and agents that can do the same. However, to be able to automatically recognize affective states from speech signals, it is necessary to solve two main technological problems. The former concerns the identification of effective and efficient processing algorithms capable of capturing emotional acoustic features from speech sentences. The latter focuses on finding computational models able to classify, with an approximation as good as human listeners, a given set of emotional states. This paper will survey these topics and provide some insights for a holistic approach to the automatic analysis, recognition and synthesis of affective states.

  20. Anxiety and speaking in people who stutter: an investigation using the emotional Stroop task.

    PubMed

    Hennessey, Neville W; Dourado, Esther; Beilby, Janet M

    2014-06-01

    People with anxiety disorders show an attentional bias towards threat or negative emotion words. This exploratory study examined whether people who stutter (PWS), who can be anxious when speaking, show a similar bias and whether reactions to threat words also influence speech motor planning and execution. Comparisons were made between 31 PWS and 31 fluent controls in a modified emotional Stroop task where, depending on a visual cue, participants named the colour of threat and neutral words at either a normal or fast articulation rate. In a manual version of the same task participants pressed the corresponding colour button with either a long or short duration. PWS but not controls were slower to respond to threat words than neutral words; however, this emotionality effect was only evident for verbal responding. Emotionality did not interact with speech rate, but the size of the emotionality effect among PWS did correlate with frequency of stuttering. Results suggest that PWS show an attentional bias to threat words similar to that found in people with anxiety disorders. In addition, this bias appears to be contingent on engaging the speech production system as a response modality. No evidence was found to indicate that emotional reactivity during the Stroop task constrains or destabilises, perhaps via arousal mechanisms, speech motor adjustment or execution for PWS. The reader will be able to: (1) explain the importance of cognitive aspects of anxiety, such as attentional biases, in the possible cause and/or maintenance of anxiety in people who stutter, (2) explain how the emotional Stroop task can be used as a measure of attentional bias to threat information, and (3) evaluate the findings with respect to the relationship between attentional bias to threat information and speech production in people who stutter. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Real-time speech-driven animation of expressive talking faces

    NASA Astrophysics Data System (ADS)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the lower-layer classification at the sub-phonemic level models the relationship between the acoustic features of frames and the audio labels within phonemes. Using certain constraints, the predicted emotion labels of speech are adjusted to obtain facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classification, and the synthesized facial sequences reach a comparatively convincing quality.

  2. Impact of personality on the cerebral processing of emotional prosody.

    PubMed

    Brück, Carolin; Kreifelts, Benjamin; Kaza, Evangelia; Lotze, Martin; Wildgruber, Dirk

    2011-09-01

    While several studies have focused on identifying common brain mechanisms governing the decoding of emotional speech melody, interindividual variations in the cerebral processing of prosodic information have, in comparison, received only little attention to date: although differences in personality among individuals have, for instance, been shown to modulate emotional brain responses, personality influences on the neural basis of prosody decoding have not yet been investigated systematically. Thus, the present study aimed at delineating relationships between interindividual differences in personality and hemodynamic responses evoked by emotional speech melody. To determine personality-dependent modulations of brain reactivity, fMRI activation patterns during the processing of emotional speech cues were acquired from 24 healthy volunteers and subsequently correlated with individual trait measures of extraversion and neuroticism obtained for each participant. Whereas correlation analysis did not indicate any link between brain activation and extraversion, strong positive correlations between measures of neuroticism and hemodynamic responses of the right amygdala, the left postcentral gyrus as well as medial frontal structures including the right anterior cingulate cortex emerged, suggesting that brain mechanisms mediating the decoding of emotional speech melody may vary depending on differences in neuroticism among individuals. Observed trait-specific modulations are discussed in the light of processing biases as well as differences in emotion control or task strategies which may be associated with the personality trait of neuroticism. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Emotional Expression in Husbands and Wives.

    ERIC Educational Resources Information Center

    Notarius, Clifford I.; Johnson, Jennifer S.

    1982-01-01

    Investigated the emotional expression and physiological reactivity of spouses (N=6) as they discussed a salient interpersonal issue. Results indicated that wives' speech was characterized by less neutral and more negative behavior. Wives also reciprocated their husbands' positive and negative speech, while husbands did not reciprocate their wives'…

  4. End-to-End Multimodal Emotion Recognition Using Deep Neural Networks

    NASA Astrophysics Data System (ADS)

    Tzirakis, Panagiotis; Trigeorgis, George; Nicolaou, Mihalis A.; Schuller, Bjorn W.; Zafeiriou, Stefanos

    2017-12-01

    Automatic affect recognition is a challenging task due to the various modalities with which emotions can be expressed. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content of various styles of speaking, robust features need to be extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality we use a deep residual network (ResNet) of 50 layers. In addition to the importance of feature extraction, the machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where, by also taking advantage of the correlations between the streams, we manage to significantly outperform traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
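
    The architecture described above can only be sketched very roughly here; the PyTorch snippet below is not the authors' network but a minimal illustration of feeding frame-wise convolutional speech features into an LSTM, where the layer sizes, the 40-band spectral input, and the two-dimensional (e.g., valence/arousal) output are assumptions.

```python
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    """Toy CNN front-end over per-frame spectral features, followed by an LSTM."""

    def __init__(self, n_mels: int = 40, hidden: int = 128, n_outputs: int = 2):
        super().__init__()
        # 1-D convolutions over time, treating the mel bands as input channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)   # e.g., valence and arousal

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, time)
        feats = self.cnn(mel)                      # (batch, 64, time)
        seq, _ = self.lstm(feats.transpose(1, 2))  # (batch, time, hidden)
        return self.head(seq[:, -1, :])            # prediction from the last time step

if __name__ == "__main__":
    model = SpeechEmotionNet()
    dummy = torch.randn(8, 40, 300)                # batch of 8 dummy spectrogram clips
    print(model(dummy).shape)                      # torch.Size([8, 2])
```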

  5. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    NASA Astrophysics Data System (ADS)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to reduce the large amount of redundant data in high-dimensional speech emotion features, we analyze the extracted speech emotion features in depth and select the more effective ones. First, a given emotion is classified by each feature on its own. Second, the features are ranked by recognition rate in descending order. Then, the optimal threshold on the ranked features is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and Chinese emotional data sets, the experimental results show that the feature selection method outperforms other traditional methods.
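
    A minimal sketch of the ranking idea described above, not the paper's exact procedure: each feature is evaluated on its own with cross-validated classification, the features are sorted by that single-feature recognition rate, and only those above a chosen threshold are kept. The toy data, the SVM classifier, and the threshold value are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def rank_features_by_recognition_rate(X: np.ndarray, y: np.ndarray):
    """Return (feature_index, mean CV accuracy) pairs, best first."""
    scores = []
    for j in range(X.shape[1]):
        acc = cross_val_score(SVC(), X[:, [j]], y, cv=5).mean()
        scores.append((j, acc))
    return sorted(scores, key=lambda t: t[1], reverse=True)

def select_features(ranked, threshold: float = 0.30):
    """Keep features whose single-feature recognition rate reaches the threshold."""
    return [j for j, acc in ranked if acc >= threshold]

# Toy stand-in for a high-dimensional speech emotion feature matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
y = rng.integers(0, 4, size=150)                # four emotion classes

ranked = rank_features_by_recognition_rate(X, y)
print(select_features(ranked, threshold=0.30))
```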

  6. Processing of prosodic changes in natural speech stimuli in school-age children.

    PubMed

    Lindström, R; Lepistö, T; Makkonen, T; Kujala, T

    2012-12-01

    Speech prosody conveys information about important aspects of communication: the meaning of the sentence and the emotional state or intention of the speaker. The present study addressed processing of emotional prosodic changes in natural speech stimuli in school-age children (mean age 10 years) by recording the electroencephalogram, facial electromyography, and behavioral responses. The stimulus was a semantically neutral Finnish word uttered with four different emotional connotations: neutral, commanding, sad, and scornful. In the behavioral sound-discrimination task the reaction times were fastest for the commanding stimulus and longest for the scornful stimulus, and faster for the neutral than for the sad stimulus. EEG and EMG responses were measured during a non-attentive oddball paradigm. Prosodic changes elicited a negative-going, fronto-centrally distributed neural response peaking at about 500 ms from the onset of the stimulus, followed by a fronto-central positive deflection, peaking at about 740 ms. For the commanding stimulus, a rapid negative deflection peaking at about 290 ms from stimulus onset was also elicited. No reliable stimulus-type-specific rapid facial reactions were found. The results show that prosodic changes in natural speech stimuli activate pre-attentive neural change-detection mechanisms in school-age children. However, the results do not support the suggestion of automaticity of emotion-specific facial muscle responses to non-attended emotional speech stimuli in children. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

    PubMed

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.

  8. Hearing Feelings: Affective Categorization of Music and Speech in Alexithymia, an ERP Study

    PubMed Central

    Goerlich, Katharina Sophia; Witteman, Jurriaan; Aleman, André; Martens, Sander

    2011-01-01

    Background Alexithymia, a condition characterized by deficits in interpreting and regulating feelings, is a risk factor for a variety of psychiatric conditions. Little is known about how alexithymia influences the processing of emotions in music and speech. Appreciation of such emotional qualities in auditory material is fundamental to human experience and has profound consequences for functioning in daily life. We investigated the neural signature of such emotional processing in alexithymia by means of event-related potentials. Methodology Affective music and speech prosody were presented as targets following affectively congruent or incongruent visual word primes in two conditions. In two further conditions, affective music and speech prosody served as primes and visually presented words with affective connotations were presented as targets. Thirty-two participants (16 male) judged the affective valence of the targets. We tested the influence of alexithymia on cross-modal affective priming and on N400 amplitudes, indicative of individual sensitivity to an affective mismatch between words, prosody, and music. Our results indicate that the affective priming effect for prosody targets tended to be reduced with increasing scores on alexithymia, while no behavioral differences were observed for music and word targets. At the electrophysiological level, alexithymia was associated with significantly smaller N400 amplitudes in response to affectively incongruent music and speech targets, but not to incongruent word targets. Conclusions Our results suggest a reduced sensitivity for the emotional qualities of speech and music in alexithymia during affective categorization. This deficit becomes evident primarily in situations in which a verbalization of emotional information is required. PMID:21573026

  9. Structure and weights optimisation of a modified Elman network emotion classifier using hybrid computational intelligence algorithms: a comparative study

    NASA Astrophysics Data System (ADS)

    Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood

    2015-10-01

    Artificial neural networks are efficient models in pattern recognition applications, but their performance depends on employing a suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier, based on the gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering features of the speech signal related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on the particle swarm optimisation (PSO) algorithm and its binary version, on PSO and the discrete firefly algorithm, and on a hybrid of error back-propagation and a genetic algorithm used for optimisation. Experimental tests on the Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.
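
    To give a feel for the gravitational search idea referenced above, here is a small, generic continuous GSA sketch that tunes the weights of a toy linear classifier; it is not the paper's hybrid GSA-BGSA method, and the population size, G0, alpha, and problem setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 3-class problem; the linear model predicts argmax(X @ W), with W flattened.
X = rng.normal(size=(120, 8))
y = rng.integers(0, 3, size=120)

def loss(w_flat: np.ndarray) -> float:
    W = w_flat.reshape(8, 3)
    return float(np.mean(np.argmax(X @ W, axis=1) != y))   # error rate to minimize

def gsa(fitness, dim, n_agents=20, iters=100, g0=100.0, alpha=20.0):
    """Minimal continuous gravitational search algorithm (minimization)."""
    pos = rng.uniform(-1.0, 1.0, size=(n_agents, dim))
    vel = np.zeros_like(pos)
    best_pos, best_fit = None, np.inf
    for t in range(iters):
        fit = np.array([fitness(p) for p in pos])
        if fit.min() < best_fit:
            best_fit, best_pos = fit.min(), pos[fit.argmin()].copy()
        # Gravitational masses from normalized fitness (smaller loss -> larger mass).
        m = (fit - fit.max()) / (fit.min() - fit.max() + 1e-12)
        M = m / (m.sum() + 1e-12)
        G = g0 * np.exp(-alpha * t / iters)                 # decaying gravitational constant
        acc = np.zeros_like(pos)
        for i in range(n_agents):
            for j in range(n_agents):
                if i == j:
                    continue
                diff = pos[j] - pos[i]
                dist = np.linalg.norm(diff) + 1e-12
                acc[i] += rng.random() * G * M[j] * diff / dist
        vel = rng.random(size=pos.shape) * vel + acc
        pos = pos + vel
    return best_pos, best_fit

w_best, err = gsa(loss, dim=8 * 3)
print("training error of GSA-tuned weights:", err)
```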

  10. What is the Value of Embedding Artificial Emotional Prosody in Human–Computer Interactions? Implications for Theory and Design in Psychological Science

    PubMed Central

    Mitchell, Rachel L. C.; Xu, Yi

    2015-01-01

    In computerized technology, artificial speech is becoming increasingly important, and is already used in ATMs, online gaming and healthcare contexts. However, today’s artificial speech typically sounds monotonous, a main reason for this being the lack of meaningful prosody. One particularly important function of prosody is to convey different emotions. This is because successful encoding and decoding of emotions is vital for effective social cognition, which is increasingly recognized in human–computer interaction contexts. Current attempts to artificially synthesize emotional prosody are much improved relative to early attempts, but there remains much work to be done due to methodological problems, lack of agreed acoustic correlates, and lack of theoretical grounding. If the addition of synthetic emotional prosody is not of sufficient quality, it may risk alienating users instead of enhancing their experience. So the value of embedding emotion cues in artificial speech may ultimately depend on the quality of the synthetic emotional prosody. However, early evidence on reactions to synthesized non-verbal cues in the facial modality bodes well. Attempts to implement the recognition of emotional prosody into artificial applications and interfaces have perhaps been met with greater success, but the ultimate test of synthetic emotional prosody will be to critically compare how people react to synthetic emotional prosody vs. natural emotional prosody, at the behavioral, socio-cognitive and neural levels. PMID:26617563

  11. Exploring Speech Recognition Technology: Children with Learning and Emotional/Behavioral Disorders.

    ERIC Educational Resources Information Center

    Faris-Cole, Debra; Lewis, Rena

    2001-01-01

    Intermediate grade students with disabilities in written expression and emotional/behavioral disorders were trained to use discrete or continuous speech input devices for written work. The study found extreme variability in the fidelity of the devices, PowerSecretary and Dragon NaturallySpeaking, ranging from 49 percent to 87 percent. Both devices…

  12. Emotional Speech Acts and the Educational Perlocutions of Speech

    ERIC Educational Resources Information Center

    Gasparatou, Renia

    2016-01-01

    Over the past decades, there has been an ongoing debate about whether education should aim at the cultivation of emotional wellbeing of self-esteeming personalities or whether it should prioritise literacy and the cognitive development of students. However, it might be the case that the two are not easily distinguished in educational contexts. In…

  13. Differences in the speech of 10- to 13-year-old boys from divorced and nondivorced families against the background of emotional attachment.

    PubMed

    Böhm, Birgit

    2004-01-01

    In Germany, an increasing number of children live with one parent alone and have to cope with the separation or divorce of their parents. Emotional drawbacks have frequently been hypothesized for these children. Thus, we studied whether such experiences are reflected in speech behavior. Twenty-eight 10- to 13-year-old boys from separated parents (physical separation of the parents was 2 years before the investigation) were compared with 26 boys from parents living together in an interview focusing on attachment-related themes and everyday situations. The interviews were analyzed with regard to coherence of speech, coping with emotional problems, reflectivity, the child's representation of both parents, and verbal and nonverbal expression of feelings. Boys from separated parents had incoherent speech, difficulties in coping with emotional problems, and poorer reflectivity (thinking about their own mental states and those of others); they represented neither parent supportively and did not show their feelings openly. These results can be traced back to an insecure attachment representation of the boys with separated parents. Copyright 2004 S. Karger AG, Basel

  14. Effects of musical expertise on oscillatory brain activity in response to emotional sounds.

    PubMed

    Nolden, Sophie; Rigoulot, Simon; Jolicoeur, Pierre; Armony, Jorge L

    2017-08-01

    Emotions can be conveyed through a variety of channels in the auditory domain, be it via music, non-linguistic vocalizations, or speech prosody. Moreover, recent studies suggest that expertise in one sound category can impact the processing of emotional sounds in other sound categories, as they found that musicians process emotional musical and vocal sounds more efficiently than non-musicians. However, the neural correlates of these modulations, especially their time course, are not very well understood. Consequently, we focused here on how the neural processing of emotional information varies as a function of sound category and expertise of participants. The electroencephalogram (EEG) of 20 non-musicians and 17 musicians was recorded while they listened to vocal (speech and vocalizations) and musical sounds. The amplitude of EEG-oscillatory activity in the theta, alpha, beta, and gamma bands was quantified, and Independent Component Analysis (ICA) was used to identify underlying components of brain activity in each band. Category differences were found in the theta and alpha bands, due to larger responses to music and speech than to vocalizations, and in posterior beta, mainly due to differential processing of speech. In addition, we observed greater activation in frontal theta and alpha for musicians than for non-musicians, as well as an interaction between expertise and the emotional content of sounds in frontal alpha. The results reflect musicians' expertise in the recognition of emotion-conveying music, which seems to generalize to emotional expressions conveyed by the human voice, in line with previous accounts of effects of expertise on the processing of musical and vocal sounds. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    PubMed Central

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound “happier” than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of “balanced” major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach’s Well-Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (“A,” “B,” “C,” etc.). Consistent with predictions derived from speech, we found major-key (nominally “happy”) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally “sad”) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music. PMID:26578990

  16. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. The results achieved show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all the different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on the achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
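
    As a rough illustration of the greedy forward search (FSS-Forward) mentioned above, and not the RekEmozio pipeline itself, the sketch below wraps a k-nearest-neighbour classifier (a simple instance-based learner) in scikit-learn's forward sequential feature selector; the toy data and the number of retained features are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for acoustic feature vectors labelled with emotion categories.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 7, size=200)            # e.g., seven emotion categories

knn = KNeighborsClassifier(n_neighbors=5)   # instance-based learner
selector = SequentialFeatureSelector(
    knn, n_features_to_select=10, direction="forward", cv=5
)
selector.fit(X, y)

selected = np.flatnonzero(selector.get_support())
print("selected feature indices:", selected)
```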

  17. ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies.

    PubMed

    Agrawal, Deepashri; Timm, Lydia; Viola, Filipa Campos; Debener, Stefan; Büchner, Andreas; Dengler, Reinhard; Wittfoth, Matthias

    2012-09-20

    Emotionally salient information in spoken language can be provided by variations in speech melody (prosody) or by emotional semantics. Emotional prosody is essential to convey feelings through speech. In sensori-neural hearing loss, impaired speech perception can be improved by cochlear implants (CIs). The aim of this study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with vocoded stimuli. Semantically neutral sentences with emotional (happy, angry and neutral) prosody were used. Sentences were manipulated to simulate two CI speech-coding strategies: the Advance Combination Encoder (ACE) and the newly developed Psychoacoustic Advanced Combination Encoder (PACE). Twenty NH adults were asked to recognize emotional prosody from ACE and PACE simulations. Performance was assessed using behavioral tests and event-related potentials (ERPs). Behavioral data revealed superior performance with original stimuli compared to the simulations. For the simulations, better recognition was observed for happy and angry prosody than for neutral. Irrespective of simulated or unsimulated stimulus type, a significantly larger P200 event-related potential was observed after sentence onset for happy prosody than for the other two emotions. Further, the amplitude of the P200 was significantly more positive for the PACE strategy than for the ACE strategy. The results suggest the P200 peak as an indicator of active differentiation and recognition of emotional prosody. The larger P200 peak amplitude for happy prosody indicated the importance of fundamental frequency (F0) cues in prosody processing. The advantage of PACE over ACE highlighted a privileged role of the psychoacoustic masking model in improving prosody perception. Taken together, the study emphasizes the importance of vocoded simulation for better understanding the prosodic cues which CI users may be utilizing.
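
    The vocoded (CI-simulation) stimuli referred to above can be approximated with a generic noise vocoder; the sketch below is not ACE or PACE, just a standard channel-vocoder illustration in which the band edges, channel count, envelope cutoff, and file names are assumed values.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cutoff=50.0):
    """Generic noise vocoder: band-split, extract envelopes, modulate band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)        # log-spaced band edges
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x), dtype=np.float64)
    for lo, hi in zip(edges[:-1], edges[1:]):               # fs must exceed 2 * f_hi
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)                     # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))            # smoothed envelope
        carrier = sosfiltfilt(band_sos, noise)              # band-limited noise carrier
        out += np.clip(env, 0.0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

if __name__ == "__main__":
    fs, x = wavfile.read("neutral_sentence.wav")            # placeholder mono file
    vocoded = noise_vocode(x.astype(np.float64), fs)
    wavfile.write("neutral_sentence_vocoded.wav", fs, (vocoded * 32767).astype(np.int16))
```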

  18. Impact of human emotions on physiological characteristics

    NASA Astrophysics Data System (ADS)

    Partila, P.; Voznak, M.; Peterek, T.; Penhaker, M.; Novak, V.; Tovarek, J.; Mehic, Miralem; Vojtech, L.

    2014-05-01

    Emotional states of humans and their impact on physiological and neurological characteristics are discussed in this paper. This problem has been addressed by many research teams. Nowadays, it is necessary to increase the accuracy of methods for obtaining information about correlations between emotional state and physiological changes. To record these changes, we focused on two principal emotional states: the studied subjects were psychologically stimulated first to a neutral, calm state and then to a stress state. Electrocardiography, electroencephalography, and blood pressure provided the neurological and physiological signals collected during the subjects' stimulated conditions. Speech activity was recorded while the subject read a selected text. Features were extracted through speech processing operations, and a classifier based on a Gaussian Mixture Model was trained and tested using Mel-Frequency Cepstral Coefficients extracted from the subject's speech. All measurements were performed in an electromagnetic compatibility (EMC) chamber. The article discusses a method for determining the influence of the stress state on human physiological and neurological changes.
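
    A minimal sketch of the speech branch of such a pipeline, assuming librosa and scikit-learn are available: MFCC frames are pooled per class, and two Gaussian Mixture Models (neutral vs. stress) are compared by average log-likelihood. The file lists neutral_wavs and stress_wavs are hypothetical placeholders, not the study's data.

    ```python
    # Illustrative sketch (not the authors' code): a GMM-based neutral-vs-stress
    # classifier trained on MFCC frames, mirroring the pipeline described above.
    import librosa
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mfcc_frames(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, coeffs)

    def train_gmm(wav_paths, n_components=16):
        feats = np.vstack([mfcc_frames(p) for p in wav_paths])
        return GaussianMixture(n_components=n_components,
                               covariance_type="diag").fit(feats)

    # gmm_neutral = train_gmm(neutral_wavs)
    # gmm_stress = train_gmm(stress_wavs)

    def classify(path, gmm_neutral, gmm_stress):
        frames = mfcc_frames(path)
        # Compare average per-frame log-likelihood under each class model.
        return "stress" if gmm_stress.score(frames) > gmm_neutral.score(frames) else "neutral"
    ```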

  19. Attitudes toward Speech Disorders: Sampling the Views of Cantonese-Speaking Americans.

    ERIC Educational Resources Information Center

    Bebout, Linda; Arthur, Bradford

    1997-01-01

    A study of 60 Chinese Americans and 46 controls found the Chinese Americans were more likely to believe persons with speech disorders could improve speech by "trying hard," to view people using deaf speech and people with cleft palates as perhaps being emotionally disturbed, and to regard deaf speech as a limitation. (Author/CR)

  20. Children with bilateral cochlear implants identify emotion in speech and music.

    PubMed

    Volkova, Anna; Trehub, Sandra E; Schellenberg, E Glenn; Papsin, Blake C; Gordon, Karen A

    2013-03-01

    This study examined the ability of prelingually deaf children with bilateral implants to identify emotion (i.e. happiness or sadness) in speech and music. Participants in Experiment 1 were 14 prelingually deaf children from 5-7 years of age who had bilateral implants and 18 normally hearing children from 4-6 years of age. They judged whether linguistically neutral utterances produced by a man and woman sounded happy or sad. Participants in Experiment 2 were 14 bilateral implant users from 4-6 years of age and the same normally hearing children as in Experiment 1. They judged whether synthesized piano excerpts sounded happy or sad. Child implant users' accuracy of identifying happiness and sadness in speech was well above chance levels but significantly below the accuracy achieved by children with normal hearing. Similarly, their accuracy of identifying happiness and sadness in music was well above chance levels but significantly below that of children with normal hearing, who performed at ceiling. For the 12 implant users who participated in both experiments, performance on the speech task correlated significantly with performance on the music task and implant experience was correlated with performance on both tasks. Child implant users' accurate identification of emotion in speech exceeded performance in previous studies, which may be attributable to fewer response alternatives and the use of child-directed speech. Moreover, child implant users' successful identification of emotion in music indicates that the relevant cues are accessible at a relatively young age.

  1. The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.

    PubMed

    Shigeno, Sumi

    2018-02-01

    This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions in both the listeners' native language (Japanese), and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggested that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.

  2. Inhibitory Control as a Moderator of Threat-related Interference Biases in Social Anxiety

    PubMed Central

    Gorlin, Eugenia I.; Teachman, Bethany A.

    2014-01-01

    Prior findings are mixed regarding the presence and direction of threat-related interference biases in social anxiety. The current study examined general inhibitory control (IC), measured by the classic color-word Stroop, as a moderator of the relationship between threat interference biases (indexed by the emotional Stroop) and several social anxiety indicators. Highly socially anxious undergraduate students (N=159) completed the emotional and color-word Stroop tasks, followed by an anxiety-inducing speech task. Participants completed measures of trait social anxiety, state anxiety before and during the speech, negative task-interfering cognitions during the speech, and overall self-evaluation of speech performance. Speech duration was used to measure behavioral avoidance. In line with hypotheses, IC moderated the relationship between emotional Stroop bias and every anxiety indicator (with the exception of behavioral avoidance), such that greater social-threat interference was associated with higher anxiety among those with weak IC, whereas lesser social-threat interference was associated with higher anxiety among those with strong IC. Implications for the theory and treatment of threat interference biases in socially anxious individuals are discussed. PMID:24967719

  3. The Relationships between Processing Facial Identity, Emotional Expression, Facial Speech, and Gaze Direction during Development

    ERIC Educational Resources Information Center

    Spangler, Sibylle M.; Schwarzer, Gudrun; Korell, Monika; Maier-Karius, Johanna

    2010-01-01

    Four experiments were conducted with 5- to 11-year-olds and adults to investigate whether facial identity, facial speech, emotional expression, and gaze direction are processed independently of or in interaction with one another. In a computer-based, speeded sorting task, participants sorted faces according to facial identity while disregarding…

  4. IQ, Non-Cognitive and Social-Emotional Parameters Influencing Education in Speech- and Language-Impaired Children

    ERIC Educational Resources Information Center

    Ullrich, Dieter; Ullrich, Katja; Marten, Magret

    2017-01-01

    Speech-/language-impaired (SL)-children face problems in school and later life. The significance of "non-cognitive, social-emotional skills" (NCSES) in these children is often underestimated. Aim: Present study of affected SL-children was assessed to analyse the influence of NCSES for long-term school education. Methods: Nineteen…

  5. Speech-rhythm characteristics of client-centered, Gestalt, and rational-emotive therapy interviews.

    PubMed

    Chen, C L

    1981-07-01

    The aim of this study was to discover whether client-centered, Gestalt, and rational-emotive psychotherapy interviews could be described and differentiated on the basis of quantitative measurement of their speech rhythms. These measures were taken from the sound portion of a film showing interviews by Carl Rogers, Frederick Perls, and Albert Ellis. The variables used were total session and percentage of speaking times, speaking turns, vocalizations, interruptions, inside and switching pauses, and speaking rates. The three types of interview had very distinctive patterns of speech-rhythm variables. These patterns suggested that Rogers's Client-centered therapy interview was patient dominated, that Ellis's rational-emotive therapy interview was therapist dominated, and that Perls's Gestalt therapy interview was neither therapist nor patient dominated.
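
    For illustration only, the sketch below derives a few of the speech-rhythm variables named above (percentage of speaking time, speaking turns, speaking rate, inside and switching pauses) from a time-aligned transcript. The segment format (speaker, start, end, word count) is an assumption for the sketch, not the measurement procedure used in the original study.

    ```python
    # Hedged sketch: simple speech-rhythm measures from a time-ordered list of
    # segments of the form (speaker, start_sec, end_sec, n_words).
    def rhythm_measures(segments, pause_threshold=0.3):
        stats = {}
        for spk, start, end, n_words in segments:
            s = stats.setdefault(spk, {"speaking_time": 0.0, "words": 0, "turns": 0})
            s["speaking_time"] += end - start
            s["words"] += n_words
            s["turns"] += 1
        switching_pauses, inside_pauses = [], []
        for (spk_a, _, end_a, _), (spk_b, start_b, _, _) in zip(segments, segments[1:]):
            gap = start_b - end_a
            if gap >= pause_threshold:
                # A pause between different speakers counts as a switching pause,
                # one within the same speaker's speech as an inside pause.
                (switching_pauses if spk_a != spk_b else inside_pauses).append(gap)
        total = segments[-1][2] - segments[0][1]
        for spk, s in stats.items():
            s["pct_speaking_time"] = 100 * s["speaking_time"] / total
            s["speaking_rate_wpm"] = 60 * s["words"] / s["speaking_time"]
        return stats, switching_pauses, inside_pauses
    ```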

  6. Emotional speech comprehension in children and adolescents with autism spectrum disorders.

    PubMed

    Le Sourn-Bissaoui, Sandrine; Aguert, Marc; Girard, Pauline; Chevreuil, Claire; Laval, Virginie

    2013-01-01

    We examined the understanding of emotional speech by children and adolescents with autism spectrum disorders (ASD). We predicted that they would have difficulty understanding emotional speech, not because of an emotional prosody processing impairment but because of problems drawing appropriate inferences, especially in multiple-cue environments. Twenty-six children and adolescents with ASD and 26 typically developing (TD) controls performed a computerized task featuring emotional prosody, either embedded in a discrepant context or without any context at all, and had to identify the speaker's feeling. When the prosody was the sole cue, participants with ASD performed just as well as controls, relying on this cue to infer the speaker's intention. When the prosody was embedded in a discrepant context, both ASD and TD participants exhibited a contextual bias and a negativity bias. However, ASD participants relied less on the emotional prosody than the controls when it was positive. We discuss these findings with respect to executive function and intermodal processing. After reading this article, the reader should be able to (1) describe the ASD participants' pragmatic impairments, (2) explain why ASD participants did not have an emotional prosody processing impairment, and (3) explain why ASD participants had difficulty inferring the speaker's intention from emotional prosody in a discrepant situation. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Elements of a Plan-Based Theory of Speech Acts. Technical Report No. 141.

    ERIC Educational Resources Information Center

    Cohen, Philip R.; Perrault, C. Raymond

    This report proposes that people often plan their speech acts to affect their listeners' beliefs, goals, and emotional states and that such language use can be modeled by viewing speech acts as operators in a planning system, allowing both physical and speech acts to be integrated into plans. Methodological issues of how speech acts should be…

  8. Frontal Brain Electrical Activity (EEG) and Heart Rate in Response to Affective Infant-Directed (ID) Speech in 9-Month-Old Infants

    ERIC Educational Resources Information Center

    Santesso, Diane L.; Schmidt, Louis A.; Trainor, Laurel J.

    2007-01-01

    Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective…

  9. Effects of social cognitive impairment on speech disorder in schizophrenia.

    PubMed

    Docherty, Nancy M; McCleery, Amanda; Divilbiss, Marielle; Schumann, Emily B; Moe, Aubrey; Shakeel, Mohammed K

    2013-05-01

    Disordered speech in schizophrenia impairs social functioning because it impedes communication with others. Treatment approaches targeting this symptom have been limited by an incomplete understanding of its causes. This study examined the process underpinnings of speech disorder, assessed in terms of communication failure. Contributions of impairments in 2 social cognitive abilities, emotion perception and theory of mind (ToM), to speech disorder were assessed in 63 patients with schizophrenia or schizoaffective disorder and 21 nonpsychiatric participants, after controlling for the effects of verbal intelligence and impairments in basic language-related neurocognitive abilities. After removal of the effects of the neurocognitive variables, impairments in emotion perception and ToM each explained additional variance in speech disorder in the patients but not the controls. The neurocognitive and social cognitive variables, taken together, explained 51% of the variance in speech disorder in the patients. Schizophrenic disordered speech may be less a concomitant of "positive" psychotic process than of illness-related limitations in neurocognitive and social cognitive functioning.

  10. Speech comprehension and emotional/behavioral problems in children with specific language impairment (SLI).

    PubMed

    Gregl, Ana; Kirigin, Marin; Bilać, Snjeiana; Sućeska Ligutić, Radojka; Jaksić, Nenad; Jakovljević, Miro

    2014-09-01

    This research aims to investigate differences in speech comprehension between children with specific language impairment (SLI) and their developmentally normal peers, and the relationship between speech comprehension and emotional/behavioral problems on Achenbach's Child Behavior Checklist (CBCL) and Caregiver-Teacher Report Form (C-TRF) according to the DSM-IV. The clinical sample comprised 97 preschool children with SLI, while the peer sample comprised 60 developmentally normal preschool children. Children with SLI had significant delays in speech comprehension and more emotional/behavioral problems than peers. In children with SLI, speech comprehension significantly correlated with scores on the Attention Deficit/Hyperactivity Problems (CBCL and C-TRF) and Pervasive Developmental Problems (CBCL) scales (p < 0.05). In the peer sample, speech comprehension significantly correlated with scores on the Affective Problems and Attention Deficit/Hyperactivity Problems (C-TRF) scales. Regression analysis showed that 12.8% of the variance in speech comprehension is explained by 5 CBCL variables, of which Attention Deficit/Hyperactivity (beta = -0.281) and Pervasive Developmental Problems (beta = -0.280) are statistically significant (p < 0.05). In the reduced regression model, Attention Deficit/Hyperactivity explains 7.3% of the variance in speech comprehension (beta = -0.270, p < 0.01). It is possible that, to a certain degree, the same neurodevelopmental process lies in the background of problems with speech comprehension, problems with attention and hyperactivity, and pervasive developmental problems. This study confirms the importance of triage for behavioral problems and attention training in the rehabilitation of children with SLI and children with normal language development who exhibit ADHD symptoms.

  11. Expression of Emotion in Eastern and Western Music Mirrors Vocalization

    PubMed Central

    Bowling, Daniel Liu; Sundararajan, Janani; Han, Shui'er; Purves, Dale

    2012-01-01

    In Western music, the major mode is typically used to convey excited, happy, bright or martial emotions, whereas the minor mode typically conveys subdued, sad or dark emotions. Recent studies indicate that the differences between these modes parallel differences between the prosodic and spectral characteristics of voiced speech sounds uttered in corresponding emotional states. Here we ask whether tonality and emotion are similarly linked in an Eastern musical tradition. The results show that the tonal relationships used to express positive/excited and negative/subdued emotions in classical South Indian music are much the same as those used in Western music. Moreover, tonal variations in the prosody of English and Tamil speech uttered in different emotional states are parallel to the tonal trends in music. These results are consistent with the hypothesis that the association between musical tonality and emotion is based on universal vocal characteristics of different affective states. PMID:22431970

  12. Expression of emotion in Eastern and Western music mirrors vocalization.

    PubMed

    Bowling, Daniel Liu; Sundararajan, Janani; Han, Shui'er; Purves, Dale

    2012-01-01

    In Western music, the major mode is typically used to convey excited, happy, bright or martial emotions, whereas the minor mode typically conveys subdued, sad or dark emotions. Recent studies indicate that the differences between these modes parallel differences between the prosodic and spectral characteristics of voiced speech sounds uttered in corresponding emotional states. Here we ask whether tonality and emotion are similarly linked in an Eastern musical tradition. The results show that the tonal relationships used to express positive/excited and negative/subdued emotions in classical South Indian music are much the same as those used in Western music. Moreover, tonal variations in the prosody of English and Tamil speech uttered in different emotional states are parallel to the tonal trends in music. These results are consistent with the hypothesis that the association between musical tonality and emotion is based on universal vocal characteristics of different affective states.

  13. Judgments of Emotion in Clear and Conversational Speech by Young Adults with Normal Hearing and Older Adults with Hearing Impairment

    ERIC Educational Resources Information Center

    Morgan, Shae D.; Ferguson, Sarah Hargus

    2017-01-01

    Purpose: In this study, we investigated the emotion perceived by young listeners with normal hearing (YNH listeners) and older adults with hearing impairment (OHI listeners) when listening to speech produced conversationally or in a clear speaking style. Method: The first experiment included 18 YNH listeners, and the second included 10 additional…

  14. The Role of Visual Image and Perception in Speech Development of Children with Speech Pathology

    ERIC Educational Resources Information Center

    Tsvetkova, L. S.; Kuznetsova, T. M.

    1977-01-01

    Investigated with 125 children (4-14 years old) with speech, language, or emotional disorders was the assumption that the naming function can be underdeveloped because of defects in the word's gnostic base. (Author/DB)

  15. Speaking under pressure: low linguistic complexity is linked to high physiological and emotional stress reactivity.

    PubMed

    Saslow, Laura R; McCoy, Shannon; van der Löwe, Ilmo; Cosley, Brandon; Vartan, Arbi; Oveis, Christopher; Keltner, Dacher; Moskowitz, Judith T; Epel, Elissa S

    2014-03-01

    What can a speech reveal about someone's state? We tested the idea that greater stress reactivity would relate to lower linguistic cognitive complexity while speaking. In Study 1, we tested whether heart rate and emotional stress reactivity to a stressful discussion would relate to lower linguistic complexity. In Studies 2 and 3, we tested whether a greater cortisol response to a standardized stressful task including a speech (Trier Social Stress Test) would be linked to speaking with less linguistic complexity during the task. We found evidence that measures of stress responsivity (emotional and physiological) and chronic stress are tied to variability in the cognitive complexity of speech. Taken together, these results provide evidence that our individual experiences of stress or "stress signatures"-how our body and mind react to stress both in the moment and over the longer term-are linked to how complexly we speak under stress. Copyright © 2013 Society for Psychophysiological Research.

  16. Emotions in freely varying and mono-pitched vowels, acoustic and EGG analyses.

    PubMed

    Waaramaa, Teija; Palo, Pertti; Kankare, Elina

    2015-12-01

    Vocal emotions are expressed either by speech or singing. The difference is that in singing the pitch is predetermined while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and singing. Electroglottogram and acoustic analyses of emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples) were studied. Contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels. Three different threshold levels were used in order to evaluate their effects on emotions. Genders were studied separately. The results suggested significant gender differences for CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree, when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
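
    A hedged sketch of how a contact quotient at a given threshold level can be estimated from an electroglottographic signal: within each glottal cycle, CQ is taken as the fraction of samples above a threshold set at a percentage of the cycle's peak-to-peak amplitude. The signal array and precomputed cycle boundaries are assumed inputs; this is not the analysis software used in the study.

    ```python
    # Hypothetical contact-quotient estimate at a chosen threshold level.
    # `egg` is a 1-D numpy array; `cycle_bounds` holds sample indices that
    # delimit consecutive glottal cycles (assumed to be computed elsewhere).
    import numpy as np

    def contact_quotient(egg, cycle_bounds, threshold=0.35):
        cqs = []
        for start, end in zip(cycle_bounds[:-1], cycle_bounds[1:]):
            cycle = egg[start:end]
            if len(cycle) < 2:
                continue
            lo, hi = cycle.min(), cycle.max()
            if hi <= lo:
                continue
            level = lo + threshold * (hi - lo)       # e.g. 35% of peak-to-peak
            cqs.append(np.mean(cycle > level))       # fraction of samples "in contact"
        return float(np.mean(cqs)) if cqs else float("nan")

    # cq_35 = contact_quotient(egg, cycle_bounds, threshold=0.35)
    # cq_55 = contact_quotient(egg, cycle_bounds, threshold=0.55)
    # cq_80 = contact_quotient(egg, cycle_bounds, threshold=0.80)
    ```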

  17. "Do We Make Ourselves Clear?" Developing a Social, Emotional and Behavioural Difficulties (SEBD) Support Service's Effectiveness in Detecting and Supporting Children Experiencing Speech, Language and Communication Difficulties (SLCD)

    ERIC Educational Resources Information Center

    Stiles, Matthew

    2013-01-01

    Research has identified a significant relationship between social, emotional and behavioural difficulties (SEBD) and speech, language and communication difficulties (SLCD). However, little has been published regarding the levels of knowledge and skill that practitioners working with pupils experiencing SEBD have in this important area, nor how…

  18. From Speech to Emotional Interaction: EmotiRob Project

    NASA Astrophysics Data System (ADS)

    Le Tallec, Marc; Saint-Aimé, Sébastien; Jost, Céline; Villaneau, Jeanne; Antoine, Jean-Yves; Letellier-Zarshenas, Sabine; Le-Pévédic, Brigitte; Duhaut, Dominique

    This article presents research work done in the domain of nonverbal emotional interaction for the EmotiRob project. It is a component of the MAPH project, the objective of which is to give comfort to vulnerable children and/or those undergoing long-term hospitalisation through the help of an emotional robot companion. It is important to note that we are not trying to reproduce human emotion and behavior, but trying to make a robot emotionally expressive. This paper will present the different hypotheses we have used from understanding to emotional reaction. We begin the article with a presentation of the MAPH and EmotiRob projects. Then, we briefly describe the speech understanding system, the iGrace computational model of emotions, and the integration of dynamic behavior. We conclude with a description of the architecture of Emi, as well as improvements to be made to its next generation.

  19. The Reliability of Methodological Ratings for speechBITE Using the PEDro-P Scale

    ERIC Educational Resources Information Center

    Murray, Elizabeth; Power, Emma; Togher, Leanne; McCabe, Patricia; Munro, Natalie; Smith, Katherine

    2013-01-01

    Background: speechBITE (http://www.speechbite.com) is an online database established in order to help speech and language therapists gain faster access to relevant research that can be used in clinical decision-making. In addition to containing more than 3000 journal references, the database also provides methodological ratings on the PEDro-P (an…

  20. Methods for eliciting, annotating, and analyzing databases for child speech development.

    PubMed

    Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F

    2017-09-01

    Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.

  1. Contralateral Bimodal Stimulation: A Way to Enhance Speech Performance in Arabic-Speaking Cochlear Implant Patients.

    PubMed

    Abdeltawwab, Mohamed M; Khater, Ahmed; El-Anwar, Mohammad W

    2016-01-01

    The combination of acoustic and electric stimulation as a way to enhance speech recognition performance in cochlear implant (CI) users has generated considerable interest in recent years. The purpose of this study was to evaluate the bimodal advantage of the FS4 speech processing strategy in combination with hearing aids (HA) as a means to improve low-frequency resolution in CI patients. Nineteen postlingual CI adults were selected to participate in this study. All patients wore implants on one side and HA on the contralateral side with residual hearing. Monosyllabic word recognition, speech in noise, and emotion and talker identification were assessed using CI with fine structure processing/FS4 and high-definition continuous interleaved sampling strategies, HA alone, and a combination of CI and HA. Bimodal stimulation showed improvement in speech performance and emotion identification for the question/statement/order tasks, which was statistically significant compared to CI alone, but there were no statistically significant differences in intragender talker discrimination and emotion identification for the happy/angry/neutral tasks. The poorest performance was obtained with HA only, and the difference was statistically significant compared to the other modalities. Bimodal stimulation showed enhanced speech performance in CI patients, and it mitigates the limitations of electric or acoustic stimulation alone. © 2016 S. Karger AG, Basel.

  2. A study of speech emotion recognition based on hybrid algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei

    2011-10-01

    To effectively improve the accuracy of speech emotion recognition systems, a hybrid algorithm combining a Continuous Hidden Markov Model (CHMM), an All-Class-in-One Neural Network (ACON), and a Support Vector Machine (SVM) is proposed. The SVM and ACON methods use global statistics as emotional features, while the CHMM method employs instantaneous features. The recognition rate of the proposed method is 92.25%, with a rejection rate of 0.78%. Furthermore, it achieves relative improvements of 8.53%, 4.69%, and 0.78% over the ACON, CHMM, and SVM methods, respectively. The experimental results confirm the method's effectiveness in distinguishing the anger, happiness, neutral, and sadness emotional states.
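
    The SVM branch of such a hybrid system typically operates on utterance-level global statistics. The following sketch is an assumption-laden illustration, not the paper's code: it computes a handful of pitch and energy statistics with librosa and feeds them to an RBF-kernel SVM; wav_paths and labels are hypothetical.

    ```python
    # Illustrative global-statistics + SVM emotion classifier.
    import librosa
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def global_stats(path):
        y, sr = librosa.load(path, sr=16000)
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # frame-level pitch track
        rms = librosa.feature.rms(y=y)[0]                    # frame-level energy
        return np.array([f0.mean(), f0.std(), f0.max() - f0.min(),
                         rms.mean(), rms.std()])

    # X = np.vstack([global_stats(p) for p in wav_paths])
    # clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
    # clf.fit(X, labels)
    ```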

  3. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC)

    PubMed Central

    Narayanan, Shrikanth; Toutios, Asterios; Ramanarayanan, Vikram; Lammert, Adam; Kim, Jangwon; Lee, Sungbok; Nayak, Krishna; Kim, Yoon-Chul; Zhu, Yinghua; Goldstein, Louis; Byrd, Dani; Bresch, Erik; Ghosh, Prasanta; Katsamanis, Athanasios; Proctor, Michael

    2014-01-01

    USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been presently collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. PMID:25190403

  4. Authentic and Play-Acted Vocal Emotion Expressions Reveal Acoustic Differences

    PubMed Central

    Jürgens, Rebecca; Hammerschmidt, Kurt; Fischer, Julia

    2011-01-01

    Play-acted emotional expressions are a frequent aspect of our lives, ranging from deception to theater, film, and radio drama, to emotion research. To date, however, it has remained unclear whether play-acted emotions correspond to spontaneous emotion expressions. To test whether acting influences the vocal expression of emotion, we compared radio sequences of naturally occurring emotions to actors’ portrayals. It was hypothesized that play-acted expressions were performed in a more stereotyped and aroused fashion. Our results demonstrate that speech segments extracted from play-acted and authentic expressions differ in their voice quality. Additionally, the play-acted speech tokens revealed a more variable F0-contour. Despite these differences, the results did not support the hypothesis that the variation was due to changes in arousal. This analysis revealed that differences in perception of play-acted and authentic emotional stimuli reported previously cannot simply be attributed to differences in arousal, but rather to slight and implicitly perceptible differences in encoding. PMID:21847385

  5. Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.

    PubMed

    Patel, Aniruddh D

    2011-01-01

    Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

  6. Job Stress of School-Based Speech-Language Pathologists

    ERIC Educational Resources Information Center

    Harris, Stephanie Ferney; Prater, Mary Anne; Dyches, Tina Taylor; Heath, Melissa Allen

    2009-01-01

    Stress and burnout contribute significantly to the shortages of school-based speech-language pathologists (SLPs). At the request of the Utah State Office of Education, the researchers measured the stress levels of 97 school-based SLPs using the "Speech-Language Pathologist Stress Inventory." Results indicated that participants' emotional-fatigue…

  7. Neurogenic Communication Disorders and Paralleling Agraphic Disturbances: Implications for Concerns in Basic Writing.

    ERIC Educational Resources Information Center

    De Jarnette, Glenda

    Vertical and lateral integration are two important nervous system integrations that affect the development of oral behaviors. There are three progressions in the vertical integration process for speech nervous system development: R-complex speech (ritualistic, memorized expressions), limbic speech (emotional expressions), and cortical speech…

  8. Quadcopter Control Using Speech Recognition

    NASA Astrophysics Data System (ADS)

    Malik, H.; Darma, S.; Soekirno, S.

    2018-04-01

    This research reports a comparison of the success rates of speech recognition systems using two types of databases, an existing database and a newly created one, implemented as motion control for a quadcopter. The speech recognition system used the Mel-frequency cepstral coefficient (MFCC) method for feature extraction and was trained with a recursive neural network (RNN). MFCC is one of the feature extraction methods most commonly used for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method; the new database was created in the Indonesian language, and its success rate was compared with the results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain characteristic values, which were then classified by the trained RNN to produce a command. The command served as a control input to the single-board computer (SBC), which drove the movement of the quadcopter. On the SBC, the Robot Operating System (ROS) was used as the operating system.
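
    As a hedged sketch of this kind of pipeline, the code below pads MFCC sequences to a fixed length and classifies them with a small recurrent (GRU) network standing in for the RNN mentioned above. The command set, array shapes, and training data are illustrative assumptions; the predicted command would then be forwarded to the flight controller, e.g. as a ROS message on the SBC.

    ```python
    # Hypothetical MFCC-sequence command classifier (not the paper's model).
    import numpy as np
    import librosa
    import tensorflow as tf

    COMMANDS = ["up", "down", "left", "right", "forward", "back"]   # assumed set

    def mfcc_sequence(path, n_mfcc=13, max_frames=100):
        y, sr = librosa.load(path, sr=16000)
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T      # (frames, coeffs)
        m = m[:max_frames]
        return np.pad(m, ((0, max_frames - len(m)), (0, 0)))        # pad to fixed length

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(100, 13)),
        tf.keras.layers.GRU(64),
        tf.keras.layers.Dense(len(COMMANDS), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
    ```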

  9. Disentangling the brain networks supporting affective speech comprehension.

    PubMed

    Hervé, Pierre-Yves; Razafimandimby, Annick; Vigneau, Mathieu; Mazoyer, Bernard; Tzourio-Mazoyer, Nathalie

    2012-07-16

    Areas involved in social cognition, such as the medial prefrontal cortex (mPFC) and the left temporo-parietal junction (TPJ) appear to be active during the classification of sentences according to emotional criteria (happy, angry or sad, [Beaucousin et al., 2007]). These two regions are frequently co-activated in studies about theory of mind (ToM). To confirm that these regions constitute a coherent network during affective speech comprehension, new event-related functional magnetic resonance imaging data were acquired, using the emotional and grammatical-person sentence classification tasks on a larger sample of 51 participants. The comparison of the emotional and grammatical tasks confirmed the previous findings. Functional connectivity analyses established a clear demarcation between a "Medial" network, including the mPFC and TPJ regions, and a bilateral "Language" network, which gathered inferior frontal and temporal areas. These findings suggest that emotional speech comprehension results from interactions between language, ToM and emotion processing networks. The language network, active during both tasks, would be involved in the extraction of lexical and prosodic emotional cues, while the medial network, active only during the emotional task, would drive the making of inferences about the sentences' emotional content, based on their meanings. The left and right amygdalae displayed a stronger response during the emotional condition, but were seldom correlated with the other regions, and thus formed a third entity. Finally, distinct regions belonging to the Language and Medial networks were found in the left angular gyrus, where these two systems could interface. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Speaking under pressure: Low linguistic complexity is linked to high physiological and emotional stress reactivity

    PubMed Central

    Saslow, Laura R.; McCoy, Shannon; van der Löwe, Ilmo; Cosley, Brandon; Vartan, Arbi; Oveis, Christopher; Keltner, Dacher; Moskowitz, Judith T.; Epel, Elissa S.

    2014-01-01

    What can a speech reveal about someone's state? We tested the idea that greater stress reactivity would relate to lower linguistic cognitive complexity while speaking. In Study 1, we tested whether heart rate and emotional stress reactivity to a stressful discussion would relate to lower linguistic complexity. In Studies 2 and 3 we tested whether a greater cortisol response to a standardized stressful task including a speech (Trier Social Stress Test) would be linked to speaking with less linguistic complexity during the task. We found evidence that measures of stress responsivity (emotional and physiological) and chronic stress are tied to variability in the cognitive complexity of speech. Taken together, these results provide evidence that our individual experiences of stress or ‘stress signatures’—how our body and mind react to stress both in the moment and over the longer term—are linked to how complexly we speak under stress. PMID:24354732

  11. A hypothesis on the biological origins and social evolution of music and dance.

    PubMed

    Wang, Tianyan

    2015-01-01

    The origins of music and musical emotions are still an enigma; here I propose a comprehensive hypothesis on the origins and evolution of music, dance, and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpretation of the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music, but also integrates music and dance into a common form: rhythmic movements. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to the internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) is evoked by music and dance, and has biological and social functions, which, in turn, promote the evolution of music, dance and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is a combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance, and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis scientifically concrete.

  12. Aging Affects Identification of Vocal Emotions in Semantically Neutral Sentences

    ERIC Educational Resources Information Center

    Dupuis, Kate; Pichora-Fuller, M. Kathleen

    2015-01-01

    Purpose: The authors determined the accuracy of younger and older adults in identifying vocal emotions using the Toronto Emotional Speech Set (TESS; Dupuis & Pichora-Fuller, 2010a) and investigated the possible contributions of auditory acuity and suprathreshold processing to emotion identification accuracy. Method: In 2 experiments, younger…

  13. Gaze Aversion to Stuttered Speech: A Pilot Study Investigating Differential Visual Attention to Stuttered and Fluent Speech

    ERIC Educational Resources Information Center

    Bowers, Andrew L.; Crawcour, Stephen C.; Saltuklaroglu, Tim; Kalinowski, Joseph

    2010-01-01

    Background: People who stutter are often acutely aware that their speech disruptions, halted communication, and aberrant struggle behaviours evoke reactions in communication partners. Considering that eye gaze behaviours have emotional, cognitive, and pragmatic overtones for communicative interactions and that previous studies have indicated…

  14. Stuttered and Fluent Speakers' Heart Rate and Skin Conductance in Response to Fluent and Stuttered Speech

    ERIC Educational Resources Information Center

    Zhang, Jianliang; Kalinowski, Joseph; Saltuklaroglu, Tim; Hudock, Daniel

    2010-01-01

    Background: Previous studies have found simultaneous increases in skin conductance response and decreases in heart rate when normally fluent speakers watched and listened to stuttered speech compared with fluent speech, suggesting that stuttering induces arousal and emotional unpleasantness in listeners. However, physiological responses of persons…

  15. Emotional and Physiological Responses of Fluent Listeners while Watching the Speech of Adults Who Stutter

    ERIC Educational Resources Information Center

    Guntupalli, Vijaya K.; Everhart, D. Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

    2007-01-01

    Background: People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.).…

  16. Perspective taking in children's narratives about jealousy.

    PubMed

    Aldrich, Naomi J; Tenenbaum, Harriet R; Brooks, Patricia J; Harrison, Karine; Sines, Jennie

    2011-03-01

    This study explored relationships between perspective-taking, emotion understanding, and children's narrative abilities. Younger (23 5-/6-year-olds) and older (24 7-/8-year-olds) children generated fictional narratives, using a wordless picture book, about a frog experiencing jealousy. Children's emotion understanding was assessed through a standardized test of emotion comprehension and their ability to convey the jealousy theme of the story. Perspective-taking ability was assessed with respect to children's use of narrative evaluation (i.e., narrative coherence, mental state language, supplementary evaluative speech, use of subjective language, and placement of emotion expression). Older children scored higher than younger children on emotion comprehension and on understanding the story's complex emotional theme, including the ability to identify a rival. They were more advanced in perspective-taking abilities, and selectively used emotion expressions to highlight story episodes. Subjective perspective taking and narrative coherence were predictive of children's elaboration of the jealousy theme. Use of supplementary evaluative speech, in turn, was predictive of both subjective perspective taking and narrative coherence. ©2010 The British Psychological Society.

  17. Intimate insight: MDMA changes how people talk about significant others

    PubMed Central

    Baggott, Matthew J.; Kirkpatrick, Matthew G.; Bedi, Gillinder; de Wit, Harriet

    2015-01-01

    Rationale: ±3,4-methylenedioxymethamphetamine (MDMA) is widely believed to increase sociability. The drug alters speech production and fluency, and may influence speech content. Here, we investigated the effect of MDMA on speech content, which may reveal how this drug affects social interactions. Method: 35 healthy volunteers with prior MDMA experience completed this two-session, within-subjects, double-blind study during which they received 1.5 mg/kg oral MDMA and placebo. Participants completed a 5-min standardized talking task during which they discussed a close personal relationship (e.g., a friend or family member) with a research assistant. The conversations were analyzed for selected content categories (e.g., words pertaining to affect, social interaction, and cognition), using both a standard dictionary method (Pennebaker’s Linguistic Inquiry and Word Count: LIWC) and a machine learning method using random forest classifiers. Results: Both analytic methods revealed that MDMA altered speech content relative to placebo. Using LIWC scores, the drug increased use of social and sexual words, consistent with reports that MDMA increases willingness to disclose. Using the machine learning algorithm, we found that MDMA increased use of social words and words relating to both positive and negative emotions. Conclusions: These findings are consistent with reports that MDMA acutely alters speech content, specifically increasing emotional and social content during a brief semistructured dyadic interaction. Studying effects of psychoactive drugs on speech content may offer new insights into drug effects on mental states, and on emotional and psychosocial interaction. PMID:25922420

  18. Intimate insight: MDMA changes how people talk about significant others.

    PubMed

    Baggott, Matthew J; Kirkpatrick, Matthew G; Bedi, Gillinder; de Wit, Harriet

    2015-06-01

    ±3,4-methylenedioxymethamphetamine (MDMA) is widely believed to increase sociability. The drug alters speech production and fluency, and may influence speech content. Here, we investigated the effect of MDMA on speech content, which may reveal how this drug affects social interactions. Thirty-five healthy volunteers with prior MDMA experience completed this two-session, within-subjects, double-blind study during which they received 1.5 mg/kg oral MDMA and placebo. Participants completed a five-minute standardized talking task during which they discussed a close personal relationship (e.g. a friend or family member) with a research assistant. The conversations were analyzed for selected content categories (e.g. words pertaining to affect, social interaction, and cognition), using both a standard dictionary method (Pennebaker's Linguistic Inquiry and Word Count: LIWC) and a machine learning method using random forest classifiers. Both analytic methods revealed that MDMA altered speech content relative to placebo. Using LIWC scores, the drug increased use of social and sexual words, consistent with reports that MDMA increases willingness to disclose. Using the machine learning algorithm, we found that MDMA increased use of social words and words relating to both positive and negative emotions. These findings are consistent with reports that MDMA acutely alters speech content, specifically increasing emotional and social content during a brief semistructured dyadic interaction. Studying effects of psychoactive drugs on speech content may offer new insights into drug effects on mental states, and on emotional and psychosocial interaction. © The Author(s) 2015.
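
    A toy illustration of the two analytic routes mentioned in these records: category proportions computed from hand-built word lists (LIWC itself is proprietary, so the mini-dictionaries below are made-up stand-ins) and a random forest trained on those proportions to separate drug and placebo transcripts. Variable names such as transcripts and conditions are assumptions for the sketch.

    ```python
    # Hypothetical dictionary-based content scoring plus a random forest classifier.
    import re
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    CATEGORIES = {                       # made-up mini-dictionaries, not LIWC
        "social": {"friend", "family", "together", "we", "talk"},
        "posemo": {"love", "happy", "good", "great"},
        "negemo": {"sad", "angry", "hurt", "worried"},
    }

    def category_proportions(transcript):
        words = re.findall(r"[a-z']+", transcript.lower())
        total = max(len(words), 1)
        # Proportion of words falling in each category dictionary.
        return np.array([sum(w in vocab for w in words) / total
                         for vocab in CATEGORIES.values()])

    # X = np.vstack([category_proportions(t) for t in transcripts])
    # y = np.array(conditions)           # e.g. 1 = drug session, 0 = placebo session
    # clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    ```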

  19. Frontal brain electrical activity (EEG) and heart rate in response to affective infant-directed (ID) speech in 9-month-old infants.

    PubMed

    Santesso, Diane L; Schmidt, Louis A; Trainor, Laurel J

    2007-10-01

    Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective content (i.e., love/comfort, surprise, fear) in a group of typically developing 9-month-old infants. Regional EEG and heart rate were collected continuously during stimulus presentation. We found that the pattern of overall frontal EEG power was linearly related to the affective intensity of the ID speech, such that EEG power was greatest in response to fear, followed by surprise, and then love/comfort; this linear pattern was specific to the frontal region. We also noted that heart rate decelerated to ID speech independent of affective content. As well, infants who were reported by their mothers as temperamentally distressed tended to exhibit greater relative right frontal EEG activity during baseline and in response to affective ID speech, consistent with previous work with visual stimuli and extending it to the auditory modality. Findings are discussed in terms of how increases in frontal EEG power in response to different affective intensities may reflect the cognitive aspects of emotional processing across sensory domains in infancy.

  20. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries which, although including examples about how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  1. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors

    NASA Astrophysics Data System (ADS)

    Sauter, Disa

    This PhD thesis is an investigation of vocal expressions of emotions, mainly focusing on non-verbal sounds such as laughter, cries and sighs. The research examines the roles of categorical and dimensional factors, the contributions of a number of acoustic cues, and the influence of culture. A series of studies established that naive listeners can reliably identify non-verbal vocalisations of positive and negative emotions in forced-choice and rating tasks. Some evidence for underlying dimensions of arousal and valence is found, although each emotion had a discrete expression. The role of acoustic characteristics of the sounds is investigated experimentally and analytically. This work shows that the cues used to identify different emotions vary, although pitch and pitch variation play a central role. The cues used to identify emotions in non-verbal vocalisations differ from the cues used when comprehending speech. An additional set of studies using stimuli consisting of emotional speech demonstrates that these sounds can also be reliably identified, and rely on similar acoustic cues. A series of studies with a pre-literate Namibian tribe shows that non-verbal vocalisations can be recognized across cultures. An fMRI study carried out to investigate the neural processing of non-verbal vocalisations of emotions is presented. The results show activation in pre-motor regions arising from passive listening to non-verbal emotional vocalisations, suggesting neural auditory-motor interactions in the perception of these sounds. In sum, this thesis demonstrates that non-verbal vocalisations of emotions are reliably identifiable tokens of information that belong to discrete categories. These vocalisations are recognisable across vastly different cultures and thus seem to, like facial expressions of emotions, comprise human universals. Listeners rely mainly on pitch and pitch variation to identify emotions in non-verbal vocalisations, which differ from the cues used to comprehend speech. When listening to others' emotional vocalisations, a neural system of preparatory motor activation is engaged.

  2. Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech

    PubMed Central

    Hill, Edward; Han, David; Dumouchel, Pierre; Dehak, Najim; Quatieri, Thomas; Moehs, Charles; Oscar-Berman, Marlene; Giordano, John; Simpatico, Thomas; Blum, Kenneth

    2013-01-01

    Addictions to illicit drugs are among the nation’s most critical public health and societal problems. The current opioid prescription epidemic, the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided the impetus to determine affective states (“true ground emotionality”) in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion-detection in speech as a measure of “true” emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other less objective studies have investigated the emotional reactivity of heroin, methadone and opioid-abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate “true ground” emotionality in long-term buprenorphine/naloxone combination (Suboxone™) patients. We found in long-term SUBX patients a significantly flat affect (p<0.01), and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution against definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid-abstinent control group using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction. PMID:23874860

  3. The affective reactivity of psychotic speech: The role of internal source monitoring in explaining increased thought disorder under emotional challenge.

    PubMed

    de Sousa, Paulo; Sellwood, William; Spray, Amy; Bentall, Richard P

    2016-04-01

    Thought disorder (TD) has been shown to vary in relation to negative affect. Here we examine the role of internal source monitoring (iSM, i.e., the ability to discriminate between inner speech and verbalized speech) in TD and whether changes in iSM performance are implicated in the affective reactivity effect (deterioration of TD when participants are asked to talk about emotionally-laden topics). Eighty patients diagnosed with schizophrenia-spectrum disorder and thirty healthy controls received interviews that promoted personal disclosure (emotionally salient) and interviews on everyday topics (non-salient) on separate days. During the interviews, participants were tested on iSM, self-reported affect and immediate auditory recall. Patients had more TD, poorer ability to discriminate between inner and verbalized speech, poorer immediate auditory recall and reported more negative affect than controls. Both groups displayed more TD and negative affect in salient interviews but only patients showed poorer performance on iSM. Immediate auditory recall did not change significantly across affective conditions. In patients, the relationship between self-reported negative affect and TD was mediated by deterioration in the ability to discriminate between inner speech and speech that was directed to others and socially shared (performance on the iSM) in both interviews. Furthermore, deterioration in patients' performance on iSM across conditions significantly predicted deterioration in TD across the interviews (affective reactivity of speech). Poor iSM is significantly associated with TD. Negative affect, leading to further impaired iSM, leads to increased TD in patients with psychosis. Avenues for future research as well as clinical implications of these findings are discussed. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Emotion Analysis of Telephone Complaints from Customer Based on Affective Computing.

    PubMed

    Gong, Shuangping; Dai, Yonghui; Ji, Jun; Wang, Jinzhao; Sun, Hai

    2015-01-01

    Customer complaints are important feedback that modern enterprises use to improve their product and service quality as well as customer loyalty. As one of the most common channels for customer complaints, telephone communication carries rich emotional information in speech, which provides a valuable resource for perceiving the customer's satisfaction and studying complaint-handling skills. This paper studies the characteristics of telephone complaint speech and proposes an analysis method based on affective computing technology, which can recognize the dynamic changes of customer emotions from the conversations between the service staff and the customer. The recognition process includes speaker recognition, emotional feature parameter extraction, and dynamic emotion recognition. Experimental results show that this method is effective and can reach high recognition rates for happy and angry states. It has been successfully applied to operation quality and service administration in a telecom and Internet service company.
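
    The three-stage pipeline described above (speaker separation, emotional feature parameter extraction, dynamic emotion recognition) can be illustrated with a deliberately simplified sketch. The diarization stub, the two toy features, and the threshold rule below are assumptions for illustration only, not the method of the cited paper.

```python
# Minimal sketch of the three-stage call-emotion pipeline: (1) split the call
# into turns, (2) extract simple emotional feature parameters per turn,
# (3) track the caller's emotion over time. Every component here is a
# simplifying assumption, not the cited system.
import numpy as np

def split_turns(samples, sr, frame_s=1.0):
    """Stand-in for speaker recognition/diarization: fixed-length turns."""
    step = int(sr * frame_s)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def emotion_features(turn):
    """Toy feature parameters: energy and a crude pitch proxy (zero-crossing rate)."""
    energy = float(np.mean(turn ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(turn)))) / 2)
    return energy, zcr

def classify(energy, zcr, energy_thr=0.02, zcr_thr=0.08):
    """Threshold rule as a placeholder for a trained emotion recognizer."""
    if energy > energy_thr and zcr > zcr_thr:
        return "angry"
    if energy > energy_thr:
        return "happy"
    return "neutral"

def emotion_trajectory(samples, sr):
    return [classify(*emotion_features(t)) for t in split_turns(samples, sr) if len(t)]

if __name__ == "__main__":
    sr = 8000
    call = np.random.default_rng(2).standard_normal(sr * 5) * 0.1  # fake 5 s call
    print(emotion_trajectory(call, sr))
```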

  5. A longitudinal study of emotion regulation and anxiety in middle childhood: Associations with frontal EEG asymmetry in early childhood.

    PubMed

    Hannesdóttir, Dagmar Kr; Doxie, Jacquelyn; Bell, Martha Ann; Ollendick, Thomas H; Wolfe, Christy D

    2010-03-01

    We investigated whether brain electrical activity during early childhood was associated with anxiety symptoms and emotion regulation during a stressful situation in middle childhood. Frontal electroencephalogram (EEG) asymmetries were measured during baseline and during a cognitive control task at 4 1/2 years. Anxiety and emotion regulation were assessed during a stressful situation at age 9 (speech task), along with measures of heart rate (HR) and heart rate variability (HRV). Questionnaires were also used to assess anxiety and emotion regulation at age 9. Results from this longitudinal study indicated that children who exhibited right frontal asymmetry in early childhood experienced more physiological arousal (increased HR, decreased HRV) during the speech task at age 9 and showed less ability to regulate their emotions, as reported by their parents. Findings are discussed in light of the associations between temperament and development of anxiety disorders.

  6. Emotion Analysis of Telephone Complaints from Customer Based on Affective Computing

    PubMed Central

    Gong, Shuangping; Ji, Jun; Wang, Jinzhao; Sun, Hai

    2015-01-01

    Customer complaints are important feedback that modern enterprises use to improve their product and service quality as well as customer loyalty. As one of the most common channels for customer complaints, telephone communication carries rich emotional information in speech, which provides a valuable resource for perceiving the customer's satisfaction and studying complaint-handling skills. This paper studies the characteristics of telephone complaint speech and proposes an analysis method based on affective computing technology, which can recognize the dynamic changes of customer emotions from the conversations between the service staff and the customer. The recognition process includes speaker recognition, emotional feature parameter extraction, and dynamic emotion recognition. Experimental results show that this method is effective and can reach high recognition rates for happy and angry states. It has been successfully applied to operation quality and service administration in a telecom and Internet service company. PMID:26633967

  7. MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus.

    PubMed

    Hagan, Cindy C; Woods, Will; Johnson, Sam; Calder, Andrew J; Green, Gary G R; Young, Andrew W

    2009-11-24

    An influential neural model of face perception suggests that the posterior superior temporal sulcus (STS) is sensitive to those aspects of faces that produce transient visual changes, including facial expression. Other researchers note that recognition of expression involves multiple sensory modalities and suggest that the STS also may respond to crossmodal facial signals that change transiently. Indeed, many studies of audiovisual (AV) speech perception show STS involvement in AV speech integration. Here we examine whether these findings extend to AV emotion. We used magnetoencephalography to measure the neural responses of participants as they viewed and heard emotionally congruent fear and minimally congruent neutral face and voice stimuli. We demonstrate significant supra-additive responses (i.e., where AV > [unimodal auditory + unimodal visual]) in the posterior STS within the first 250 ms for emotionally congruent AV stimuli. These findings show a role for the STS in processing crossmodal emotive signals.
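
    The supra-additivity criterion used above (AV > unimodal auditory + unimodal visual) can be stated as a short worked example. The evoked-response amplitudes below are invented; only the comparison itself reflects the abstract.

```python
# Worked toy example of the supra-additive test, applied per time point to
# invented evoked-response amplitudes over the first 250 ms.
import numpy as np

t = np.arange(0, 0.25, 0.004)                       # 0-250 ms, 4 ms steps
auditory = 1.0 * np.exp(-((t - 0.10) / 0.03) ** 2)  # fake unimodal responses
visual = 0.8 * np.exp(-((t - 0.17) / 0.04) ** 2)
audiovisual = 1.3 * (auditory + visual)             # fake bimodal response

supra = audiovisual > (auditory + visual)           # the supra-additive criterion
print(f"supra-additive at {supra.mean():.0%} of time points "
      f"({t[supra][0] * 1000:.0f}-{t[supra][-1] * 1000:.0f} ms)")
```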

  8. Eliciting and maintaining ruminative thought: the role of social-evaluative threat.

    PubMed

    Zoccola, Peggy M; Dickerson, Sally S; Lam, Suman

    2012-08-01

    This study tested whether a performance stressor characterized by social-evaluative threat (SET) elicits more rumination than a stressor without this explicit evaluative component and whether this difference persists minutes, hours, and days later. The mediating role of shame-related cognition and emotion (SRCE) was also examined. During a laboratory visit, 144 undergraduates (50% female) were randomly assigned to complete a speech stressor in a social-evaluative threat condition (SET; n = 86), in which an audience was present, or a nonexplicit social-evaluative threat condition (ne-SET; n = 58), in which they were alone in a room. Participants completed measures of stressor-related rumination 10 and 40 min posttask, later that night, and upon returning to the laboratory 3-5 days later. SRCE and other emotions experienced during the stressor (fear, anger, and sadness) were assessed immediately posttask. As hypothesized, the SET speech stressor elicited more rumination than the ne-SET speech stressor, and these differences persisted for 3-5 days. SRCE, but not other specific negative emotions or general emotional arousal, mediated the effect of stressor context on rumination. Stressors characterized by SET may be likely candidates for eliciting and maintaining ruminative thought immediately and also days later, potentially by eliciting shame-related emotions and cognitions.

  9. Analysis of False Starts in Spontaneous Speech.

    ERIC Educational Resources Information Center

    O'Shaughnessy, Douglas

    A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

  10. 45 CFR 1214.103 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ..., including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and lymphatic; skin... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and... disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...

  11. Implications of Multilingual Interoperability of Speech Technology for Military Use (Les implications de l’interoperabilite multilingue des technologies vocales pour applications militaires)

    DTIC Science & Technology

    2004-09-01

    Excerpts (Section 2.3, Databases): 2.3.1 Translanguage English Database; 2.3.2 Australian National Database of Spoken Language; 2.3.3 Strange Corpus; 2.3.4 ... some relevance to speech technology research. 2.3.1 Translanguage English Database: In a daring plan, Joseph Mariani, then at LIMSI-CNRS, proposed to ... native speakers. The database is known as the 'Translanguage English Database' but is often referred to as the 'terrible English database.' About 28 ...

  12. Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users

    PubMed Central

    Fuller, Christina D.; Galvin, John J.; Maat, Bert; Başkent, Deniz; Free, Rolien H.

    2018-01-01

    In normal-hearing (NH) adults, long-term music training may benefit music and speech perception, even when listening to spectro-temporally degraded signals as experienced by cochlear implant (CI) users. In this study, we compared two different music training approaches in CI users and their effects on speech and music perception, as it remains unclear which approach to music training might be best. The approaches differed in terms of music exercises and social interaction. For the pitch/timbre group, melodic contour identification (MCI) training was performed using computer software. For the music therapy group, training involved face-to-face group exercises (rhythm perception, musical speech perception, music perception, singing, vocal emotion identification, and music improvisation). For the control group, training involved group nonmusic activities (e.g., writing, cooking, and woodworking). Training consisted of weekly 2-hr sessions over a 6-week period. Speech intelligibility in quiet and noise, vocal emotion identification, MCI, and quality of life (QoL) were measured before and after training. The different training approaches appeared to offer different benefits for music and speech perception. Training effects were observed within-domain (better MCI performance for the pitch/timbre group), with little cross-domain transfer of music training (emotion identification significantly improved for the music therapy group). While training had no significant effect on QoL, the music therapy group reported better perceptual skills across training sessions. These results suggest that more extensive and intensive training approaches that combine pitch training with the social aspects of music therapy may further benefit CI users. PMID:29621947

  13. Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users.

    PubMed

    Fuller, Christina D; Galvin, John J; Maat, Bert; Başkent, Deniz; Free, Rolien H

    2018-01-01

    In normal-hearing (NH) adults, long-term music training may benefit music and speech perception, even when listening to spectro-temporally degraded signals as experienced by cochlear implant (CI) users. In this study, we compared two different music training approaches in CI users and their effects on speech and music perception, as it remains unclear which approach to music training might be best. The approaches differed in terms of music exercises and social interaction. For the pitch/timbre group, melodic contour identification (MCI) training was performed using computer software. For the music therapy group, training involved face-to-face group exercises (rhythm perception, musical speech perception, music perception, singing, vocal emotion identification, and music improvisation). For the control group, training involved group nonmusic activities (e.g., writing, cooking, and woodworking). Training consisted of weekly 2-hr sessions over a 6-week period. Speech intelligibility in quiet and noise, vocal emotion identification, MCI, and quality of life (QoL) were measured before and after training. The different training approaches appeared to offer different benefits for music and speech perception. Training effects were observed within-domain (better MCI performance for the pitch/timbre group), with little cross-domain transfer of music training (emotion identification significantly improved for the music therapy group). While training had no significant effect on QoL, the music therapy group reported better perceptual skills across training sessions. These results suggest that more extensive and intensive training approaches that combine pitch training with the social aspects of music therapy may further benefit CI users.

  14. Identification of four class emotion from Indonesian spoken language using acoustic and lexical features

    NASA Astrophysics Data System (ADS)

    Kasyidi, Fatan; Puji Lestari, Dessi

    2018-03-01

    One of the important aspects of human-to-human communication is understanding the emotion of each party. Recently, interaction between humans and computers has continued to develop, especially affective interaction, in which emotion recognition is one of the important components. This paper presents our extended work on emotion recognition in Indonesian spoken language to identify four main classes of emotion: Happy, Sad, Angry, and Contentment, using a combination of acoustic/prosodic features and lexical features. We constructed an emotion speech corpus from an Indonesian television talk show, where the situations are as close as possible to natural. After constructing the emotion speech corpus, the acoustic/prosodic and lexical features were extracted to train the emotion model. We employed several machine learning algorithms, such as Support Vector Machine (SVM), Naive Bayes, and Random Forest, to obtain the best model. Results on the test data show that the best model, an SVM with an RBF kernel, achieves an F-measure of 0.447 using only the acoustic/prosodic features and an F-measure of 0.488 using both acoustic/prosodic and lexical features to recognize the four emotion classes.
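
    A minimal sketch of the classifier setup the abstract describes: fuse acoustic/prosodic features with lexical bag-of-words features and train an RBF-kernel SVM for the four classes. The toy transcripts, feature dimensions, and scores are placeholders, not values or data from the study.

```python
# Sketch of acoustic + lexical feature fusion with an RBF-kernel SVM.
# The transcripts and acoustic vectors are invented stand-ins.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

EMOTIONS = ["happy", "sad", "angry", "contentment"]

# Toy data: one transcript and a small acoustic feature vector per utterance.
transcripts = ["saya senang sekali", "saya sedih", "kenapa begitu", "tidak apa-apa"] * 10
acoustic = np.random.default_rng(0).standard_normal((40, 6))   # e.g. F0/energy stats
labels = np.array(EMOTIONS * 10)

lexical = CountVectorizer().fit_transform(transcripts)          # bag-of-words
X = hstack([csr_matrix(acoustic), lexical]).tocsr()             # fused feature matrix

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, labels, cv=5, scoring="f1_macro")
print("macro F1 (toy data):", scores.mean())
```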

  15. Intelligibility of emotional speech in younger and older adults.

    PubMed

    Dupuis, Kate; Pichora-Fuller, M Kathleen

    2014-01-01

    Little is known about the influence of vocal emotions on speech understanding. Word recognition accuracy for stimuli spoken to portray seven emotions (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) was tested in younger and older listeners. Emotions were presented in either mixed (heterogeneous emotions mixed in a list) or blocked (homogeneous emotion blocked in a list) conditions. Three main hypotheses were tested. First, vocal emotion affects word recognition accuracy; specifically, portrayals of fear enhance word recognition accuracy because listeners orient to threatening information and/or distinctive acoustical cues such as high pitch mean and variation. Second, older listeners recognize words less accurately than younger listeners, but the effects of different emotions on intelligibility are similar across age groups. Third, blocking emotions in list results in better word recognition accuracy, especially for older listeners, and reduces the effect of emotion on intelligibility because as listeners develop expectations about vocal emotion, the allocation of processing resources can shift from emotional to lexical processing. Emotion was the within-subjects variable: all participants heard speech stimuli consisting of a carrier phrase followed by a target word spoken by either a younger or an older talker, with an equal number of stimuli portraying each of seven vocal emotions. The speech was presented in multi-talker babble at signal to noise ratios adjusted for each talker and each listener age group. Listener age (younger, older), condition (mixed, blocked), and talker (younger, older) were the main between-subjects variables. Fifty-six students (Mage= 18.3 years) were recruited from an undergraduate psychology course; 56 older adults (Mage= 72.3 years) were recruited from a volunteer pool. All participants had clinically normal pure-tone audiometric thresholds at frequencies ≤3000 Hz. There were significant main effects of emotion, listener age group, and condition on the accuracy of word recognition in noise. Stimuli spoken in a fearful voice were the most intelligible, while those spoken in a sad voice were the least intelligible. Overall, word recognition accuracy was poorer for older than younger adults, but there was no main effect of talker, and the pattern of the effects of different emotions on intelligibility did not differ significantly across age groups. Acoustical analyses helped elucidate the effect of emotion and some intertalker differences. Finally, all participants performed better when emotions were blocked. For both groups, performance improved over repeated presentations of each emotion in both blocked and mixed conditions. These results are the first to demonstrate a relationship between vocal emotion and word recognition accuracy in noise for younger and older listeners. In particular, the enhancement of intelligibility by emotion is greatest for words spoken to portray fear and presented heterogeneously with other emotions. Fear may have a specialized role in orienting attention to words heard in noise. This finding may be an auditory counterpart to the enhanced detection of threat information in visual displays. The effect of vocal emotion on word recognition accuracy is preserved in older listeners with good audiograms and both age groups benefit from blocking and the repetition of emotions.

  16. Changes in Maternal Expressed Emotion toward Clinically Anxious Children following Cognitive Behavioral Therapy

    ERIC Educational Resources Information Center

    Gar, Natalie S.; Hudson, Jennifer L.

    2009-01-01

    The aim of this study was to determine whether maternal expressed emotion (criticism and emotional overinvolvement) decreased across treatment for childhood anxiety. Mothers of 48 clinically anxious children (aged 6-14 years) were rated on levels of criticism (CRIT) and emotional overinvolvement (EOI), as measured by a Five Minute Speech Sample…

  17. "I Won't Talk about This Here in America": Sociocultural Context of Korean English Language Learners' Emotion Speech in English

    ERIC Educational Resources Information Center

    Kim, Sujin; Dorner, Lisa M.

    2013-01-01

    This article examines the relationship between language and emotion, especially drawing attention to the experiences and perspectives of second language (SL) learners. Informed by the sociocultural perspective on the construction of emotion and its representation, this study highlights the intertwined relationship among emotions, cultural…

  18. Mommy is only happy! Dutch mothers' realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent.

    PubMed

    Benders, Titia

    2013-12-01

    Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while the pitch is more extreme in IDS to 15-month-olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers' realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers' expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Music Communicates Affects, Not Basic Emotions – A Constructionist Account of Attribution of Emotional Meanings to Music

    PubMed Central

    Cespedes-Guevara, Julian; Eerola, Tuomas

    2018-01-01

    Basic Emotion theory has had a tremendous influence on the affective sciences, including music psychology, where most researchers have assumed that music expressivity is constrained to a limited set of basic emotions. Several scholars suggested that these constraints on musical expressivity are explained by the existence of a shared acoustic code to the expression of emotions in music and speech prosody. In this article we advocate for a shift from this focus on basic emotions to a constructionist account. This approach proposes that the phenomenon of perception of emotions in music arises from the interaction of music’s ability to express core affects and the influence of top-down and contextual information in the listener’s mind. We start by reviewing the problems with the concept of Basic Emotions, and the inconsistent evidence that supports it. We also demonstrate how decades of developmental and cross-cultural research on music and emotional speech have failed to produce convincing findings to conclude that music expressivity is built upon a set of biologically pre-determined basic emotions. We then examine the cue-emotion consistencies between music and speech, and show how they support a parsimonious explanation, where musical expressivity is grounded on two dimensions of core affect (arousal and valence). Next, we explain how the fact that listeners reliably identify basic emotions in music does not arise from the existence of categorical boundaries in the stimuli, but from processes that facilitate categorical perception, such as using stereotyped stimuli and close-ended response formats, psychological processes of construction of mental prototypes, and contextual information. Finally, we outline our proposal of a constructionist account of perception of emotions in music, and spell out the ways in which this approach is able to resolve past conflicting findings. We conclude by providing explicit pointers about the methodological choices that will be vital to move beyond the popular Basic Emotion paradigm and start untangling the emergence of emotional experiences with music in the actual contexts in which they occur. PMID:29541041

  20. Music Communicates Affects, Not Basic Emotions - A Constructionist Account of Attribution of Emotional Meanings to Music.

    PubMed

    Cespedes-Guevara, Julian; Eerola, Tuomas

    2018-01-01

    Basic Emotion theory has had a tremendous influence on the affective sciences, including music psychology, where most researchers have assumed that music expressivity is constrained to a limited set of basic emotions. Several scholars suggested that these constraints on musical expressivity are explained by the existence of a shared acoustic code to the expression of emotions in music and speech prosody. In this article we advocate for a shift from this focus on basic emotions to a constructionist account. This approach proposes that the phenomenon of perception of emotions in music arises from the interaction of music's ability to express core affects and the influence of top-down and contextual information in the listener's mind. We start by reviewing the problems with the concept of Basic Emotions, and the inconsistent evidence that supports it. We also demonstrate how decades of developmental and cross-cultural research on music and emotional speech have failed to produce convincing findings to conclude that music expressivity is built upon a set of biologically pre-determined basic emotions. We then examine the cue-emotion consistencies between music and speech, and show how they support a parsimonious explanation, where musical expressivity is grounded on two dimensions of core affect (arousal and valence). Next, we explain how the fact that listeners reliably identify basic emotions in music does not arise from the existence of categorical boundaries in the stimuli, but from processes that facilitate categorical perception, such as using stereotyped stimuli and close-ended response formats, psychological processes of construction of mental prototypes, and contextual information. Finally, we outline our proposal of a constructionist account of perception of emotions in music, and spell out the ways in which this approach is able to resolve past conflicting findings. We conclude by providing explicit pointers about the methodological choices that will be vital to move beyond the popular Basic Emotion paradigm and start untangling the emergence of emotional experiences with music in the actual contexts in which they occur.

  1. Children with dyslexia show a reduced processing benefit from bimodal speech information compared to their typically developing peers.

    PubMed

    Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia

    2018-01-17

    During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Understanding emotional expression using prosodic analysis of natural speech: refining the methodology.

    PubMed

    Cohen, Alex S; Hong, S Lee; Guevara, Alvaro

    2010-06-01

    Emotional expression is an essential function for daily life that can be severely affected in some psychological disorders. Laboratory-based procedures designed to measure prosodic expression from natural speech have shown early promise for measuring individual differences in emotional expression but have yet to produce robust within-group prosodic changes across various evocative conditions. This report presents data from three separate studies (total N = 464) that digitally recorded subjects as they verbalized their reactions to various stimuli. Format and stimuli were modified to maximize prosodic expression. Our results suggest that use of evocative slides organized according to either a dimensional (e.g., high and low arousal; pleasant, unpleasant, and neutral valence) or a categorical (e.g., fear, surprise, happiness) model produced robust changes in subjective state but only negligible change in prosodic expression. Alternatively, speech from the recall of autobiographical memories resulted in meaningful changes in both subjective state and prosodic expression. Implications for the study of psychological disorders are discussed.

  3. Recruitment of Language-, Emotion- and Speech-Timing Associated Brain Regions for Expressing Emotional Prosody: Investigation of Functional Neuroanatomy with fMRI

    PubMed Central

    Mitchell, Rachel L. C.; Jazdzyk, Agnieszka; Stets, Manuela; Kotz, Sonja A.

    2016-01-01

    We aimed to progress understanding of prosodic emotion expression by establishing brain regions active when expressing specific emotions, those activated irrespective of the target emotion, and those whose activation intensity varied depending on individual performance. BOLD contrast data were acquired whilst participants spoke non-sense words in happy, angry or neutral tones, or performed jaw-movements. Emotion-specific analyses demonstrated that when expressing angry prosody, activated brain regions included the inferior frontal and superior temporal gyri, the insula, and the basal ganglia. When expressing happy prosody, the activated brain regions also included the superior temporal gyrus, insula, and basal ganglia, with additional activation in the anterior cingulate. Conjunction analysis confirmed that the superior temporal gyrus and basal ganglia were activated regardless of the specific emotion concerned. Nevertheless, disjunctive comparisons between the expression of angry and happy prosody established that anterior cingulate activity was significantly higher for angry prosody than for happy prosody production. Degree of inferior frontal gyrus activity correlated with the ability to express the target emotion through prosody. We conclude that expressing prosodic emotions (vs. neutral intonation) requires generic brain regions involved in comprehending numerous aspects of language, emotion-related processes such as experiencing emotions, and in the time-critical integration of speech information. PMID:27803656

  4. Analysis and synthesis of laughter

    NASA Astrophysics Data System (ADS)

    Sundaram, Shiva; Narayanan, Shrikanth

    2004-10-01

    There is much enthusiasm in the text-to-speech community for synthesis of emotional and natural speech. One idea being proposed is to include emotion-dependent paralinguistic cues during synthesis to convey emotions effectively. This requires modeling and synthesis techniques for various cues for different emotions. Motivated by this, a technique to synthesize human laughter is proposed. Laughter is a complex mechanism of expression and has high variability in terms of types and usage in human-human communication. People have their own characteristic way of laughing. Laughter can be seen as a controlled/uncontrolled physiological process of a person resulting from an initial excitation in context. A parametric model based on damped simple harmonic motion is developed here to effectively capture these diversities and also maintain an individual's characteristics. The limited laughter/speech data available from actual humans and the need for ease of synthesis are the constraints imposed on the accuracy of the model. Analysis techniques are also developed to determine the parameters of the model for a given individual or laughter type. Finally, the effectiveness of the model in capturing individual characteristics and naturalness compared to real human laughter has been analyzed. Through this, the factors involved in individual human laughter and their importance can be better understood.
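
    The damped-simple-harmonic-motion idea lends itself to a short synthesis sketch: each laugh call is a damped oscillation, and a bout-level envelope decays across calls. All parameter values below are invented for illustration and are not those of the cited model.

```python
# Minimal sketch of a damped-SHM laughter bout: a train of damped sinusoid
# "ha" calls whose amplitudes decay across the bout. Parameters are invented.
import numpy as np
from scipy.io import wavfile

def laughter_bout(fs=16000, n_calls=5, f0=220.0, call_s=0.18, gap_s=0.07,
                  damping=18.0, bout_decay=0.7):
    t = np.arange(0, call_s, 1 / fs)
    call = np.exp(-damping * t) * np.sin(2 * np.pi * f0 * t)   # damped SHM "ha"
    gap = np.zeros(int(gap_s * fs))
    bout = np.concatenate([np.concatenate([call * bout_decay ** k, gap])
                           for k in range(n_calls)])
    return (bout / np.max(np.abs(bout)) * 0.8 * 32767).astype(np.int16)

if __name__ == "__main__":
    wavfile.write("laugh_sketch.wav", 16000, laughter_bout())
```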

  5. Computer-aided psychotherapy based on multimodal elicitation, estimation and regulation of emotion.

    PubMed

    Cosić, Krešimir; Popović, Siniša; Horvat, Marko; Kukolja, Davor; Dropuljić, Branimir; Kovač, Bernard; Jakovljević, Miro

    2013-09-01

    Contemporary psychiatry is looking to the affective sciences to understand human behavior, cognition and the mind in health and disease. Since it has been recognized that emotions have a pivotal role for the human mind, an ever increasing number of laboratories and research centers are interested in affective sciences, affective neuroscience, affective psychology and affective psychopathology. Therefore, this paper presents multidisciplinary research results on stress resilience from the Laboratory for Interactive Simulation System at the Faculty of Electrical Engineering and Computing, University of Zagreb. A patient's distortion in the emotional processing of multimodal input stimuli is predominantly a consequence of a cognitive deficit resulting from the individual's mental health disorder. These emotional distortions in the patient's multimodal physiological, facial, acoustic, and linguistic features related to the presented stimulation can be used as an indicator of the patient's mental illness. Real-time processing and analysis of the patient's multimodal response related to annotated input stimuli is based on appropriate machine learning methods from computer science. Comprehensive longitudinal multimodal analysis of the patient's emotion, mood, feelings, attention, motivation, decision-making, and working memory in synchronization with multimodal stimuli provides an extremely valuable database for data mining, machine learning and machine reasoning. The presented multimedia stimulus sequence includes personalized images, movies and sounds, as well as semantically congruent narratives. Simultaneously with stimulus presentation, the patient provides subjective emotional ratings of the presented stimuli in terms of subjective units of discomfort/distress, discrete emotions, or valence and arousal. These subjective emotional ratings of input stimuli and the corresponding physiological, speech, and facial output features provide enough information for evaluation of the patient's cognitive appraisal deficit. Aggregated real-time visualization of this information provides valuable assistance in diagnosing the patient's mental state, enabling the therapist to gain deeper and broader insights into the dynamics and progress of the psychotherapy.

  6. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

    PubMed Central

    Chatterjee, Monita; Zion, Danielle; Deroche, Mickael L.; Burianek, Brooke; Limb, Charles; Goren, Alison; Kulkarni, Aditya M.; Christensen, Julie A.

    2014-01-01

    Despite their remarkable success in bringing spoken language to hearing impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups’ mean performance is similar to aNHs’ performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. PMID:25448167

  7. Comparison of formant detection methods used in speech processing applications

    NASA Astrophysics Data System (ADS)

    Belean, Bogdan

    2013-11-01

    The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination, or the diagnosis of different neurological diseases. Taking into account the various applications of formant detection in speech signals, two methods for detecting formants are presented. First, the poles resulting from analysis of the LPC coefficients are used for formant detection. The second approach uses a Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real-life speech spectrograms. A comparison of the features of the proposed methods is also performed, in order to establish which method is more suitable for different speech processing applications.
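
    The first approach, estimating formants from the roots (poles) of the LPC polynomial, can be sketched as follows. The model-order rule of thumb, the bandwidth threshold, and the synthetic test frame are assumptions for illustration, not the paper's settings.

```python
# Sketch of LPC-root formant estimation: fit an autoregressive model to one
# frame, find the poles of the prediction polynomial, and keep poles with
# plausible frequency and bandwidth as formant candidates.
import numpy as np
from scipy.signal import lfilter

def estimate_formants(frame, fs, lpc_order=None):
    if lpc_order is None:
        lpc_order = 2 + fs // 1000                   # common rule of thumb
    frame = lfilter([1.0, -0.63], [1.0], frame)      # simple pre-emphasis
    frame = frame * np.hamming(len(frame))           # window the frame
    # Autocorrelation (Yule-Walker) estimate of the LPC coefficients.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)] for i in range(lpc_order)])
    a = np.linalg.lstsq(R, r[1:lpc_order + 1], rcond=None)[0]
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]                # one of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)       # pole angle -> frequency (Hz)
    bw = -fs / np.pi * np.log(np.abs(poles))         # pole radius -> bandwidth (Hz)
    return sorted(f for f, b in zip(freqs, bw) if f > 90 and b < 400)

if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(0)
    t = np.arange(0, 0.03, 1 / fs)
    # Synthetic "vowel" frame: damped resonances near 700 Hz and 1200 Hz plus
    # a little noise so the normal equations stay well conditioned.
    frame = (np.exp(-60 * t) * (np.sin(2 * np.pi * 700 * t)
                                + 0.5 * np.sin(2 * np.pi * 1200 * t))
             + 1e-3 * rng.standard_normal(t.size))
    print(estimate_formants(frame, fs))              # approx. [700, 1200]
```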

  8. Separated from family, students chalk up their emotions (U.S. Air Force)

    Science.gov Websites


  9. Cross-Cultural Attitudes toward Speech Disorders.

    ERIC Educational Resources Information Center

    Bebout, Linda; Arthur, Bradford

    1992-01-01

    University students (n=166) representing English-speaking North American culture and several other cultures completed questionnaires examining attitudes toward four speech disorders (cleft palate, dysfluency, hearing impairment, and misarticulations). Results showed significant group differences in beliefs about the emotional health of persons…

  10. Implementation of Three Text to Speech Systems for Kurdish Language

    NASA Astrophysics Data System (ADS)

    Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir

    Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for all-purpose systems. In this paper, we implemented three synthesis systems for the Kurdish language, based on the syllable, allophone, and diphone, and compared their quality using subjective testing.
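
    At its core, concatenative synthesis joins pre-recorded units with a short cross-fade at each boundary. The sketch below assumes a hypothetical diphone inventory on disk (the file names are placeholders) and omits prosody modification, which a real system such as those described here would require.

```python
# Minimal sketch of diphone concatenation with a linear cross-fade at each
# join. The unit inventory and naming scheme ("d_<left>_<right>.wav") are
# hypothetical.
import numpy as np
import soundfile as sf  # assumed available (pip install soundfile)

def concatenate_units(unit_paths, fs=16000, xfade_ms=10):
    xfade = int(fs * xfade_ms / 1000)
    out = np.zeros(0)
    for path in unit_paths:
        unit, unit_fs = sf.read(path)
        assert unit_fs == fs, "all units must share one sampling rate"
        if out.size == 0:
            out = unit
            continue
        # Linear cross-fade between the tail of `out` and the head of `unit`.
        fade_out = out[-xfade:] * np.linspace(1.0, 0.0, xfade)
        fade_in = unit[:xfade] * np.linspace(0.0, 1.0, xfade)
        out = np.concatenate([out[:-xfade], fade_out + fade_in, unit[xfade:]])
    return out

if __name__ == "__main__":
    # Hypothetical diphone sequence for one syllable; placeholder file names.
    units = ["d_sil_b.wav", "d_b_a.wav", "d_a_sil.wav"]
    sf.write("syllable.wav", concatenate_units(units), 16000)
```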

  11. Affective Prosody Labeling in Youths with Bipolar Disorder or Severe Mood Dysregulation

    ERIC Educational Resources Information Center

    Deveney, Christen M.; Brotman, Melissa A.; Decker, Ann Marie; Pine, Daniel S.; Leibenluft, Ellen

    2012-01-01

    Background: Accurate identification of nonverbal emotional cues is essential to successful social interactions, yet most research is limited to emotional face expression labeling. Little research focuses on the processing of emotional prosody, or tone of verbal speech, in clinical populations. Methods: Using the Diagnostic Analysis of Nonverbal…

  12. Towards Real-Time Speech Emotion Recognition for Affective E-Learning

    ERIC Educational Resources Information Center

    Bahreini, Kiavash; Nadolski, Rob; Westera, Wim

    2016-01-01

    This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner's vocal intonations and facial expressions in order…

  13. A hypothesis on the biological origins and social evolution of music and dance

    PubMed Central

    Wang, Tianyan

    2015-01-01

    The origins of music and musical emotions are still an enigma; here I propose a comprehensive hypothesis on the origins and evolution of music, dance, and speech from a biological and sociological perspective. I suggest that every pitch interval between neighboring notes in music represents a corresponding movement pattern through interpreting the Doppler effect of sound, which not only provides a possible explanation for the transposition invariance of music, but also integrates music and dance into a common form—rhythmic movements. Accordingly, investigating the origins of music poses the question: why do humans appreciate rhythmic movements? I suggest that human appreciation of rhythmic movements and rhythmic events developed from the natural selection of organisms adapting to the internal and external rhythmic environments. The perception and production of, as well as synchronization with, external and internal rhythms are so vital for an organism's survival and reproduction that animals have a rhythm-related reward and emotion (RRRE) system. The RRRE system enables the appreciation of rhythmic movements and events, and is integral to the origination of music, dance and speech. The first type of rewards and emotions (rhythm-related rewards and emotions, RRREs) is evoked by music and dance, and has biological and social functions, which in turn promote the evolution of music, dance and speech. These functions also evoke a second type of rewards and emotions, which I name society-related rewards and emotions (SRREs). The neural circuits of RRREs and SRREs develop in species formation and personal growth, with congenital and acquired characteristics, respectively; that is, music is a combination of nature and culture. This hypothesis provides probable selection pressures and outlines the evolution of music, dance, and speech. The links between the Doppler effect and the RRREs and SRREs can be empirically tested, making the current hypothesis scientifically concrete. PMID:25741232

  14. Identification of emotional intonation evaluated by fMRI.

    PubMed

    Wildgruber, D; Riecker, A; Hertrich, I; Erb, M; Grodd, W; Ethofer, T; Ackermann, H

    2005-02-15

    During acoustic communication among human beings, emotional information can be expressed both by the propositional content of verbal utterances and by the modulation of speech melody (affective prosody). It is well established that linguistic processing is bound predominantly to the left hemisphere of the brain. By contrast, the encoding of emotional intonation has been assumed to depend specifically upon right-sided cerebral structures. However, prior clinical and functional imaging studies yielded discrepant data with respect to interhemispheric lateralization and intrahemispheric localization of brain regions contributing to processing of affective prosody. In order to delineate the cerebral network engaged in the perception of emotional tone, functional magnetic resonance imaging (fMRI) was performed during recognition of prosodic expressions of five different basic emotions (happy, sad, angry, fearful, and disgusted) and during phonetic monitoring of the same stimuli. As compared to baseline at rest, both tasks yielded widespread bilateral hemodynamic responses within frontal, temporal, and parietal areas, the thalamus, and the cerebellum. A comparison of the respective activation maps, however, revealed comprehension of affective prosody to be bound to a distinct right-hemisphere pattern of activation, encompassing posterior superior temporal sulcus (Brodmann Area [BA] 22), dorsolateral (BA 44/45), and orbitobasal (BA 47) frontal areas. Activation within left-sided speech areas, in contrast, was observed during the phonetic task. These findings indicate that partially distinct cerebral networks subserve processing of phonetic and intonational information during speech perception.

  15. Lexical analysis in schizophrenia: how emotion and social word use informs our understanding of clinical presentation.

    PubMed

    Minor, Kyle S; Bonfils, Kelsey A; Luther, Lauren; Firmin, Ruth L; Kukla, Marina; MacLain, Victoria R; Buck, Benjamin; Lysaker, Paul H; Salyers, Michelle P

    2015-05-01

    The words people use convey important information about internal states, feelings, and views of the world around them. Lexical analysis is a fast, reliable method of assessing word use that has shown promise for linking speech content, particularly in emotion and social categories, with psychopathological symptoms. However, few studies have utilized lexical analysis instruments to assess speech in schizophrenia. In this exploratory study, we investigated whether positive emotion, negative emotion, and social word use was associated with schizophrenia symptoms, metacognition, and general functioning in a schizophrenia cohort. Forty-six participants generated speech during a semi-structured interview, and word use categories were assessed using a validated lexical analysis measure. Trained research staff completed symptom, metacognition, and functioning ratings using semi-structured interviews. Word use categories significantly predicted all variables of interest, accounting for 28% of the variance in symptoms and 16% of the variance in metacognition and general functioning. Anger words, a subcategory of negative emotion, significantly predicted greater symptoms and lower functioning. Social words significantly predicted greater metacognition. These findings indicate that lexical analysis instruments have the potential to play a vital role in psychosocial assessments of schizophrenia. Future research should replicate these findings and examine the relationship between word use and additional clinical variables across the schizophrenia-spectrum. Copyright © 2015 Elsevier Ltd. All rights reserved.
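
    The word-category counting behind this kind of lexical analysis can be sketched in a few lines. The tiny word lists below are placeholders standing in for a validated dictionary (e.g., LIWC), which the actual study would use.

```python
# Sketch of word-category proportions (positive, negative, anger, social) for
# an interview transcript. The category vocabularies are illustrative only.
import re

CATEGORIES = {
    "positive": {"happy", "good", "hope", "love", "glad"},
    "negative": {"sad", "afraid", "angry", "hate", "worthless"},
    "anger":    {"angry", "hate", "furious"},
    "social":   {"friend", "family", "talk", "they", "we"},
}

def category_proportions(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

if __name__ == "__main__":
    sample = "I was angry with my family but we talk and I hope things get good."
    print(category_proportions(sample))
```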

  16. Classification of complex information: inference of co-occurring affective states from their expressions in speech.

    PubMed

    Sobol-Shikler, Tal; Robinson, Peter

    2010-07-01

    We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
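
    The pairwise architecture described here (one binary machine per pair of affective-state groups, consolidated into a ranked list by voting) can be sketched as follows. The group names, feature dimensionality, and random features are illustrative assumptions, not the system's actual inventory or vocal metrics.

```python
# Sketch of pairwise classification with vote-based consolidation into a
# ranked list of affective-state groups. Features and group names are toys.
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

GROUPS = ["sure", "unsure", "interested", "bored", "happy",
          "sad", "excited", "calm", "thinking"]          # 9 hypothetical groups

def train_pairwise(X, y):
    machines = {}
    for a, b in combinations(GROUPS, 2):                 # 36 pairwise machines
        mask = np.isin(y, [a, b])
        machines[(a, b)] = SVC().fit(X[mask], y[mask])
    return machines

def rank_states(machines, x):
    votes = {g: 0 for g in GROUPS}
    for clf in machines.values():
        votes[clf.predict(x.reshape(1, -1))[0]] += 1
    return sorted(votes, key=votes.get, reverse=True)    # ranked list of groups

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((180, 12))                   # 12 toy vocal metrics
    y = np.repeat(GROUPS, 20)
    machines = train_pairwise(X, y)
    print(rank_states(machines, X[0]))
```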

  17. Dual Diathesis-Stressor Model of Emotional and Linguistic Contributions to Developmental Stuttering

    ERIC Educational Resources Information Center

    Walden, Tedra A.; Frankel, Carl B.; Buhr, Anthony P.; Johnson, Kia N.; Conture, Edward G.; Karrass, Jan M.

    2012-01-01

    This study assessed emotional and speech-language contributions to childhood stuttering. A dual diathesis-stressor framework guided this study, in which both linguistic requirements and skills, and emotion and its regulation, are hypothesized to contribute to stuttering. The language diathesis consists of expressive and receptive language skills.…

  18. Gender differences in the activation of inferior frontal cortex during emotional speech perception.

    PubMed

    Schirmer, Annett; Zysset, Stefan; Kotz, Sonja A; Yves von Cramon, D

    2004-03-01

    We investigated the brain regions that mediate the processing of emotional speech in men and women by presenting positive and negative words that were spoken with happy or angry prosody. Hence, emotional prosody and word valence were either congruous or incongruous. We assumed that an fMRI contrast between congruous and incongruous presentations would reveal the structures that mediate the interaction of emotional prosody and word valence. The left inferior frontal gyrus (IFG) was more strongly activated in incongruous as compared to congruous trials. This difference in IFG activity was significantly larger in women than in men. Moreover, the congruence effect was significant in women whereas it only appeared as a tendency in men. As the left IFG has been repeatedly implicated in semantic processing, these findings are taken as evidence that semantic processing in women is more susceptible to influences from emotional prosody than is semantic processing in men. Moreover, the present data suggest that the left IFG mediates increased semantic processing demands imposed by an incongruence between emotional prosody and word valence.

  19. Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech

    PubMed Central

    Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz

    2015-01-01

    In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners’ and 20 CI users’ emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor’s emotions worse than the other actors’. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker’s deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues. PMID:27648210
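
    The pitch cues analyzed above (mean pitch, pitch range) can be extracted from a recording with an off-the-shelf pitch tracker. A minimal sketch, assuming librosa's pYIN implementation and a hypothetical input file:

```python
# Sketch of extracting mean pitch, pitch range, and pitch variability from a
# recording using librosa's pYIN tracker. The file name and frequency limits
# are assumptions, not values from the cited study.
import numpy as np
import librosa

def pitch_cues(path):
    y, sr = librosa.load(path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[voiced_flag & ~np.isnan(f0)]       # keep voiced, defined frames only
    return {
        "mean_pitch_hz": float(np.mean(f0)),
        "pitch_range_hz": float(np.max(f0) - np.min(f0)),
        "pitch_sd_hz": float(np.std(f0)),
    }

if __name__ == "__main__":
    print(pitch_cues("angry_utterance.wav"))   # hypothetical file
```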

  20. Neural Substrates of Processing Anger in Language: Contributions of Prosody and Semantics.

    PubMed

    Castelluccio, Brian C; Myers, Emily B; Schuh, Jillian M; Eigsti, Inge-Marie

    2016-12-01

    Emotions are conveyed primarily through two channels in language: semantics and prosody. While many studies confirm the role of a left hemisphere network in processing semantic emotion, there has been debate over the role of the right hemisphere in processing prosodic emotion. Some evidence suggests a preferential role for the right hemisphere, and other evidence supports a bilateral model. The relative contributions of semantics and prosody to the overall processing of affect in language are largely unexplored. The present work used functional magnetic resonance imaging to elucidate the neural bases of processing anger conveyed by prosody or semantic content. Results showed a robust, distributed, bilateral network for processing angry prosody and a more modest left hemisphere network for processing angry semantics when compared to emotionally neutral stimuli. Findings suggest the nervous system may be more responsive to prosodic cues in speech than to the semantic content of speech.

  1. Pathological speech signal analysis and classification using empirical mode decomposition.

    PubMed

    Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar

    2013-07-01

    Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7 % is obtained, thus demonstrating the effectiveness of the methodology.
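
    A minimal sketch of the EMD-based pipeline: decompose a segment into intrinsic mode functions, compute a few simple per-IMF features, and train a linear classifier. It assumes the PyEMD package and uses toy features and random data rather than the paper's six features and its clinical database.

```python
# Sketch of EMD decomposition + per-IMF features + linear classifier.
# The two features per IMF and the synthetic "normal vs. pathological" data
# are illustrative stand-ins.
import numpy as np
from PyEMD import EMD                        # assumed: the PyEMD (EMD-signal) package
from sklearn.linear_model import LogisticRegression

def imf_features(segment, n_imfs=4):
    imfs = EMD()(segment)[:n_imfs]           # intrinsic mode functions
    feats = []
    for imf in imfs:
        energy = float(np.sum(imf ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(imf)))) / 2)
        feats.extend([energy, zcr])
    feats += [0.0] * (2 * n_imfs - len(feats))   # pad if fewer IMFs returned
    return np.array(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for normal (label 0) vs. pathological (label 1) segments.
    X = np.array([imf_features(rng.standard_normal(1000) * (1 + 0.5 * label))
                  for label in (0, 1) for _ in range(20)])
    y = np.repeat([0, 1], 20)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print("training accuracy (toy data):", clf.score(X, y))
```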

  2. Toward Emotionally Accessible Massive Open Online Courses (MOOCs).

    PubMed

    Hillaire, Garron; Iniesto, Francisco; Rienties, Bart

    2017-01-01

    This paper outlines an approach to evaluating the emotional content of three Massive Open Online Courses (MOOCs) using the affective computing approach of prosody detection on two different text-to-speech voices in conjunction with human raters judging the emotional content of course text. The intent of this work is to establish the potential variation on the emotional delivery of MOOC material through synthetic voice.

  3. GreekLex 2: A comprehensive lexical database with part-of-speech, syllabic, phonological, and stress information

    PubMed Central

    Kyparissiadis, Antonios; van Heuven, Walter J. B.; Pitchford, Nicola J.; Ledgeway, Timothy

    2017-01-01

    Databases containing lexical properties on any given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek. However, these lack important part-of-speech information. Furthermore, there is a need for alternative procedures for calculating syllabic measurements and stress information, as well as for a combination of several metrics to investigate linguistic properties of the Greek language. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word and accurate syllabification and orthographic information predictive of stress, as well as several measurements of word similarity and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths to be carried out. Results showed that the statistical preponderance of stress position on the pre-final syllable that is reported for the Greek language is dependent upon grammatical category. Additionally, analyses showed that a proportion higher than 90% of the tokens in the database would be stressed correctly solely by relying on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification as well as phonetic transcription are available at http://www.psychology.nottingham.ac.uk/greeklex/. PMID:28231303

  4. GreekLex 2: A comprehensive lexical database with part-of-speech, syllabic, phonological, and stress information.

    PubMed

    Kyparissiadis, Antonios; van Heuven, Walter J B; Pitchford, Nicola J; Ledgeway, Timothy

    2017-01-01

    Databases containing lexical properties on any given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek. However, these lack important part-of-speech information. Furthermore, there is a need for alternative procedures for calculating syllabic measurements and stress information, as well as for a combination of several metrics to investigate linguistic properties of the Greek language. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word and accurate syllabification and orthographic information predictive of stress, as well as several measurements of word similarity and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths to be carried out. Results showed that the statistical preponderance of stress position on the pre-final syllable that is reported for the Greek language is dependent upon grammatical category. Additionally, analyses showed that a proportion higher than 90% of the tokens in the database would be stressed correctly solely by relying on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification as well as phonetic transcription are available at http://www.psychology.nottingham.ac.uk/greeklex/.
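
    The stress-neighbourhood result reported in both GreekLex records above can be illustrated with a toy predictor: assign each word the most frequent stress position among its orthographic neighbours. The miniature lexicon below is invented; the real analysis would run over the full GreekLex 2 database.

```python
# Sketch of stress prediction from orthographic neighbours (same length, one
# differing letter). The toy lexicon is invented for illustration.
from collections import Counter

def neighbours(word, lexicon):
    return [w for w in lexicon
            if len(w) == len(word) and w != word
            and sum(a != b for a, b in zip(w, word)) == 1]

def predict_stress(word, lexicon):
    votes = Counter(lexicon[w] for w in neighbours(word, lexicon))
    return votes.most_common(1)[0][0] if votes else None

if __name__ == "__main__":
    # word -> stressed syllable counted from the end (1 = final, 2 = penultimate).
    toy_lexicon = {"kalo": 2, "gala": 2, "kala": 2, "gato": 2, "kano": 1}
    preds = {w: predict_stress(w, toy_lexicon) for w in toy_lexicon}
    scored = {w: p for w, p in preds.items() if p is not None}
    correct = sum(p == toy_lexicon[w] for w, p in scored.items())
    print(f"neighbourhood prediction accuracy: {correct}/{len(scored)}")
```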

  5. Politeness, emotion, and gender: A sociophonetic study of voice pitch modulation

    NASA Astrophysics Data System (ADS)

    Yuasa, Ikuko

The present dissertation is a cross-gender and cross-cultural sociophonetic exploration of voice pitch characteristics utilizing speech data derived from Japanese and American speakers in natural conversations. The roles of voice pitch modulation in terms of the concepts of politeness and emotion as they pertain to culture and gender will be investigated herein. The research interprets the significance of the findings based on acoustic measurements of speech data expressed on the ERB-rate scale (a scale well suited to human speech perception). The investigation reveals that pitch range modulation displayed by Japanese informants in two types of conversations is closely linked to types of politeness adopted by those informants. The degree of the informants' emotional involvement and expressions reflected in differing pitch range widths plays an important role in determining the relationship between pitch range modulation and politeness. The study further correlates the Japanese cultural concept of enryo ("self-restraint") with this phenomenon. When median values were examined, male and female pitch ranges across cultures did not conspicuously differ. However, sporadically occurring pitch characteristics in women's speech, which differ across cultures in the width and height of pitch ranges, may create an 'emotional' perception of women's speech style. The salience of these pitch characteristics appears to be the source of the stereotype that women's speech sounds 'swoopy' or 'shrill' and is thus 'emotional'. Such salient voice characteristics in women's speech are interpreted in light of camaraderie/positive politeness. Women's use of conspicuous paralinguistic features helps to create an atmosphere of camaraderie. These voice pitch characteristics promote the establishment of a sense of camaraderie since they act to emphasize such feelings as concern, support, and comfort towards addressees. Moreover, men's wide pitch ranges are discussed in view of politeness (rather than gender). Japanese men's use of wide pitch ranges during conversations with familiar interlocutors demonstrates the extent to which male speakers can increase their pitch ranges if there is an authentic socio-cultural inspiration (other than a gender-related one) to do so. The findings suggest the necessity of interpreting research data in consideration of how the notion of gender interacts with other socio-cultural behavioral norms.
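
    The ERB-rate scale mentioned above maps frequency onto roughly equal perceptual steps. As a hedged illustration (the dissertation does not spell out its conversion), the sketch below uses the widely cited Glasberg and Moore (1990) approximation to express a pitch range in ERB-rate units.

    ```python
    import math

    def hz_to_erb_rate(f_hz: float) -> float:
        """Convert frequency in Hz to the ERB-rate (ERB-number) scale using the
        Glasberg & Moore (1990) approximation."""
        return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

    def erb_pitch_range(f_min_hz: float, f_max_hz: float) -> float:
        """Width of a pitch range in ERB-rate units, so equal widths correspond
        roughly to equal perceptual distances."""
        return hz_to_erb_rate(f_max_hz) - hz_to_erb_rate(f_min_hz)

    # e.g. a speaker ranging from 120 Hz to 280 Hz in a conversation
    print(round(erb_pitch_range(120.0, 280.0), 2))
    ```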

  6. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    PubMed

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. Data Recycling: Using Existing Databases to Increase Research Capacity in Speech-Language Development and Disorders

    ERIC Educational Resources Information Center

    Justice, Laura M.; Breit-Smith, Allison; Rogers, Margaret

    2010-01-01

    Purpose: This clinical forum was organized to provide a means for informing the research and clinical communities of one mechanism through which research capacity might be enhanced within the field of speech-language pathology. Specifically, forum authors describe the process of conducting secondary analyses of extant databases to answer questions…

  8. Contribution of a contralateral hearing aid to perception of consonant voicing, intonation, and emotional state in adult cochlear implantees.

    PubMed

    Most, Tova; Gaon-Sivan, Gal; Shpak, Talma; Luntz, Michal

    2012-01-01

    Binaural hearing in cochlear implant (CI) users can be achieved either by bilateral implantation or bimodally with a contralateral hearing aid (HA). Binaural-bimodal hearing has the advantage of complementing the high-frequency electric information from the CI by low-frequency acoustic information from the HA. We examined the contribution of a contralateral HA in 25 adult implantees to their perception of fundamental frequency-cued speech characteristics (initial consonant voicing, intonation, and emotions). Testing with CI alone, HA alone, and bimodal hearing showed that all three characteristics were best perceived under the bimodal condition. Significant differences were recorded between bimodal and HA conditions in the initial voicing test, between bimodal and CI conditions in the intonation test, and between both bimodal and CI conditions and between bimodal and HA conditions in the emotion-in-speech test. These findings confirmed that such binaural-bimodal hearing enhances perception of these speech characteristics and suggest that implantees with residual hearing in the contralateral ear may benefit from a HA in that ear.

  9. Computerized Measurement of Negative Symptoms in Schizophrenia

    PubMed Central

    Cohen, Alex S.; Alpert, Murray; Nienow, Tasha M.; Dinzeo, Thomas J.; Docherty, Nancy M.

    2008-01-01

    Accurate measurement of negative symptoms is crucial for understanding and treating schizophrenia. However, current measurement strategies are reliant on subjective symptom rating scales which often have psychometric and practical limitations. Computerized analysis of patients’ speech offers a sophisticated and objective means of evaluating negative symptoms. The present study examined the feasibility and validity of using widely-available acoustic and lexical-analytic software to measure flat affect, alogia and anhedonia (via positive emotion). These measures were examined in their relationships to clinically-rated negative symptoms and social functioning. Natural speech samples were collected and analyzed for 14 patients with clinically-rated flat affect, 46 patients without flat affect and 19 healthy controls. The computer-based inflection and speech rate measures significantly discriminated patients with flat affect from controls, and the computer-based measure of alogia and negative emotion significantly discriminated the flat and non-flat patients. Both the computer and clinical measures of positive emotion/anhedonia corresponded to functioning impairments. The computerized method of assessing negative symptoms offered a number of advantages over the symptom scale-based approach. PMID:17920078

  10. Examining Differences between Students with Specific Learning Disabilities and Those with Specific Language Disorders on Cognition, Emotions and Psychopathology

    ERIC Educational Resources Information Center

    Filippatou, Diamanto; Dimitropoulou, Panagiota; Sideridis, Georgios

    2009-01-01

    The purpose of the present study was to investigate the differences between students with LD and SLI on emotional psychopathology and cognitive variables. In particular, the study examined whether cognitive, emotional, and psychopathology variables are significant discriminatory variables of speech and language disordered groups versus those…

  11. Bipolar Disorder in Children: Implications for Speech-Language Pathologists

    ERIC Educational Resources Information Center

    Quattlebaum, Patricia D.; Grier, Betsy C.; Klubnik, Cynthia

    2012-01-01

    In the United States, bipolar disorder is an increasingly common diagnosis in children, and these children can present with severe behavior problems and emotionality. Many studies have documented the frequent coexistence of behavior disorders and speech-language disorders. Like other children with behavior disorders, children with bipolar disorder…

  12. Children with Speech Sound Disorders at School: Challenges for Children, Parents and Teachers

    ERIC Educational Resources Information Center

    Daniel, Graham R.; McLeod, Sharynne

    2017-01-01

Teachers play a major role in supporting children's educational, social, and emotional development, although they may be unprepared for supporting children with speech sound disorders. Interviews with 34 participants, including six focus children, their parents, siblings, friends, teachers and other significant adults in their lives, highlighted…

  13. Effective Vocal Production in Performance.

    ERIC Educational Resources Information Center

    King, Robert G.

If speech instructors are to teach students to recreate for an audience an author's intellectual and emotional meanings, they must teach them to use the human voice effectively. Seven essential elements of effective vocal production that often pose problems for oral interpretation students should be central to any speech training program: (1)…

  14. The Relationship between Psychopathology and Speech and Language Disorders in Neurologic Patients.

    ERIC Educational Resources Information Center

    Sapir, Shimon; Aronson, Arnold E.

    1990-01-01

    This paper reviews findings that suggest a causal relationship between depression, anxiety, or conversion reaction and voice, speech, and language disorders in neurologic patients. The paper emphasizes the need to consider the psychosocial and psychopathological aspects of neurologic communicative disorders, the link between emotional and…

  15. Emotional and physiological responses of fluent listeners while watching the speech of adults who stutter.

    PubMed

    Guntupalli, Vijaya K; Everhart, D Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

    2007-01-01

People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.). Previous research has examined the attitudes and perceptions of those who stutter and people who frequently interact with them (e.g. relatives, parents, employers). Results have shown an unequivocal, powerful and robust negative stereotype despite a lack of defined differences in personality structure between people who stutter and normally fluent individuals. However, physiological investigations of listener responses during moments of stuttering are limited. There is a need for data that simultaneously examine physiological responses (e.g. heart rate and galvanic skin conductance) and subjective behavioural responses to stuttering. The pairing of these objective and subjective data may provide information that casts light on the genesis of negative stereotypes associated with stuttering, the development of compensatory mechanisms in those who stutter, and the true impact of stuttering on senders and receivers alike. The aim was to compare the emotional and physiological responses of fluent speakers while listening to and observing fluent and severely stuttered speech samples. Twenty adult participants (mean age = 24.15 years, standard deviation = 3.40) observed speech samples of two fluent speakers and two speakers who stutter reading aloud. Participants' skin conductance and heart rate changes were measured as physiological responses to stuttered or fluent speech samples. Participants' subjective responses on arousal (excited-calm) and valence (happy-unhappy) dimensions were assessed via the Self-Assessment Manikin (SAM) rating scale with an additional questionnaire comprising a set of nine bipolar adjectives. Results showed significantly increased skin conductance and lower mean heart rate during the presentation of stuttered speech relative to the presentation of fluent speech samples (p<0.05). Listeners also rated themselves as more aroused, unhappy, nervous, uncomfortable, sad, tensed, unpleasant, avoiding, embarrassed, and annoyed while viewing stuttered speech relative to the fluent speech. These data support the notion that stutter-filled speech can elicit physiological and emotional responses in listeners. Clinicians who treat stuttering should be aware that listeners show involuntary physiological responses to moderate-severe stuttering that probably remain salient over time and contribute to the evolution of negative stereotypes of people who stutter. With this in mind, it is hoped that clinicians can work with people who stutter to develop appropriate coping strategies. The role of the amygdala and mirror neural mechanisms in physiological and subjective responses to stuttering is discussed.

  16. The MPI Emotional Body Expressions Database for Narrative Scenarios

    PubMed Central

    Volkova, Ekaterina; de la Rosa, Stephan; Bülthoff, Heinrich H.; Mohler, Betty

    2014-01-01

    Emotion expression in human-human interaction takes place via various types of information, including body motion. Research on the perceptual-cognitive mechanisms underlying the processing of natural emotional body language can benefit greatly from datasets of natural emotional body expressions that facilitate stimulus manipulation and analysis. The existing databases have so far focused on few emotion categories which display predominantly prototypical, exaggerated emotion expressions. Moreover, many of these databases consist of video recordings which limit the ability to manipulate and analyse the physical properties of these stimuli. We present a new database consisting of a large set (over 1400) of natural emotional body expressions typical of monologues. To achieve close-to-natural emotional body expressions, amateur actors were narrating coherent stories while their body movements were recorded with motion capture technology. The resulting 3-dimensional motion data recorded at a high frame rate (120 frames per second) provides fine-grained information about body movements and allows the manipulation of movement on a body joint basis. For each expression it gives the positions and orientations in space of 23 body joints for every frame. We report the results of physical motion properties analysis and of an emotion categorisation study. The reactions of observers from the emotion categorisation study are included in the database. Moreover, we recorded the intended emotion expression for each motion sequence from the actor to allow for investigations regarding the link between intended and perceived emotions. The motion sequences along with the accompanying information are made available in a searchable MPI Emotional Body Expression Database. We hope that this database will enable researchers to study expression and perception of naturally occurring emotional body expressions in greater depth. PMID:25461382
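
    A record in a motion-capture database of this kind could plausibly be organised as in the sketch below. This is only one possible layout; the class, field, and joint names are assumptions rather than the MPI database's actual schema. What is taken from the abstract is the content: 23 joints per frame, positions and orientations, 120 frames per second, plus intended and perceived emotion labels.

    ```python
    from dataclasses import dataclass
    from typing import List, Tuple

    Vec3 = Tuple[float, float, float]
    Quat = Tuple[float, float, float, float]  # orientation as a quaternion

    @dataclass
    class JointSample:
        name: str          # e.g. "LeftElbow" (joint names are illustrative)
        position: Vec3     # position in space for this frame
        orientation: Quat  # orientation in space for this frame

    @dataclass
    class MocapFrame:
        """One frame of a motion-capture sequence sampled at 120 fps."""
        time_s: float
        joints: List[JointSample]  # expected length: 23

    @dataclass
    class EmotionSequence:
        actor_id: str
        intended_emotion: str          # label supplied by the narrating actor
        perceived_emotions: List[str]  # observer responses from the categorisation study
        frames: List[MocapFrame]
    ```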

  17. Human phoneme recognition depending on speech-intrinsic variability.

    PubMed

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
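
    The "spectral level distance" predictor described above can be read in several ways; one plausible, simplified reading is sketched below, comparing a Welch estimate of a speech segment's spectrum with the long-term spectrum of the masking noise. The averaging and any band weighting used in the study are not reproduced here, so this is an assumption-laden illustration rather than the authors' measure.

    ```python
    import numpy as np
    from scipy.signal import welch

    def band_level_db(x, fs, nperseg=512):
        """Power spectrum estimate of a signal in dB (Welch method)."""
        f, pxx = welch(x, fs=fs, nperseg=nperseg)
        return f, 10.0 * np.log10(pxx + 1e-12)

    def spectral_level_distance(segment, noise, fs):
        """Mean level difference (dB) between a speech segment's spectrum and the
        long-term spectrum of the masking noise; one simplified reading of the
        predictor described in the abstract."""
        _, seg_db = band_level_db(segment, fs)
        _, noise_db = band_level_db(noise, fs)
        return float(np.mean(seg_db - noise_db))
    ```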

  18. Degraded speech sound processing in a rat model of fragile X syndrome

    PubMed Central

    Engineer, Crystal T.; Centanni, Tracy M.; Im, Kwok W.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Kilgard, Michael P.

    2014-01-01

    Fragile X syndrome is the most common inherited form of intellectual disability and the leading genetic cause of autism. Impaired phonological processing in fragile X syndrome interferes with the development of language skills. Although auditory cortex responses are known to be abnormal in fragile X syndrome, it is not clear how these differences impact speech sound processing. This study provides the first evidence that the cortical representation of speech sounds is impaired in Fmr1 knockout rats, despite normal speech discrimination behavior. Evoked potentials and spiking activity in response to speech sounds, noise burst trains, and tones were significantly degraded in primary auditory cortex, anterior auditory field and the ventral auditory field. Neurometric analysis of speech evoked activity using a pattern classifier confirmed that activity in these fields contains significantly less information about speech sound identity in Fmr1 knockout rats compared to control rats. Responses were normal in the posterior auditory field, which is associated with sound localization. The greatest impairment was observed in the ventral auditory field, which is related to emotional regulation. Dysfunction in the ventral auditory field may contribute to poor emotional regulation in fragile X syndrome and may help explain the observation that later auditory evoked responses are more disturbed in fragile X syndrome compared to earlier responses. Rodent models of fragile X syndrome are likely to prove useful for understanding the biological basis of fragile X syndrome and for testing candidate therapies. PMID:24713347

  19. Self-organizing map classifier for stressed speech recognition

    NASA Astrophysics Data System (ADS)

    Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav

    2016-05-01

This paper presents a method for detecting speech under stress using Self-Organizing Maps (SOMs). Most people exposed to stressful situations cannot respond adequately to stimuli. The army, police, and fire services operate largely in environments with an increased number of stressful situations, and personnel in action are directed from a control center. Control commands should therefore be adapted to the psychological state of the person in action. Psychological changes in the human body are also reflected physiologically, which means that stress affects speech; a system for recognizing stress in speech is consequently needed in the security forces. One possible classifier, popular for its flexibility, is the self-organizing map, a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data, a property well suited to speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosodic features were selected as input data because of their sensitivity to emotional changes. The parameters were calculated on speech recordings divided into two classes: stressed-state recordings and normal-state recordings. The contribution of the experiment is a method that uses a SOM classifier for stressed-speech detection. Results demonstrated the advantage of this method, namely its flexibility with respect to the input data.
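
    For readers unfamiliar with self-organizing maps, the following minimal sketch shows the core SOM update rule applied to feature vectors such as MFCC, LPC, and prosodic frames. It is not the authors' implementation; the grid size, learning-rate schedule, and the majority-vote labelling step mentioned in the closing comment are illustrative assumptions.

    ```python
    import numpy as np

    class TinySOM:
        """A minimal self-organizing map for two-class feature vectors
        (e.g. MFCC/prosody frames from stressed vs. normal speech)."""

        def __init__(self, rows, cols, dim, seed=0):
            rng = np.random.default_rng(seed)
            self.weights = rng.standard_normal((rows, cols, dim))
            self.grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                             indexing="ij"), axis=-1)

        def winner(self, x):
            # best-matching unit: node whose weight vector is closest to x
            d = np.linalg.norm(self.weights - x, axis=-1)
            return np.unravel_index(np.argmin(d), d.shape)

        def train(self, data, epochs=20, lr0=0.5, sigma0=2.0):
            for t in range(epochs):
                lr = lr0 * (1 - t / epochs)
                sigma = max(sigma0 * (1 - t / epochs), 0.5)
                for x in data:
                    w = np.array(self.winner(x))
                    dist2 = np.sum((self.grid - w) ** 2, axis=-1)
                    h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
                    self.weights += lr * h * (x - self.weights)

    # After training, each map node can be labelled by majority vote of the
    # training frames it wins, turning the map into a stressed/normal classifier.
    ```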

  20. Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers

    PubMed Central

    Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng

    2014-01-01

Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution for building a speech acoustic model of impaired speech is to employ adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor to WER in dysarthric speech for all levels of severity. The results show that speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
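
    Word error rate and the phoneme insertion, substitution, and deletion rates mentioned above all come from the same edit-distance alignment. The sketch below shows that bookkeeping on a toy example; it is a generic WER computation, not code from the study, and the same routine applies to phoneme error analysis when the units are phonemes.

    ```python
    def wer_counts(ref_words, hyp_words):
        """Word error rate with separate substitution/insertion/deletion counts,
        computed by standard edit-distance alignment."""
        n, m = len(ref_words), len(hyp_words)
        # dp[i][j] = (cost, subs, ins, dels) for aligning ref[:i] with hyp[:j]
        dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = (i, 0, 0, i)          # all deletions
        for j in range(1, m + 1):
            dp[0][j] = (j, 0, j, 0)          # all insertions
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if ref_words[i - 1] == hyp_words[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1]
                else:
                    c, s, a, d = dp[i - 1][j - 1]
                    sub = (c + 1, s + 1, a, d)
                    c, s, a, d = dp[i][j - 1]
                    ins = (c + 1, s, a + 1, d)
                    c, s, a, d = dp[i - 1][j]
                    dele = (c + 1, s, a, d + 1)
                    dp[i][j] = min(sub, ins, dele)
        cost, subs, ins, dels = dp[n][m]
        return {"wer": cost / max(n, 1), "sub": subs, "ins": ins, "del": dels}

    # toy usage: one substitution and one insertion -> WER = 2/3
    print(wer_counts("the cat sat".split(), "the bat sat down".split()))
    ```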

  1. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers.

    PubMed

    Chatterjee, Monita; Zion, Danielle J; Deroche, Mickael L; Burianek, Brooke A; Limb, Charles J; Goren, Alison P; Kulkarni, Aditya M; Christensen, Julie A

    2015-04-01

Despite their remarkable success in bringing spoken language to hearing-impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNHs' performance with 8-channel noise-vocoded speech; that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech, but on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. Copyright © 2014 Elsevier B.V. All rights reserved.
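
    The 8-channel noise-vocoded stimuli referred to above can be approximated with a simple channel vocoder. The sketch below is a crude stand-in, not the simulation software used in the study; the filter order, log-spaced band edges, and the 100-7000 Hz analysis range are assumptions.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
        """Crude n-channel noise vocoder: band-pass analysis, envelope
        extraction, and envelope-modulated noise carriers, summed together."""
        f_hi = min(f_hi, 0.45 * fs)                       # keep below Nyquist
        edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
        rng = np.random.default_rng(0)
        out = np.zeros_like(x, dtype=float)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, x)
            env = np.abs(hilbert(band))                    # temporal envelope
            carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
            out += env * carrier                           # envelope-modulated noise
        return out / (np.max(np.abs(out)) + 1e-12)
    ```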

  2. On the Time Course of Vocal Emotion Recognition

    PubMed Central

    Pell, Marc D.; Kotz, Sonja A.

    2011-01-01

    How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275
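
    One common way to operationalise the "identification point" in a gating study is the earliest gate from which the target emotion is chosen and never abandoned at later gates. The sketch below implements that criterion on a toy response series; the study's exact scoring rules may differ, so treat this as an illustration only.

    ```python
    def identification_point(gate_durations_ms, responses, target):
        """Earliest gate from which `target` is chosen and never abandoned.
        Returns the corresponding gate duration in ms, or None if never reached."""
        for i in range(len(responses)):
            if all(r == target for r in responses[i:]):
                return gate_durations_ms[i]
        return None

    # toy example: seven gates, the listener settles on "fear" from the third gate on
    gates = [200, 400, 600, 800, 1000, 1200, 1400]
    resp = ["neutral", "sadness", "fear", "fear", "fear", "fear", "fear"]
    print(identification_point(gates, resp, "fear"))  # -> 600
    ```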

  3. Improving Reading Programs for Emotionally Handicapped Children. Proceedings Highlights of a Special Study Institute (Medina, New York, May 3-5, 1971).

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany. Div. for Handicapped Children.

    Six speeches given at an institute on reading programs for emotionally handicapped children are presented. Jules Abrams first examines the relationship of emotional and personality maladjustments to reading difficulty. Then Clifford Kolson advocates the promotion of informal reading and the proper diagnosis of a child's reading level. A discussion…

  4. Computer Graphics Research Laboratory

    DTIC Science & Technology

    1994-01-31

spoken language (words and contextually appropriate intonation marking topic and focus), facial movements (lip shapes, emotions, gaze direction, head ...), content of speech (scrunching one's nose when talking about something unpleasant), emotion (wrinkling one's eyebrows with worry), personality ... (looking at the other person to see how she follows), look for information, express emotion (looking downward in case of sadness), or influence another

  5. Attitudes toward speech disorders: sampling the views of Cantonese-speaking Americans.

    PubMed

    Bebout, L; Arthur, B

    1997-01-01

Speech-language pathologists who serve clients from cultural backgrounds that are not familiar to them may encounter culturally influenced attitudinal differences. A questionnaire with statements about 4 speech disorders (dysfluency, cleft palate, speech of the deaf, and misarticulations) was given to a focus group of Chinese Americans and a comparison group of non-Chinese Americans. The focus group was much more likely to believe that persons with speech disorders could improve their own speech by "trying hard," was somewhat more likely to say that people who use deaf speech and people with cleft palates might be "emotionally disturbed," and was generally more likely to view deaf speech as a limitation. The comparison group was more pessimistic about stuttering children's acceptance by their peers than was the focus group. The two subject groups agreed about other items, such as the likelihood that older children with articulation problems are "less intelligent" than their peers.

  6. Developmental Variables and Speech-Language in a Special Education Intervention Model.

    ERIC Educational Resources Information Center

    Cruz, Maria del C.; Ayala, Myrna

    Case studies of eight children with speech and language impairments are presented in a review of the intervention efforts at the Demonstration Center for Preschool Special Education (DCPSE) in Puerto Rico. Five components of the intervention model are examined: social medical history, intelligence, motor development, socio-emotional development,…

  7. The Relationship between Pre-Treatment Clinical Profile and Treatment Outcome in an Integrated Stuttering Program

    ERIC Educational Resources Information Center

    Huinck, Wendy J.; Langevin, Marilyn; Kully, Deborah; Graamans, Kees; Peters, Herman F. M.; Hulstijn, Wouter

    2006-01-01

    A procedure for subtyping individuals who stutter and its relationship to treatment outcome is explored. Twenty-five adult participants of the Comprehensive Stuttering Program (CSP) were classified according to: (1) stuttering severity and (2) severity of negative emotions and cognitions associated with their speech problem. Speech characteristics…

  8. Imagery, Concept Formation and Creativity--From Past to Future.

    ERIC Educational Resources Information Center

    Silverstein, Ora. N. Asael

    At the center of the conceptual framework there is visual imagery. Man's emotional and mental behavior is built on archetypal symbols that are the source of creative ideas. Native American pictography, in particular, illustrates this in the correlation between gesture speech and verbal speech. The author's research in this area has included a…

  9. Facing the Problem: Impaired Emotion Recognition During Multimodal Social Information Processing in Borderline Personality Disorder.

    PubMed

    Niedtfeld, Inga; Defiebre, Nadine; Regenbogen, Christina; Mier, Daniela; Fenske, Sabrina; Kirsch, Peter; Lis, Stefanie; Schmahl, Christian

    2017-04-01

    Previous research has revealed alterations and deficits in facial emotion recognition in patients with borderline personality disorder (BPD). During interpersonal communication in daily life, social signals such as speech content, variation in prosody, and facial expression need to be considered simultaneously. We hypothesized that deficits in higher level integration of social stimuli contribute to difficulties in emotion recognition in BPD, and heightened arousal might explain this effect. Thirty-one patients with BPD and thirty-one healthy controls were asked to identify emotions in short video clips, which were designed to represent different combinations of the three communication channels: facial expression, speech content, and prosody. Skin conductance was recorded as a measure of sympathetic arousal, while controlling for state dissociation. Patients with BPD showed lower mean accuracy scores than healthy control subjects in all conditions comprising emotional facial expressions. This was true for the condition with facial expression only, and for the combination of all three communication channels. Electrodermal responses were enhanced in BPD only in response to auditory stimuli. In line with the major body of facial emotion recognition studies, we conclude that deficits in the interpretation of facial expressions lead to the difficulties observed in multimodal emotion processing in BPD.

  10. A Synthesis of Relevant Literature on the Development of Emotional Competence: Implications for Design of Augmentative and Alternative Communication Systems.

    PubMed

    Na, Ji Young; Wilkinson, Krista; Karny, Meredith; Blackstone, Sarah; Stifter, Cynthia

    2016-08-01

    Emotional competence refers to the ability to identify, respond to, and manage one's own and others' emotions. Emotional competence is critical to many functional outcomes, including making and maintaining friends, academic success, and community integration. There appears to be a link between the development of language and the development of emotional competence in children who use speech. Little information is available about these issues in children who rely on augmentative and alternative communication (AAC). In this article, we consider how AAC systems can be designed to support communication about emotions and the development of emotional competence. Because limited research exists on communication about emotions in a context of aided AAC, theory and research from other fields (e.g., psychology, linguistics, child development) is reviewed to identify key features of emotional competence and their possible implications for AAC design and intervention. The reviewed literature indicated that the research and clinical attention to emotional competence in children with disabilities is encouraging. However, the ideas have not been considered specifically in the context of aided AAC. On the basis of the reviewed literature, we offer practical suggestions for system design and AAC use for communication about emotions with children who have significant disabilities. Three key elements of discussing emotions (i.e., emotion name, reason, and solution) are suggested for inclusion in order to provide these children with opportunities for a full range of discussion about emotions. We argue that supporting communication about emotions is as important for children who use AAC as it is for children who are learning speech. This article offers a means to integrate information from other fields for the purpose of enriching AAC supports.

  11. Intonation processing deficits of emotional words among Mandarin Chinese speakers with congenital amusia: an ERP study.

    PubMed

    Lu, Xuejing; Ho, Hao Tam; Liu, Fang; Wu, Daxing; Thompson, William F

    2015-01-01

    Congenital amusia is a disorder that is known to affect the processing of musical pitch. Although individuals with amusia rarely show language deficits in daily life, a number of findings point to possible impairments in speech prosody that amusic individuals may compensate for by drawing on linguistic information. Using EEG, we investigated (1) whether the processing of speech prosody is impaired in amusia and (2) whether emotional linguistic information can compensate for this impairment. Twenty Chinese amusics and 22 matched controls were presented pairs of emotional words spoken with either statement or question intonation while their EEG was recorded. Their task was to judge whether the intonations were the same. Amusics exhibited impaired performance on the intonation-matching task for emotional linguistic information, as their performance was significantly worse than that of controls. EEG results showed a reduced N2 response to incongruent intonation pairs in amusics compared with controls, which likely reflects impaired conflict processing in amusia. However, our EEG results also indicated that amusics were intact in early sensory auditory processing, as revealed by a comparable N1 modulation in both groups. We propose that the impairment in discriminating speech intonation observed among amusic individuals may arise from an inability to access information extracted at early processing stages. This, in turn, could reflect a disconnection between low-level and high-level processing.

  12. Delivering Bad News: Attitudes, Feelings, and Practice Characteristics Among Speech-Language Pathologists.

    PubMed

    Gold, Rinat; Gold, Azgad

    2018-02-06

    The purpose of this study was to examine the attitudes, feelings, and practice characteristics of speech-language pathologists (SLPs) in Israel regarding the subject of delivering bad news. One hundred and seventy-three Israeli SLPs answered an online survey. Respondents represented SLPs in Israel in all stages of vocational experience, with varying academic degrees, from a variety of employment settings. The survey addressed emotions involved in the process of delivering bad news, training on this subject, and background information of the respondents. Frequency distributions of the responses of the participants were determined, and Pearson correlations were computed to determine the relation between years of occupational experience and the following variables: frequency of delivering bad news, opinions regarding training, and emotions experienced during the process of bad news delivery. Our survey showed that bad news delivery is a task that most participants are confronted with from the very beginning of their careers. Participants regarded training in the subject of delivering bad news as important but, at the same time, reported receiving relatively little training on this subject. In addition, our survey showed that negative emotions are involved in the process of delivering bad news. Training SLPs on specific techniques is required for successfully delivering bad news. The emotional burden associated with breaking bad news in the field of speech-language pathology should be noticed and addressed.

  13. Intonation processing deficits of emotional words among Mandarin Chinese speakers with congenital amusia: an ERP study

    PubMed Central

    Lu, Xuejing; Ho, Hao Tam; Liu, Fang; Wu, Daxing; Thompson, William F.

    2015-01-01

    Background: Congenital amusia is a disorder that is known to affect the processing of musical pitch. Although individuals with amusia rarely show language deficits in daily life, a number of findings point to possible impairments in speech prosody that amusic individuals may compensate for by drawing on linguistic information. Using EEG, we investigated (1) whether the processing of speech prosody is impaired in amusia and (2) whether emotional linguistic information can compensate for this impairment. Method: Twenty Chinese amusics and 22 matched controls were presented pairs of emotional words spoken with either statement or question intonation while their EEG was recorded. Their task was to judge whether the intonations were the same. Results: Amusics exhibited impaired performance on the intonation-matching task for emotional linguistic information, as their performance was significantly worse than that of controls. EEG results showed a reduced N2 response to incongruent intonation pairs in amusics compared with controls, which likely reflects impaired conflict processing in amusia. However, our EEG results also indicated that amusics were intact in early sensory auditory processing, as revealed by a comparable N1 modulation in both groups. Conclusion: We propose that the impairment in discriminating speech intonation observed among amusic individuals may arise from an inability to access information extracted at early processing stages. This, in turn, could reflect a disconnection between low-level and high-level processing. PMID:25914659

  14. An algorithm of improving speech emotional perception for hearing aid

    NASA Astrophysics Data System (ADS)

    Xi, Ji; Liang, Ruiyu; Fei, Xianju

    2017-07-01

In this paper, a speech emotion recognition (SER) algorithm was proposed to improve the emotional perception of hearing-impaired people. The algorithm uses multiple-kernel technology to overcome a drawback of the SVM: slow training speed. First, to improve the adaptive performance of the Gaussian radial basis function (RBF) kernel, the parameter determining the nonlinear mapping was optimized on the basis of kernel target alignment. The resulting kernel was then used as the basis kernel of Multiple Kernel Learning (MKL) with a slack variable to mitigate over-fitting. However, the slack variable also introduces error into the result, so a soft-margin MKL was proposed to balance the margin against the error. Moreover, an iterative algorithm was used to solve for the combination coefficients and the hyperplane equations. Experimental results show that the proposed algorithm achieves an accuracy of 90% for five kinds of emotion, including happiness, sadness, anger, fear and neutral. Compared with KPCA+CCA and PIM-FSVM, the proposed algorithm has the highest accuracy.
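
    Kernel target alignment, used above to tune the RBF parameter, measures how well a kernel matrix matches the ideal label kernel yy^T. The sketch below shows the criterion and a simple grid search over candidate widths; it does not reproduce the paper's soft-margin MKL solver, and the candidate gamma values are arbitrary assumptions.

    ```python
    import numpy as np

    def rbf_kernel(X, gamma):
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-gamma * d2)

    def kernel_target_alignment(K, y):
        """Alignment between kernel matrix K and the ideal target kernel y y^T
        (y in {-1, +1}); larger values mean the kernel geometry better matches
        the class labels."""
        Y = np.outer(y, y)
        return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

    def pick_gamma(X, y, candidates=(0.01, 0.1, 1.0, 10.0)):
        """Choose the RBF width by maximizing kernel-target alignment."""
        return max(candidates,
                   key=lambda g: kernel_target_alignment(rbf_kernel(X, g), y))
    ```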

  15. Understanding speaker attitudes from prosody by adults with Parkinson's disease.

    PubMed

    Monetta, Laura; Cheang, Henry S; Pell, Marc D

    2008-09-01

The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than HC participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

  16. Multimodal human communication--targeting facial expressions, speech content and prosody.

    PubMed

    Regenbogen, Christina; Schneider, Daniel A; Gur, Raquel E; Schneider, Frank; Habel, Ute; Kellermann, Thilo

    2012-05-01

    Human communication is based on a dynamic information exchange of the communication channels facial expressions, prosody, and speech content. This fMRI study elucidated the impact of multimodal emotion processing and the specific contribution of each channel on behavioral empathy and its prerequisites. Ninety-six video clips displaying actors who told self-related stories were presented to 27 healthy participants. In two conditions, all channels uniformly transported only emotional or neutral information. Three conditions selectively presented two emotional channels and one neutral channel. Subjects indicated the actors' emotional valence and their own while fMRI was recorded. Activation patterns of tri-channel emotional communication reflected multimodal processing and facilitative effects for empathy. Accordingly, subjects' behavioral empathy rates significantly deteriorated once one source was neutral. However, emotionality expressed via two of three channels yielded activation in a network associated with theory-of-mind-processes. This suggested participants' effort to infer mental states of their counterparts and was accompanied by a decline of behavioral empathy, driven by the participants' emotional responses. Channel-specific emotional contributions were present in modality-specific areas. The identification of different network-nodes associated with human interactions constitutes a prerequisite for understanding dynamics that underlie multimodal integration and explain the observed decline in empathy rates. This task might also shed light on behavioral deficits and neural changes that accompany psychiatric diseases. Copyright © 2012 Elsevier Inc. All rights reserved.

  17. Analysis of glottal source parameters in Parkinsonian speech.

    PubMed

    Hanratty, Jane; Deegan, Catherine; Walsh, Mary; Kirkpatrick, Barry

    2016-08-01

    Diagnosis and monitoring of Parkinson's disease has a number of challenges as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech based measures have demonstrated promising results. This study aims to investigate the characteristics of the glottal source signal in Parkinsonian speech. An experiment is conducted in which a selection of glottal parameters are tested for their ability to discriminate between healthy and Parkinsonian speech. Results for each glottal parameter are presented for a database of 50 healthy speakers and a database of 16 speakers with Parkinsonian speech symptoms. Receiver operating characteristic (ROC) curves were employed to analyse the results and the area under the ROC curve (AUC) values were used to quantify the performance of each glottal parameter. The results indicate that glottal parameters can be used to discriminate between healthy and Parkinsonian speech, although results varied for each parameter tested. For the task of separating healthy and Parkinsonian speech, 2 out of the 7 glottal parameters tested produced AUC values of over 0.9.
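
    The AUC figures reported above summarise how well a single glottal parameter separates the two speaker groups. A minimal sketch of that evaluation step is shown below; the parameter values and labels are invented for illustration, and NAQ is named only as an example of a glottal parameter, not as the study's best-performing one.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    # hypothetical per-speaker values of one glottal parameter (e.g. NAQ);
    # labels: 0 = healthy, 1 = Parkinsonian
    param = np.array([0.12, 0.15, 0.11, 0.14, 0.22, 0.25, 0.21, 0.24])
    labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    auc = roc_auc_score(labels, param)
    # AUC is orientation-dependent; report whichever direction separates better
    print(max(auc, 1.0 - auc))
    ```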

  18. Preliminary Analysis of Automatic Speech Recognition and Synthesis Technology.

    DTIC Science & Technology

    1983-05-01

INDUSTRIAL/MILITARY SPEECH SYNTHESIS PRODUCTS ... The SC-01 Speech Synthesizer contains 64 different phonemes which are accessed by a 6-bit code. In the proper sequential combinations of these ... connected speech input with widely differing emotional states, diverse accents, and substantial nonperiodic background noise input. As noted previously

  19. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

    PubMed

    Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H

    2017-05-01

A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatically developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database experiments ranges from 72% to 95%, and that for the inter-database experiments from 47% to 82%. The results conclude that conventional speech features are not correlated with voice quality, and hence are not reliable in pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
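
    A minimal MFCC-based detection pipeline in the spirit of the system described above might look like the sketch below. The utterance-level feature summary (MFCC means and standard deviations) and the SVM back end are assumptions chosen for illustration; the study's actual classifier and feature configuration are not specified in the abstract. Cross-database (inter-database) testing amounts to fitting on recordings from one database and evaluating on another.

    ```python
    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def mfcc_features(path, n_mfcc=13):
        """Mean and standard deviation of MFCCs over a recording:
        a simple utterance-level feature vector for detection experiments."""
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def fit_detector(train_paths, train_labels):
        """Fit a normal-vs-pathological detector on one database;
        labels: 0 = normal, 1 = pathological (paths/labels are hypothetical)."""
        X = np.stack([mfcc_features(p) for p in train_paths])
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        clf.fit(X, train_labels)
        return clf
    ```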

  20. Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

    DTIC Science & Technology

    1988-10-12

Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals. Terrence J. ... characteristics of speech from the visual speech signals. Neural networks have been trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the

  1. Vocal Tract Representation in the Recognition of Cerebral Palsied Speech

    ERIC Educational Resources Information Center

    Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal

    2012-01-01

    Purpose: In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine. Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech.…

  2. Parental Numeric Language Input to Mandarin Chinese and English Speaking Preschool Children

    ERIC Educational Resources Information Center

    Chang, Alicia; Sandhofer, Catherine M.; Adelchanow, Lauren; Rottman, Benjamin

    2011-01-01

    The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that…

  3. Got EQ?: Increasing Cultural and Clinical Competence through Emotional Intelligence

    ERIC Educational Resources Information Center

    Robertson, Shari A.

    2007-01-01

    Cultural intelligence has been described across three parameters of human behavior: cognitive intelligence, emotional intelligence (EQ), and physical intelligence. Each contributes a unique and important perspective to the ability of speech-language pathologists and audiologists to provide benefits to their clients regardless of cultural…

  4. Interdependence of linguistic and indexical speech perception skills in school-age children with early cochlear implantation.

    PubMed

    Geers, Ann E; Davidson, Lisa S; Uchanski, Rosalie M; Nicholas, Johanna G

    2013-09-01

    This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Preimplant-aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. Sixty 9- to 12-year olds, first implanted at an early age (12 to 38 months), participated in a comprehensive test battery that included the following LSP skills: (1) recognition of monosyllabic words at loud and soft levels, (2) repetition of phonemes and suprasegmental features from nonwords, and (3) recognition of key words from sentences presented within a noise background, and the following ISP skills: (1) discrimination of across-gender and within-gender (female) talkers and (2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the nonword repetition, and talker- and emotion-perception tasks for comparison. Word-recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of key words presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from nonword stimuli at about the same level of accuracy as suprasegmental attributes (70 and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores and these components were highly correlated (r = 0.76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. Results strongly support a link between indexical and linguistic properties in perceptual analysis of speech. These two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech performance, for both linguistic and indexical perception, is associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language, but also their ability to function successfully in a hearing world.

  5. Interdependence of Linguistic and Indexical Speech Perception Skills in School-Aged Children with Early Cochlear Implantation

    PubMed Central

    Geers, Ann; Davidson, Lisa; Uchanski, Rosalie; Nicholas, Johanna

    2013-01-01

    Objectives This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Pre-implant aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. Design Sixty 9–12 year olds, first implanted at an early age (12–38 months), participated in a comprehensive test battery that included the following LSP skills: 1) recognition of monosyllabic words at loud and soft levels, 2) repetition of phonemes and suprasegmental features from non-words, and 3) recognition of keywords from sentences presented within a noise background, and the following ISP skills: 1) discrimination of male from female and female from female talkers and 2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the non-word repetition, and talker- and emotion-perception tasks for comparison. Results Word recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of keywords presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from non-word stimuli at about the same level of accuracy as suprasegmental attributes (70% and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores and these components were highly correlated (r = .76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. Conclusions Results strongly support a link between indexical and linguistic properties in perceptual analysis of speech. These two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech performance, for both linguistic and indexical perception, is associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically-sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language, but also their ability to function successfully in a hearing world. PMID:23652814

  6. Reference-free automatic quality assessment of tracheoesophageal speech.

    PubMed

    Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip

    2009-01-01

    Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
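    For readers unfamiliar with the modulation-spectrum measure mentioned above, the general idea is to analyze the low-frequency content of the speech temporal envelope. The Python sketch below is a minimal illustration of that idea only; the file name, band edges, and analysis settings are assumptions for the example and are not taken from the cited study.

      # Minimal modulation-spectrum sketch: envelope of the waveform, then the
      # fraction of envelope energy in a low modulation band. All parameters
      # (file name, band edges, Welch segment length) are illustrative.
      import numpy as np
      from scipy.io import wavfile
      from scipy.signal import hilbert, welch

      rate, signal = wavfile.read("te_sample.wav")      # assumed mono recording
      signal = signal.astype(np.float64)
      signal /= np.max(np.abs(signal)) + 1e-12

      envelope = np.abs(hilbert(signal))                # temporal envelope
      freqs, psd = welch(envelope, fs=rate, nperseg=rate // 4)

      low_band = (freqs >= 2) & (freqs <= 20)           # speech-rate modulations
      modulation_ratio = psd[low_band].sum() / (psd.sum() + 1e-12)
      print(f"low-frequency modulation energy ratio: {modulation_ratio:.3f}")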

  7. A New Standardized Emotional Film Database for Asian Culture

    PubMed Central

    Deng, Yaling; Yang, Meng; Zhou, Renlai

    2017-01-01

    Researchers interested in emotions have endeavored to elicit emotional responses in the laboratory and have determined that films were one of the most effective ways to elicit emotions. The present study presented the development of a new standardized emotional film database for Asian culture. There were eight kinds of emotion: fear, disgust, anger, sadness, neutrality, surprise, amusement, and pleasure. Each kind included eight film clips, and a total of 64 emotional films were viewed by 110 participants. We analyzed both the subjective experience (valence, arousal, motivation, and dominance) and physiological response (heart rate and respiration rate) to the presentation of each film. The results of the subjective ratings indicated that our set of 64 films successfully elicited the target emotions. Heart rate declined while watching high-arousal films compared to neutral ones. Films that expressed amusement elicited the lowest respiration rate, whereas fear elicited the highest. The amount and category of emotional films in this database were considerable. This database may help researchers choose applicable emotional films for study according to their own purposes and help in studies of cultural differences in emotion. PMID:29163312

  8. Brain Response to a Humanoid Robot in Areas Implicated in the Perception of Human Emotional Gestures

    PubMed Central

    Chaminade, Thierry; Zecca, Massimiliano; Blakemore, Sarah-Jayne; Takanishi, Atsuo; Frith, Chris D.; Micera, Silvestro; Dario, Paolo; Rizzolatti, Giacomo; Gallese, Vittorio; Umiltà, Maria Alessandra

    2010-01-01

    Background The humanoid robot WE4-RII was designed to express human emotions in order to improve human-robot interaction. We can read the emotions depicted in its gestures, yet might utilize different neural processes than those used for reading the emotions in human agents. Methodology Here, fMRI was used to assess how brain areas activated by the perception of human basic emotions (facial expression of Anger, Joy, Disgust) and silent speech respond to a humanoid robot impersonating the same emotions, while participants were instructed to attend either to the emotion or to the motion depicted. Principal Findings Increased responses to robot compared to human stimuli in the occipital and posterior temporal cortices suggest additional visual processing when perceiving a mechanical anthropomorphic agent. In contrast, activity in cortical areas endowed with mirror properties, like left Broca's area for the perception of speech, and in the processing of emotions like the left anterior insula for the perception of disgust and the orbitofrontal cortex for the perception of anger, is reduced for robot stimuli, suggesting lesser resonance with the mechanical agent. Finally, instructions to explicitly attend to the emotion significantly increased response to robot, but not human facial expressions in the anterior part of the left inferior frontal gyrus, a neural marker of motor resonance. Conclusions Motor resonance towards a humanoid robot, but not a human, display of facial emotion is increased when attention is directed towards judging emotions. Significance Artificial agents can be used to assess how factors like anthropomorphism affect neural response to the perception of human actions. PMID:20657777

  9. Discharge experiences of speech-language pathologists working in Cyprus and Greece.

    PubMed

    Kambanaros, Maria

    2010-08-01

    Post-termination relationships are complex because the client may need additional services and it may be difficult to determine when the speech-language pathologist-client relationship is truly terminated. In my contribution to this scientific forum, discharge experiences from speech-language pathologists working in Cyprus and Greece will be explored in search of commonalities and differences in the way in which pathologists end therapy from different cultural perspectives. Within this context the personal impact on speech-language pathologists of the discharge process will be highlighted. Inherent in this process is how speech-language pathologists learn to hold their feelings, anxieties and reactions when communicating discharge to clients. Overall speech-language pathologists working in Cyprus and Greece experience similar emotional responses to positive and negative therapy endings as speech-language pathologists working in Australia. The major difference is that Cypriot and Greek therapists face serious limitations in moving their clients on after therapy has ended.

  10. Vulnerability to Bullying in Children with a History of Specific Speech and Language Difficulties

    ERIC Educational Resources Information Center

    Lindsay, Geoff; Dockrell, Julie E.; Mackie, Clare

    2008-01-01

    This study examined the susceptibility to problems with peer relationships and being bullied in a UK sample of 12-year-old children with a history of specific speech and language difficulties. Data were derived from the children's self-reports and the reports of parents and teachers using measures of victimization, emotional and behavioral…

  11. Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender

    ERIC Educational Resources Information Center

    Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal

    2004-01-01

    Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…

  12. The Cerebellar Mutism Syndrome and Its Relation to Cerebellar Cognitive Function and the Cerebellar Cognitive Affective Disorder

    ERIC Educational Resources Information Center

    Wells, Elizabeth M.; Walsh, Karin S.; Khademian, Zarir P.; Keating, Robert F.; Packer, Roger J.

    2008-01-01

    The postoperative cerebellar mutism syndrome (CMS), consisting of diminished speech output, hypotonia, ataxia, and emotional lability, occurs after surgery in up to 25% of patients with medulloblastoma and occasionally after removal of other posterior fossa tumors. Although the mutism is transient, speech rarely normalizes and the syndrome is…

  13. Separating the Problem and the Person: Insights from Narrative Therapy with People Who Stutter

    ERIC Educational Resources Information Center

    Ryan, Fiona; O'Dwyer, Mary; Leahy, Margaret M.

    2015-01-01

    Stuttering is a complex disorder of speech that encompasses motor speech and emotional and cognitive factors. The use of narrative therapy is described here, focusing on the stories that clients tell about the problems associated with stuttering that they have encountered in their lives. Narrative therapy uses these stories to understand, analyze,…

  14. Related Service Personnel's Resource Guide for Supporting Programs for Emotionally Handicapped Students.

    ERIC Educational Resources Information Center

    Indiana State Dept. of Education, Indianapolis. Div. of Special Education.

    The guide provides an information resource for related and supportive services personnel (e.g., school nurse, physical therapist, speech language pathologist) in their interactions with emotionally handicapped (EH) students. Following a definition of EH students, the first of six brief chapters discusses student characteristics, presents three…

  15. Nonverbal Effects in Memory for Dialogue.

    ERIC Educational Resources Information Center

    Narvaez, Alice; Hertel, Paula T.

    Memory for everyday conversational speech may be influenced by the nonverbally communicated emotion of the speaker. In order to investigate this premise, three videotaped scenes with bipolar emotional perspectives (joy/fear about going away to college, fear/anger about having been robbed, and disgust/interest regarding a friend's infidelity) were…

  16. Advances in natural language processing.

    PubMed

    Hirschberg, Julia; Manning, Christopher D

    2015-07-17

    Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.

  17. Two Different Communication Genres and Implications for Vocabulary Development and Learning to Read

    ERIC Educational Resources Information Center

    Massaro, Dominic W.

    2015-01-01

    This study examined potential differences in vocabulary found in picture books and adult's speech to children and to other adults. Using a small sample of various sources of speech and print, Hayes observed that print had a more extensive vocabulary than speech. The current analyses of two different spoken language databases and an assembled…

  18. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

In the field of information security, voice is one of the most important biometric modalities. With the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password with which the user proves his or her identity. However, speech produced with various emotions can cause an unacceptably high error rate and degrade the performance of speaker recognition systems. This paper deals with the problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, together with its components, is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition.
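    The reweighting idea in this abstract is a form of cost-sensitive decision making. The sketch below shows only the generic decision rule (weight class posteriors by a misclassification-cost matrix and choose the minimum expected cost); the cost values and posteriors are invented for illustration, and the pitch-envelope reweighting of the cited paper is not reproduced.

      # Generic cost-sensitive decision over speaker posteriors.
      import numpy as np

      # posteriors[i] = P(speaker i | utterance), e.g. from a GMM back-end
      posteriors = np.array([0.42, 0.35, 0.23])

      # cost[i, j] = cost of deciding speaker j when the true speaker is i
      cost = np.array([
          [0.0, 1.0, 1.0],
          [2.0, 0.0, 1.0],   # mistakes against speaker 1 assumed more costly
          [1.0, 1.0, 0.0],
      ])

      expected_cost = posteriors @ cost      # expected cost of each decision
      decision = int(np.argmin(expected_cost))
      print("decided speaker:", decision, "expected costs:", expected_cost)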

  19. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

In the field of information security, voice is one of the most important biometric modalities. With the development of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password with which the user proves his or her identity. However, speech produced with various emotions can cause an unacceptably high error rate and degrade the performance of speaker recognition systems. This paper deals with the problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, together with its components, is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition. PMID:24999492

  20. The Effects of Alcohol on the Emotional Displays of Whites in Interracial Groups

    PubMed Central

    Fairbairn, Catharine E.; Sayette, Michael A.; Levine, John M.; Cohn, Jeffrey F.; Creswell, Kasey G.

    2017-01-01

    Discomfort during interracial interactions is common among Whites in the U.S. and is linked to avoidance of interracial encounters. While the negative consequences of interracial discomfort are well-documented, understanding of its causes is still incomplete. Alcohol consumption has been shown to decrease negative emotions caused by self-presentational concern but increase negative emotions associated with racial prejudice. Using novel behavioral-expressive measures of emotion, we examined the impact of alcohol on displays of discomfort among 92 White individuals interacting in all-White or interracial groups. We used the Facial Action Coding System and comprehensive content-free speech analyses to examine affective and behavioral dynamics during these 36-minute exchanges (7.9 million frames of video data). Among Whites consuming nonalcoholic beverages, those assigned to interracial groups evidenced more facial and speech displays of discomfort than those in all-White groups. In contrast, among intoxicated Whites there were no differences in displays of discomfort between interracial and all-White groups. Results highlight the central role of self-presentational concerns in interracial discomfort and offer new directions for applying theory and methods from emotion science to the examination of intergroup relations. PMID:23356562

  1. The effects of alcohol on the emotional displays of Whites in interracial groups.

    PubMed

    Fairbairn, Catharine E; Sayette, Michael A; Levine, John M; Cohn, Jeffrey F; Creswell, Kasey G

    2013-06-01

    Discomfort during interracial interactions is common among Whites in the U.S. and is linked to avoidance of interracial encounters. While the negative consequences of interracial discomfort are well-documented, understanding of its causes is still incomplete. Alcohol consumption has been shown to decrease negative emotions caused by self-presentational concern but increase negative emotions associated with racial prejudice. Using novel behavioral-expressive measures of emotion, we examined the impact of alcohol on displays of discomfort among 92 White individuals interacting in all-White or interracial groups. We used the Facial Action Coding System and comprehensive content-free speech analyses to examine affective and behavioral dynamics during these 36-min exchanges (7.9 million frames of video data). Among Whites consuming nonalcoholic beverages, those assigned to interracial groups evidenced more facial and speech displays of discomfort than those in all-White groups. In contrast, among intoxicated Whites there were no differences in displays of discomfort between interracial and all-White groups. Results highlight the central role of self-presentational concerns in interracial discomfort and offer new directions for applying theory and methods from emotion science to the examination of intergroup relations.

  2. An informatics approach to integrating genetic and neurological data in speech and language neuroscience.

    PubMed

    Bohland, Jason W; Myers, Emma M; Kim, Esther

    2014-01-01

    A number of heritable disorders impair the normal development of speech and language processes and occur in large numbers within the general population. While candidate genes and loci have been identified, the gap between genotype and phenotype is vast, limiting current understanding of the biology of normal and disordered processes. This gap exists not only in our scientific knowledge, but also in our research communities, where genetics researchers and speech, language, and cognitive scientists tend to operate independently. Here we describe a web-based, domain-specific, curated database that represents information about genotype-phenotype relations specific to speech and language disorders, as well as neuroimaging results demonstrating focal brain differences in relevant patients versus controls. Bringing these two distinct data types into a common database ( http://neurospeech.org/sldb ) is a first step toward bringing molecular level information into cognitive and computational theories of speech and language function. One bridge between these data types is provided by densely sampled profiles of gene expression in the brain, such as those provided by the Allen Brain Atlases. Here we present results from exploratory analyses of human brain gene expression profiles for genes implicated in speech and language disorders, which are annotated in our database. We then discuss how such datasets can be useful in the development of computational models that bridge levels of analysis, necessary to provide a mechanistic understanding of heritable language disorders. We further describe our general approach to information integration, discuss important caveats and considerations, and offer a specific but speculative example based on genes implicated in stuttering and basal ganglia function in speech motor control.

  3. Emotional speech synchronizes brains across listeners and engages large-scale dynamic brain networks

    PubMed Central

    Nummenmaa, Lauri; Saarimäki, Heini; Glerean, Enrico; Gotsopoulos, Athanasios; Jääskeläinen, Iiro P.; Hari, Riitta; Sams, Mikko

    2014-01-01

    Speech provides a powerful means for sharing emotions. Here we implement novel intersubject phase synchronization and whole-brain dynamic connectivity measures to show that networks of brain areas become synchronized across participants who are listening to emotional episodes in spoken narratives. Twenty participants' hemodynamic brain activity was measured with functional magnetic resonance imaging (fMRI) while they listened to 45-s narratives describing unpleasant, neutral, and pleasant events spoken in neutral voice. After scanning, participants listened to the narratives again and rated continuously their feelings of pleasantness–unpleasantness (valence) and of arousal–calmness. Instantaneous intersubject phase synchronization (ISPS) measures were computed to derive both multi-subject voxel-wise similarity measures of hemodynamic activity and inter-area functional dynamic connectivity (seed-based phase synchronization, SBPS). Valence and arousal time series were subsequently used to predict the ISPS and SBPS time series. High arousal was associated with increased ISPS in the auditory cortices and in Broca's area, and negative valence was associated with enhanced ISPS in the thalamus, anterior cingulate, lateral prefrontal, and orbitofrontal cortices. Negative valence affected functional connectivity of fronto-parietal, limbic (insula, cingulum) and fronto-opercular circuitries, and positive arousal affected the connectivity of the striatum, amygdala, thalamus, cerebellum, and dorsal frontal cortex. Positive valence and negative arousal had markedly smaller effects. We propose that high arousal synchronizes the listeners' sound-processing and speech-comprehension networks, whereas negative valence synchronizes circuitries supporting emotional and self-referential processing. PMID:25128711
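    Instantaneous intersubject phase synchronization of the kind described here is commonly computed from Hilbert phases of band-limited voxel time series, with synchronization measured as the length of the mean phase vector across subjects. The sketch below shows that generic computation on synthetic data; it is not the authors' pipeline, and the signals, dimensions, and filtering are placeholders.

      # Generic ISPS computation: Hilbert phase per subject, then the mean
      # resultant length across subjects at each time point (values in [0, 1]).
      import numpy as np
      from scipy.signal import hilbert

      rng = np.random.default_rng(0)
      n_subjects, n_timepoints = 20, 300
      t = np.arange(n_timepoints)
      # Synthetic "BOLD" signals: a shared slow oscillation plus subject noise.
      bold = np.sin(2 * np.pi * 0.03 * t) + 0.8 * rng.standard_normal((n_subjects, n_timepoints))

      phases = np.angle(hilbert(bold, axis=1))             # instantaneous phases
      isps = np.abs(np.mean(np.exp(1j * phases), axis=0))  # synchronization over time

      print("mean ISPS over time:", float(isps.mean()))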

  4. A cross-linguistic fMRI study of perception of intonation and emotion in Chinese.

    PubMed

    Gandour, Jack; Wong, Donald; Dzemidzic, Mario; Lowe, Mark; Tong, Yunxia; Li, Xiaojian

    2003-03-01

    Conflicting data from neurobehavioral studies of the perception of intonation (linguistic) and emotion (affective) in spoken language highlight the need to further examine how functional attributes of prosodic stimuli are related to hemispheric differences in processing capacity. Because of similarities in their acoustic profiles, intonation and emotion permit us to assess to what extent hemispheric lateralization of speech prosody depends on functional instead of acoustical properties. To examine how the brain processes linguistic and affective prosody, an fMRI study was conducted using Chinese, a tone language in which both intonation and emotion may be signaled prosodically, in addition to lexical tones. Ten Chinese and 10 English subjects were asked to perform discrimination judgments of intonation (I: statement, question) and emotion (E: happy, angry, sad) presented in semantically neutral Chinese sentences. A baseline task required passive listening to the same speech stimuli (S). In direct between-group comparisons, the Chinese group showed left-sided frontoparietal activation for both intonation (I vs. S) and emotion (E vs. S) relative to baseline. When comparing intonation relative to emotion (I vs. E), the Chinese group demonstrated prefrontal activation bilaterally; parietal activation in the left hemisphere only. The reverse comparison (E vs. I), on the other hand, revealed that activation occurred in anterior and posterior prefrontal regions of the right hemisphere only. These findings show that some aspects of perceptual processing of emotion are dissociable from intonation, and, moreover, that they are mediated by the right hemisphere. Copyright 2003 Wiley-Liss, Inc.

  5. Geriatrics (Geriatrician)

    MedlinePlus

... worker, consultant pharmacist, nutritionist, physical therapist, occupational therapist, speech and hearing specialist, psychiatrist, and psychologist. These professionals evaluate the older person’s medical, social, emotional, and other needs. ...

  6. 45 CFR 85.3 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... diseases and conditions as orthopedic, visual, speech and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  7. 11 CFR 9420.2 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... diseases and conditions as orthopedic; visual, speech, and hearing impairments; cerebral palsy; epilepsy; muscular dystrophy; multiple sclerosis; cancer; heart disease; diabetes; mental retardation; emotional...

  8. 13 CFR 136.103 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  9. 11 CFR 9420.2 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... diseases and conditions as orthopedic; visual, speech, and hearing impairments; cerebral palsy; epilepsy; muscular dystrophy; multiple sclerosis; cancer; heart disease; diabetes; mental retardation; emotional...

  10. 13 CFR 136.103 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  11. 45 CFR 85.3 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... diseases and conditions as orthopedic, visual, speech and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  12. 11 CFR 6.103 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  13. 11 CFR 6.103 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  14. 43 CFR 17.503 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...

  15. 43 CFR 17.503 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...

  16. 43 CFR 17.503 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...

  17. Preserved appreciation of aesthetic elements of speech and music prosody in an amusic individual: A holistic approach.

    PubMed

    Loutrari, Ariadne; Lorch, Marjorie Perlman

    2017-07-01

    We present a follow-up study on the case of a Greek amusic adult, B.Z., whose impaired performance on scale, contour, interval, and meter was reported by Paraskevopoulos, Tsapkini, and Peretz in 2010, employing a culturally-tailored version of the Montreal Battery of Evaluation of Amusia. In the present study, we administered a novel set of perceptual judgement tasks designed to investigate the ability to appreciate holistic prosodic aspects of 'expressiveness' and emotion in phrase length music and speech stimuli. Our results show that, although diagnosed as a congenital amusic, B.Z. scored as well as healthy controls (N=24) on judging 'expressiveness' and emotional prosody in both speech and music stimuli. These findings suggest that the ability to make perceptual judgements about such prosodic qualities may be preserved in individuals who demonstrate difficulties perceiving basic musical features such as melody or rhythm. B.Z.'s case yields new insights into amusia and the processing of speech and music prosody through a holistic approach. The employment of novel stimuli with relatively fewer non-naturalistic manipulations, as developed for this study, may be a useful tool for revealing unexplored aspects of music and speech cognition and offer the possibility to further the investigation of the perception of acoustic streams in more authentic auditory conditions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Expressed Emotion Displayed by the Mothers of Inhibited and Uninhibited Preschool-Aged Children

    ERIC Educational Resources Information Center

    Raishevich, Natoshia; Kennedy, Susan J.; Rapee, Ronald M.

    2010-01-01

    In the current study, the Five Minute Speech Sample was used to assess the association between parent attitudes and children's behavioral inhibition in mothers of 120 behaviorally inhibited (BI) and 37 behaviorally uninhibited preschool-aged children. Mothers of BI children demonstrated significantly higher levels of emotional over-involvement…

  19. Early Cochlear Implant Experience and Emotional Functioning during Childhood: Loneliness in Middle and Late Childhood

    ERIC Educational Resources Information Center

    Schorr, Efrat A.

    2006-01-01

    The importance of early intervention for children with hearing loss has been demonstrated persuasively in areas including speech perception and production and spoken language. The present research shows that feelings of loneliness, a significant emotional outcome, are affected by the age at which children receive intervention with cochlear…

  20. Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism

    ERIC Educational Resources Information Center

    Singh, Leher; Harrow, MariLouise S.

    2014-01-01

    Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…

  1. Spontaneous Speech Events in Two Speech Databases of Human-Computer and Human-Human Dialogs in Spanish

    ERIC Educational Resources Information Center

    Rodriguez, Luis J.; Torres, M. Ines

    2006-01-01

    Previous works in English have revealed that disfluencies follow regular patterns and that incorporating them into the language model of a speech recognizer leads to lower perplexities and sometimes to a better performance. Although work on disfluency modeling has been applied outside the English community (e.g., in Japanese), as far as we know…

  2. Motherese in Interaction: At the Cross-Road of Emotion and Cognition? (A Systematic Review)

    PubMed Central

    Saint-Georges, Catherine; Chetouani, Mohamed; Cassel, Raquel; Apicella, Fabio; Mahdhaoui, Ammar; Muratori, Filippo; Laznik, Marie-Christine; Cohen, David

    2013-01-01

    Various aspects of motherese also known as infant-directed speech (IDS) have been studied for many years. As it is a widespread phenomenon, it is suspected to play some important roles in infant development. Therefore, our purpose was to provide an update of the evidence accumulated by reviewing all of the empirical or experimental studies that have been published since 1966 on IDS driving factors and impacts. Two databases were screened and 144 relevant studies were retained. General linguistic and prosodic characteristics of IDS were found in a variety of languages, and IDS was not restricted to mothers. IDS varied with factors associated with the caregiver (e.g., cultural, psychological and physiological) and the infant (e.g., reactivity and interactive feedback). IDS promoted infants’ affect, attention and language learning. Cognitive aspects of IDS have been widely studied whereas affective ones still need to be developed. However, during interactions, the following two observations were notable: (1) IDS prosody reflects emotional charges and meets infants’ preferences, and (2) mother-infant contingency and synchrony are crucial for IDS production and prolongation. Thus, IDS is part of an interactive loop that may play an important role in infants’ cognitive and social development. PMID:24205112

  3. Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation.

    PubMed

    Wildgruber, D; Hertrich, I; Riecker, A; Erb, M; Anders, S; Grodd, W; Ackermann, H

    2004-12-01

    In addition to the propositional content of verbal utterances, significant linguistic and emotional information is conveyed by the tone of speech. To differentiate brain regions subserving processing of linguistic and affective aspects of intonation, discrimination of sentences differing in linguistic accentuation and emotional expressiveness was evaluated by functional magnetic resonance imaging. Both tasks yielded rightward lateralization of hemodynamic responses at the level of the dorsolateral frontal cortex as well as bilateral thalamic and temporal activation. Processing of linguistic and affective intonation, thus, seems to be supported by overlapping neural networks comprising partially right-sided brain regions. Comparison of hemodynamic activation during the two different tasks, however, revealed bilateral orbito-frontal responses restricted to the affective condition as opposed to activation of the left lateral inferior frontal gyrus confined to evaluation of linguistic intonation. These findings indicate that distinct frontal regions contribute to higher level processing of intonational information depending on its communicational function. In line with other components of language processing, discrimination of linguistic accentuation seems to be lateralized to the left inferior-lateral frontal region whereas bilateral orbito-frontal areas subserve evaluation of emotional expressiveness.

  4. Development and validation of a facial expression database based on the dimensional and categorical model of emotions.

    PubMed

    Fujimura, Tomomi; Umemura, Hiroyuki

    2018-01-15

    The present study describes the development and validation of a facial expression database comprising five different horizontal face angles in dynamic and static presentations. The database includes twelve expression types portrayed by eight Japanese models. This database was inspired by the dimensional and categorical model of emotions: surprise, fear, sadness, anger with open mouth, anger with closed mouth, disgust with open mouth, disgust with closed mouth, excitement, happiness, relaxation, sleepiness, and neutral (static only). The expressions were validated using emotion classification and Affect Grid rating tasks [Russell, Weiss, & Mendelsohn, 1989. Affect Grid: A single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57(3), 493-502]. The results indicate that most of the expressions were recognised as the intended emotions and could systematically represent affective valence and arousal. Furthermore, face angle and facial motion information influenced emotion classification and valence and arousal ratings. Our database will be available online at the following URL. https://www.dh.aist.go.jp/database/face2017/ .

  5. Alterations in attention capture to auditory emotional stimuli in job burnout: an event-related potential study.

    PubMed

    Sokka, Laura; Huotilainen, Minna; Leinikka, Marianne; Korpela, Jussi; Henelius, Andreas; Alain, Claude; Müller, Kiti; Pakarinen, Satu

    2014-12-01

    Job burnout is a significant cause of work absenteeism. Evidence from behavioral studies and patient reports suggests that job burnout is associated with impairments of attention and decreased working capacity, and it has overlapping elements with depression, anxiety and sleep disturbances. Here, we examined the electrophysiological correlates of automatic sound change detection and involuntary attention allocation in job burnout using scalp recordings of event-related potentials (ERP). Volunteers with job burnout symptoms but without severe depression and anxiety disorders and their non-burnout controls were presented with natural speech sound stimuli (standard and nine deviants), as well as three rarely occurring speech sounds with strong emotional prosody. All stimuli elicited mismatch negativity (MMN) responses that were comparable in both groups. The groups differed with respect to the P3a, an ERP component reflecting involuntary shift of attention: job burnout group showed a shorter P3a latency in response to the emotionally negative stimulus, and a longer latency in response to the positive stimulus. Results indicate that in job burnout, automatic speech sound discrimination is intact, but there is an attention capture tendency that is faster for negative, and slower to positive information compared to that of controls. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Acoustic resonance at the dawn of life: musical fundamentals of the psychoanalytic relationship.

    PubMed

    Pickering, Judith

    2015-11-01

This paper uses a case vignette to show how musical elements of speech are a crucial source of information regarding the patient’s emotional states and associated memory systems that are activated at a given moment in the analytic field. There are specific psychoacoustic markers associated with different memory systems which indicate whether a patient is immersed in a state of creative intersubjective relatedness related to autobiographical memory, or has been triggered into a traumatic memory system. When a patient feels immersed in an atmosphere of intersubjective mutuality, dialogue features a rhythmical and tuneful form of speech marked by improvised reciprocal imitation, theme, and variation. When the patient is catapulted into a traumatic memory system, speech becomes monotone and disjointed. Awareness of such acoustic features of the traumatic memory system helps to alert the analyst that such a shift has taken place, informing appropriate responses and interventions. Communicative musicality (Malloch & Trevarthen 2009) originates in the earliest non-verbal vocal communication between infant and care-giver, states of primary intersubjectivity. Such musicality continues to be the primary vehicle for transmitting emotional meaning and for integrating right and left hemispheres. This enables communication that expresses emotional significance and personal value as well as conceptual reasoning. © 2015, The Society of Analytical Psychology.

  7. Depression-related difficulties disengaging from negative faces are associated with sustained attention to negative feedback during social evaluation and predict stress recovery

    PubMed Central

    Romero, Nuria; De Raedt, Rudi

    2017-01-01

    The present study aimed to clarify: 1) the presence of depression-related attention bias related to a social stressor, 2) its association with depression-related attention biases as measured under standard conditions, and 3) their association with impaired stress recovery in depression. A sample of 39 participants reporting a broad range of depression levels completed a standard eye-tracking paradigm in which they had to engage/disengage their gaze with/from emotional faces. Participants then underwent a stress induction (i.e., giving a speech), in which their eye movements to false emotional feedback were measured, and stress reactivity and recovery were assessed. Depression level was associated with longer times to engage/disengage attention with/from negative faces under standard conditions and with sustained attention to negative feedback during the speech. These depression-related biases were associated and mediated the association between depression level and self-reported stress recovery, predicting lower recovery from stress after giving the speech. PMID:28362826

  8. 28 CFR 41.31 - Handicapped person.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  9. 28 CFR 41.31 - Handicapped person.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  10. 28 CFR 41.31 - Handicapped person.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional...

  11. Visual gut punch: persuasion, emotion, and the constitutional meaning of graphic disclosure.

    PubMed

    Goodman, Ellen P

    2014-01-01

    The ability of government to "nudge" with information mandates, or merely to inform consumers of risks, is circumscribed by First Amendment interests that have been poorly articulated. New graphic cigarette warning labels supplied courts with the first opportunity to assess the informational interests attending novel forms of product disclosures. The D.C. Circuit enjoined them as unconstitutional, compelled by a narrative that the graphic labels converted government from objective informer to ideological persuader, shouting its warning to manipulate consumer decisions. This interpretation will leave little room for graphic disclosure and is already being used to challenge textual disclosure requirements (such as county-of-origin labeling) as unconstitutional. Graphic warning and the increasing reliance on regulation-by-disclosure present new free speech quandaries related to consumer autonomy, state normativity, and speaker liberty. This Article examines the distinct goals of product disclosure requirements and how those goals may serve to vindicate, or to frustrate, listener interests. I argue that many disclosures, and especially warnings, are necessarily both normative and informative, expressing value along with fact. It is not the existence of a norm that raises constitutional concern but rather the insistence on a controversial norm. Turning to the means of disclosure, this Article examines how emotional and graphic communication might change the constitutional calculus. Using autonomy theory and the communications research on speech processing, I conclude that disclosures do not bypass reason simply by reaching for the heart. If large graphic labels are unconstitutional, it will be because of undue burden on the speaker, not because they are emotionally powerful. This Article makes the following distinct contributions to the compelled commercial speech literature: critiques the leading precedent, Zauderer v. Office of Disciplinary Counsel, from a consumer autonomy standpoint; brings to bear empirical communications research on questions of facticity and rationality in emotional and graphic communications; and teases apart and distinguishes among various free speech dangers and contributions of commercial disclosure mandates with a view towards informing policy, law, and research.

  12. Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    NASA Astrophysics Data System (ADS)

    Dat, Tran Huy; Takeda, Kazuya; Itakura, Fumitada

We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. The use of a more general prior distribution with online adaptive estimation of its parameters is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multi-channel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise, and open windows.
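    For reference, the generalized gamma density commonly used as a spectral-magnitude prior in this family of estimators can be written (in LaTeX notation) as below; the parameterization is the standard one with shape parameters \gamma and \nu and a scale-related parameter \beta, and the frame-by-frame adaptation scheme of the cited paper is not reproduced here.

      p(x) = \frac{\gamma \, \beta^{\nu}}{\Gamma(\nu)} \, x^{\gamma \nu - 1} \exp\!\left( -\beta x^{\gamma} \right), \qquad x \ge 0 .

    Setting \gamma = 1 recovers the ordinary gamma density, while \gamma = 2 yields Rayleigh- and chi-type priors often used for speech spectral amplitudes.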

  13. Effects of human fatigue on speech signals

    NASA Astrophysics Data System (ADS)

    Stamoulis, Catherine

    2004-05-01

    Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

  14. Automatic evaluation of hypernasality based on a cleft palate speech database.

    PubMed

    He, Ling; Zhang, Jing; Liu, Qi; Yin, Heng; Lech, Margaret; Huang, Yunzhi

    2015-05-01

Hypernasality is one of the most typical characteristics of cleft palate (CP) speech. The outcome of hypernasality grading determines the need for follow-up surgery. Currently, the evaluation of CP speech is carried out by experienced speech therapists; however, the result strongly depends on their clinical experience and subjective judgment. This work proposes an automatic evaluation system for hypernasality grading in CP speech. The database tested in this work was collected by the Hospital of Stomatology, Sichuan University, which has the largest number of CP patients in China. Based on the production process of hypernasality, source (sound pulse) and vocal-tract filter features are presented. These features include pitch, the first and second energy-amplified frequency bands, cepstrum-based features, MFCCs, and short-time energy in sub-bands. These features, combined with a KNN classifier, are applied to automatically classify four grades of hypernasality: normal, mild, moderate, and severe. The experimental results show that the proposed system achieves good performance, with classification rates for the four hypernasality grades reaching up to 80.4%. The sensitivity of the proposed features to gender is also discussed.
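    As a rough illustration of the feature-plus-classifier pipeline described above, the Python sketch below maps each utterance to a mean-MFCC vector and trains a k-nearest-neighbour classifier over the four grades. File names, labels, and parameter values are placeholders, and the paper's full feature set (pitch, sub-band energies, energy-amplified bands) is reduced here to MFCCs only.

      # Sketch of hypernasality grading: mean MFCC features + k-NN over 4 grades.
      import numpy as np
      import librosa
      from sklearn.neighbors import KNeighborsClassifier

      GRADES = ["normal", "mild", "moderate", "severe"]

      def mean_mfcc(path, n_mfcc=13):
          y, sr = librosa.load(path, sr=16000)
          mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
          return mfcc.mean(axis=1)                 # one feature vector per utterance

      # In practice these lists come from an annotated cleft-palate corpus.
      wav_files = ["cp_0001.wav", "cp_0002.wav"]   # placeholder file names
      labels = ["normal", "severe"]                # placeholder grades

      X = np.vstack([mean_mfcc(p) for p in wav_files])
      y = np.array([GRADES.index(g) for g in labels])

      clf = KNeighborsClassifier(n_neighbors=1)    # small k for the tiny placeholder set
      clf.fit(X, y)
      print("predicted grade:", GRADES[int(clf.predict(X[:1])[0])])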

  15. Longitudinal Patterns of Behaviour Problems in Children with Specific Speech and Language Difficulties: Child and Contextual Factors

    ERIC Educational Resources Information Center

    Lindsay, Geoff; Dockrell, Julie E.; Strand, Steve

    2007-01-01

    Background: The purpose of this study was to examine the stability of behavioural, emotional and social difficulties (BESD) in children with specific speech and language difficulties (SSLD), and the relationship between BESD and the language ability. Methods: A sample of children with SSLD were assessed for BESD at ages 8, 10 and 12 years by both…

  16. The EpiSLI Database: A Publicly Available Database on Speech and Language

    ERIC Educational Resources Information Center

    Tomblin, J. Bruce

    2010-01-01

    Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…

  17. Spontaneous regulation of emotions in preschool children who stutter: preliminary findings.

    PubMed

    Johnson, Kia N; Walden, Tedra A; Conture, Edward G; Karrass, Jan

    2010-12-01

Emotional regulation of preschool children who stutter (CWS) and children who do not stutter (CWNS) was assessed through use of a disappointing gift (DG) procedure (P. M. Cole, 1986; C. Saarni, 1984, 1992). Participants consisted of 16 CWS and CWNS (11 boys and 5 girls in each talker group) who were 3 to 5 years of age. After assessing each child’s knowledge of display rules about socially appropriate expression of emotions, the authors asked the children to participate in a DG procedure. The children received a desirable gift preceding the first free-play task and a disappointing gift preceding a second free-play task. Dependent variables consisted of participants’ positive and negative expressive nonverbal behaviors exhibited during receipt of the desirable gift and the disappointing gift, as well as conversational speech disfluencies exhibited following receipt of each gift. Findings indicated that CWS and CWNS exhibited no significant differences in the amount of positive emotional expressions after receiving the desired gift; however, CWS, when compared with CWNS, exhibited more negative emotional expressions after receiving the undesirable gift. Furthermore, CWS were more disfluent after receiving the desired gift than after receiving the disappointing gift. Ancillary findings also indicated that CWS and CWNS had equivalent knowledge of display rules. Findings suggest that efforts to concurrently regulate emotional behaviors and speech disfluencies may be problematic for preschool-age CWS.

  18. How Conceptual Frameworks Influence Discovery and Depictions of Emotions in Clinical Relationships

    ERIC Educational Resources Information Center

    Duchan, Judith Felson

    2011-01-01

    Although emotions are often seen as key to maintaining rapport between speech-language pathologists and their clients, they are often neglected in the research and clinical literature. This neglect, it is argued here, comes in part from the inadequacies of prevailing conceptual frameworks used to govern practices. I aim to show how six such…

  19. Disentangling Child and Family Influences on Maternal Expressed Emotion toward Children with Attention-Deficit/Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Cartwright, Kim L.; Bitsakou, Paraskevi; Daley, David; Gramzow, Richard H.; Psychogiou, Lamprini; Simonoff, Emily; Thompson, Margaret J.; Sonuga-Barke, Edmund J. S.

    2011-01-01

    Objective: We used multi-level modelling of sibling-pair data to disentangle the influence of proband-specific and more general family influences on maternal expressed emotion (MEE) toward children and adolescents with attention-deficit/hyperactivity disorder (ADHD). Method: MEE was measured using the Five Minute Speech Sample (FMSS) for 60…

  20. Objective measurement of motor speech characteristics in the healthy pediatric population.

    PubMed

    Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P

    2011-12-01

To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure, were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (Fo) during speech did not change significantly with age for either males or females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and for assessing the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  1. Strategies for Teachers to Manage Stuttering in the Classroom: A Call for Research.

    PubMed

    Davidow, Jason H; Zaroogian, Lisa; Garcia-Barrera, Mauricio A

    2016-10-01

    This clinical focus article highlights the need for future research involving ways to assist children who stutter in the classroom. The 4 most commonly recommended strategies for teachers were found via searches of electronic databases and personal libraries of the authors. The peer-reviewed evidence for each recommendation was subsequently located and detailed. There are varying amounts of evidence for the 4 recommended teacher strategies outside of the classroom, but there are no data for 2 of the strategies, and minimal data for the others, in a classroom setting. That is, there is virtually no evidence regarding whether or not the actions put forth influence, for example, stuttering frequency, stuttering severity, participation, or the social, emotional, and cognitive components of stuttering in the classroom. There is a need for researchers and speech-language pathologists in the schools to study the outcomes of teacher strategies in the classroom for children who stutter.

  2. Speech perception and production in severe environments

    NASA Astrophysics Data System (ADS)

    Pisoni, David B.

    1990-09-01

The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load, or sustained attentional demands. Changes in speech production under these adverse conditions were examined using acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

  3. A new feature constituting approach to detection of vocal fold pathology

    NASA Astrophysics Data System (ADS)

    Hariharan, M.; Polat, Kemal; Yaacob, Sazali

    2014-08-01

In the last two decades, non-invasive methods based on acoustic analysis of the voice signal have proved to be an excellent and reliable tool for diagnosing vocal fold pathologies. This paper proposes a new feature vector based on the wavelet packet transform and singular value decomposition for the detection of vocal fold pathology. k-means clustering-based feature weighting is proposed to increase the discriminative power of the proposed features. In this work, two databases are used: the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database and the MAPACI speech pathology database. Four supervised classifiers, namely k-nearest neighbour (k-NN), least-squares support vector machine, probabilistic neural network, and general regression neural network, are employed to test the proposed features. The experimental results show that the proposed features give a very promising classification accuracy of 100% on both the MEEI and the MAPACI speech pathology databases.
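    A minimal sketch of this kind of feature vector, assuming the PyWavelets and NumPy libraries: wavelet-packet sub-band energies plus a few singular values of the framed signal. The wavelet, decomposition level, and framing are illustrative choices, and the k-means feature-weighting step of the paper is not reproduced.

      # Wavelet-packet + SVD feature sketch for a one-second voice segment.
      import numpy as np
      import pywt

      rng = np.random.default_rng(1)
      signal = rng.standard_normal(16000)          # stand-in for 1 s of voice at 16 kHz

      # Wavelet-packet sub-band energies at level 3 (8 terminal nodes).
      wp = pywt.WaveletPacket(data=signal, wavelet="db4", mode="symmetric", maxlevel=3)
      nodes = wp.get_level(3, order="freq")
      subband_energy = np.array([np.sum(np.square(n.data)) for n in nodes])

      # A few singular values of the framed signal as a coarse shape descriptor.
      frame_len = 400
      n_frames = len(signal) // frame_len
      frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
      singular_values = np.linalg.svd(frames, compute_uv=False)[:8]

      feature_vector = np.concatenate([subband_energy, singular_values])
      print("feature vector length:", feature_vector.shape[0])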

  4. Multilingual vocal emotion recognition and classification using back propagation neural network

    NASA Astrophysics Data System (ADS)

    Kayal, Apoorva J.; Nirmal, Jagannath

    2016-03-01

This work implements classification of different emotions in different languages using Artificial Neural Networks (ANN). Mel Frequency Cepstral Coefficients (MFCC) and Short Term Energy (STE) were used to create the feature set. An emotional speech corpus consisting of 30 acted utterances per emotion was developed. The emotions portrayed in this work are Anger, Joy, and Neutral in each of the English, Marathi, and Hindi languages. Different configurations of Artificial Neural Networks were employed for classification. The performance of the classifiers was evaluated by False Negative Rate (FNR), False Positive Rate (FPR), True Positive Rate (TPR), and True Negative Rate (TNR).
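    The following Python sketch illustrates the general recipe of combining MFCC and short-term energy features with a back-propagation network (scikit-learn's MLPClassifier is used here as a stand-in). Corpus paths, labels, and network sizes are placeholders rather than the configuration of the cited work.

      # MFCC + short-term energy features feeding a small back-propagation network.
      import numpy as np
      import librosa
      from sklearn.neural_network import MLPClassifier

      EMOTIONS = ["anger", "joy", "neutral"]

      def utterance_features(path):
          y, sr = librosa.load(path, sr=16000)
          mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
          frames = librosa.util.frame(y, frame_length=400, hop_length=160)
          ste = np.mean(np.sum(frames ** 2, axis=0))   # mean short-term energy
          return np.append(mfcc, ste)

      wav_files = ["anger_01.wav", "joy_01.wav", "neutral_01.wav"]  # placeholders
      labels = np.array([0, 1, 2])

      X = np.vstack([utterance_features(p) for p in wav_files])

      net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
      net.fit(X, labels)
      print("predicted:", [EMOTIONS[i] for i in net.predict(X)])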

  5. [Restoration of speech function in oncological patients with maxillary defects].

    PubMed

    Matiakin, E G; Chuchkov, V M; Akhundov, A A; Azizian, R I; Romanov, I S; Chuchkov, M V; Agapov, V V

    2009-01-01

    Speech quality was evaluated in 188 patients with acquired maxillary defects. Prosthetic treatment of 29 patients was preceded by pharmacopsychotherapy. Sixty-three patients had lessons with a logopedist, and 66 practiced self-tuition based on a specially developed test. Thirty patients were examined for speech quality without preliminary preparation. Speech quality was assessed by auditory and spectral analysis. The main forms of impaired speech quality in the patients with maxillary defects were marked rhinophonia and impaired articulation. The proposed analytical tests were based on a combination of "difficult" vowels and consonants. The use of a removable prosthesis with an obturator failed to correct the affected speech function but created prerequisites for the formation of a correct speech stereotype. Results of the study suggest a relationship between the quality of speech in subjects with maxillary defects and their intellectual faculties, as well as their desire to overcome this drawback. The proposed tests are designed to activate the neuromuscular apparatus responsible for the generation of speech. Lessons with a speech therapist give a powerful emotional incentive to the patients and promote their efforts toward restoration of speaking ability. Pharmacopsychotherapy and self-control are other efficacious tools for the improvement of speech quality in patients with maxillary defects.

  6. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production.

    PubMed

    Liu, Xiaoluan; Xu, Yi

    2015-01-01

    This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production, respectively. Moreover, for the first time, physical constraints such as piano fingerings and speech articulatory constraints are included because of their potential contribution to different patterns of dynamics. A piano performance experiment and a speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study is the first to support previous perception studies on the relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production, in which motor mechanisms play a crucial role.

  7. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production

    PubMed Central

    Liu, Xiaoluan; Xu, Yi

    2015-01-01

    This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production, respectively. Moreover, for the first time, physical constraints such as piano fingerings and speech articulatory constraints are included because of their potential contribution to different patterns of dynamics. A piano performance experiment and a speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study is the first to support previous perception studies on the relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production, in which motor mechanisms play a crucial role. PMID:26217252

  8. Pilot Workload and Speech Analysis: A Preliminary Investigation

    NASA Technical Reports Server (NTRS)

    Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.

    2013-01-01

    Prior research has questioned the effectiveness of speech analysis for measuring the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload, while articulation rates were found to be significantly slower. No correlation was found between subjective workload and F0 or articulation rate.
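
    To make the two acoustic measures concrete, here is a rough, hypothetical sketch (not the study's analysis pipeline) of how mean F0 and an articulation-rate proxy could be estimated from a read-back recording with librosa; the pitch range, the onset-based syllable proxy and the function name are all assumptions.

    ```python
    # Hypothetical sketch: mean F0 via a standard pitch tracker and a crude
    # articulation-rate proxy (onset peaks per second of voiced speech).
    import numpy as np
    import librosa

    def f0_and_articulation_rate(path, fmin=75, fmax=400):
        y, sr = librosa.load(path, sr=None)
        f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
        mean_f0 = float(np.nanmean(f0))                  # Hz, voiced frames only
        onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
        voiced_seconds = voiced_flag.sum() * 512 / sr    # pyin default hop = 512
        rate = len(onsets) / max(voiced_seconds, 1e-6)   # rough "syllables"/s proxy
        return mean_f0, rate
    ```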

  9. Subjective comparison and evaluation of speech enhancement algorithms

    PubMed Central

    Hu, Yi; Loizou, Philipos C.

    2007-01-01

    Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology, which is designed to evaluate speech quality along three dimensions: signal distortion, noise distortion and overall quality. This paper reports the results of the subjective tests. PMID:18046463

  10. Speech Disfluency-dependent Amygdala Activity in Adults Who Stutter: Neuroimaging of Interpersonal Communication in MRI Scanner Environment.

    PubMed

    Toyomura, Akira; Fujii, Tetsunoshin; Yokosawa, Koichi; Kuriki, Shinya

    2018-03-15

    Affective states, such as anticipatory anxiety, critically influence speech communication behavior in adults who stutter. However, there is currently little evidence regarding the involvement of the limbic system in speech disfluency during interpersonal communication. We designed this neuroimaging study and experimental procedure to sample neural activity during interpersonal communication between human participants and to investigate the relationship between amygdala activity and speech disfluency. Participants were required to engage in live communication with a stranger of the opposite sex in the MRI scanner environment. In the gaze condition, the stranger gazed at the participant without speaking, while in the live conversation condition, the stranger asked questions that the participant was required to answer. The stranger continued to gaze silently at the participant while the participant answered. Adults who stutter reported significantly higher discomfort than fluent controls during the experiment. Activity in the right amygdala, a key anatomical region in the limbic system involved in emotion, was significantly correlated with stuttering occurrences in adults who stutter. Right amygdala activity from pooled data of all participants also showed a significant correlation with discomfort level during the experiment. Activity in the prefrontal cortex, which forms emotion regulation neural circuitry with the amygdala, was lower in adults who stutter than in fluent controls. This is the first study to demonstrate that amygdala activity during interpersonal communication is involved in disfluent speech in adults who stutter. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.

  11. Connected word recognition using a cascaded neuro-computational model

    NASA Astrophysics Data System (ADS)

    Hoya, Tetsuya; van Leeuwen, Cees

    2016-10-01

    We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.

  12. Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study.

    PubMed

    Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P

    2017-01-01

    Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age.

  13. Atypical speech versus non-speech detection and discrimination in 4- to 6- yr old children with autism spectrum disorder: An ERP study

    PubMed Central

    Stefanidou, Chrysi; McCleery, Joseph P.

    2017-01-01

    Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year-old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age. PMID:28738063

  14. Basic Emotions in the Nencki Affective Word List (NAWL BE): New Method of Classifying Emotional Stimuli.

    PubMed

    Wierzba, Małgorzata; Riegel, Monika; Wypych, Marek; Jednoróg, Katarzyna; Turnau, Paweł; Grabowska, Anna; Marchewka, Artur

    2015-01-01

    The Nencki Affective Word List (NAWL) has recently been introduced as a standardized database of Polish words suitable for studying various aspects of language and emotions. Though the NAWL was originally based on the most commonly used dimensional approach, it is not the only way of studying emotions. Another framework is based on discrete emotional categories. Since the two perspectives are recognized as complementary, the aim of the present study was to supplement the NAWL database by the addition of categories corresponding to basic emotions. Thus, 2902 Polish words from the NAWL were presented to 265 subjects, who were instructed to rate them according to the intensity of each of the five basic emotions: happiness, anger, sadness, fear and disgust. The general characteristics of the present word database, as well as the relationships between the studied variables are shown to be consistent with typical patterns found in previous studies using similar databases for different languages. Here we present the Basic Emotions in the Nencki Affective Word List (NAWL BE) as a database of verbal material suitable for highly controlled experimental research. To make the NAWL more convenient to use, we introduce a comprehensive method of classifying stimuli to basic emotion categories. We discuss the advantages of our method in comparison to other methods of classification. Additionally, we provide an interactive online tool (http://exp.lobi.nencki.gov.pl/nawl-analysis) to help researchers browse and interactively generate classes of stimuli to meet their specific requirements.

  15. Basic Emotions in the Nencki Affective Word List (NAWL BE): New Method of Classifying Emotional Stimuli

    PubMed Central

    Wierzba, Małgorzata; Riegel, Monika; Wypych, Marek; Jednoróg, Katarzyna; Turnau, Paweł; Grabowska, Anna; Marchewka, Artur

    2015-01-01

    The Nencki Affective Word List (NAWL) has recently been introduced as a standardized database of Polish words suitable for studying various aspects of language and emotions. Though the NAWL was originally based on the most commonly used dimensional approach, it is not the only way of studying emotions. Another framework is based on discrete emotional categories. Since the two perspectives are recognized as complementary, the aim of the present study was to supplement the NAWL database by the addition of categories corresponding to basic emotions. Thus, 2902 Polish words from the NAWL were presented to 265 subjects, who were instructed to rate them according to the intensity of each of the five basic emotions: happiness, anger, sadness, fear and disgust. The general characteristics of the present word database, as well as the relationships between the studied variables are shown to be consistent with typical patterns found in previous studies using similar databases for different languages. Here we present the Basic Emotions in the Nencki Affective Word List (NAWL BE) as a database of verbal material suitable for highly controlled experimental research. To make the NAWL more convenient to use, we introduce a comprehensive method of classifying stimuli to basic emotion categories. We discuss the advantages of our method in comparison to other methods of classification. Additionally, we provide an interactive online tool (http://exp.lobi.nencki.gov.pl/nawl-analysis) to help researchers browse and interactively generate classes of stimuli to meet their specific requirements. PMID:26148193

  16. Numerical expression of color emotion and its application

    NASA Astrophysics Data System (ADS)

    Sato, Tetsuya; Kajiwara, Kanji; Xin, John H.; Hansuebsai, Aran; Nobbs, Jim

    2002-06-01

    Colors induce a wide variety of human emotions, but these emotions are expressed through words and language. To analyze them, visual assessment tests of color emotions described by twelve word pairs were carried out in Japan, Thailand, Hong Kong and the UK. A numerical expression for each color emotion is being developed as an ellipsoidal formula resembling a color-difference formula. This paper mainly discusses the numerical expression of the 'Soft-Hard' color emotion. The application of color emotions via empirical color-emotion formulae derived from a kansei database (a database of sensory assessments) is also briefly reported.
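
    The ellipsoidal form mentioned above can be written generically. The expression below is only a schematic illustration in CIELAB coordinates, with a placeholder centre point and axis coefficients rather than the authors' fitted values:

    ```latex
    % Schematic ellipsoidal color-emotion formula (placeholder coefficients),
    % analogous in shape to a color-difference formula; SH is the 'Soft-Hard'
    % score of a color with CIELAB coordinates (L*, a*, b*).
    \[
      SH \;=\; k \sqrt{\frac{(L^{*}-L^{*}_{0})^{2}}{\alpha^{2}}
                     + \frac{(a^{*}-a^{*}_{0})^{2}}{\beta^{2}}
                     + \frac{(b^{*}-b^{*}_{0})^{2}}{\gamma^{2}}} \;+\; c
    \]
    % (L*_0, a*_0, b*_0): reference point; alpha, beta, gamma: ellipsoid axis
    % lengths; k, c: scaling constants fitted to the visual-assessment data.
    ```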

  17. The musician effect: does it persist under degraded pitch conditions of cochlear implant simulations?

    PubMed Central

    Fuller, Christina D.; Galvin, John J.; Maat, Bert; Free, Rolien H.; Başkent, Deniz

    2014-01-01

    Cochlear implants (CIs) are auditory prostheses that restore hearing via electrical stimulation of the auditory nerve. Compared to normal acoustic hearing, sounds transmitted through the CI are spectro-temporally degraded, causing difficulties in challenging listening tasks such as speech intelligibility in noise and perception of music. In normal hearing (NH), musicians have been shown to better perform than non-musicians in auditory processing and perception, especially for challenging listening tasks. This “musician effect” was attributed to better processing of pitch cues, as well as better overall auditory cognitive functioning in musicians. Does the musician effect persist when pitch cues are degraded, as it would be in signals transmitted through a CI? To answer this question, NH musicians and non-musicians were tested while listening to unprocessed signals or to signals processed by an acoustic CI simulation. The task increasingly depended on pitch perception: (1) speech intelligibility (words and sentences) in quiet or in noise, (2) vocal emotion identification, and (3) melodic contour identification (MCI). For speech perception, there was no musician effect with the unprocessed stimuli, and a small musician effect only for word identification in one noise condition, in the CI simulation. For emotion identification, there was a small musician effect for both. For MCI, there was a large musician effect for both. Overall, the effect was stronger as the importance of pitch in the listening task increased. This suggests that the musician effect may be more rooted in pitch perception, rather than in a global advantage in cognitive processing (in which musicians would have performed better in all tasks). The results further suggest that musical training before (and possibly after) implantation might offer some advantage in pitch processing that could partially benefit speech perception, and more strongly emotion and music perception. PMID:25071428

  18. Assistive Technology and Adults with Learning Disabilities: A Blueprint for Exploration and Advancement.

    ERIC Educational Resources Information Center

    Raskind, Marshall

    1993-01-01

    This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…

  19. Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA

    PubMed Central

    Waaramaa, Teija; Leisiö, Timo

    2013-01-01

    The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA, with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of the emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other with respect to the identified emotions and the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listeners' age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion in the listening test between emotions seemed to depend on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for those listeners who share a similar language or cultural background with the speaker. PMID:23801972

  20. Unpacking the psychological weight of weight stigma: A rejection-expectation pathway

    PubMed Central

    Blodorn, Alison; Major, Brenda; Hunger, Jeffrey; Miller, Carol

    2015-01-01

    The present research tested the hypothesis that the negative effects of weight stigma among higher body-weight individuals are mediated by expectations of social rejection. Women and men who varied in objective body-weight (body mass index; BMI) gave a speech describing why they would make a good date. Half believed that a potential dating partner would see a videotape of their speech (weight seen) and half believed that a potential dating partner would listen to an audiotape of their speech (weight unseen). Among women, but not men, higher body-weight predicted increased expectations of social rejection, decreased executive control resources, decreased self-esteem, increased self-conscious emotions and behavioral displays of self-consciousness when weight was seen but not when weight was unseen. As predicted, higher body-weight women reported increased expectations of social rejection when weight was seen (versus unseen), which in turn predicted decreased self-esteem, increased self-conscious emotions, and increased stress. In contrast, lower body-weight women reported decreased expectations of social rejection when weight was seen (versus unseen), which in turn predicted increased self-esteem, decreased self-conscious emotions, and decreased stress. Men’s responses were largely unaffected by body-weight or visibility, suggesting that a dating context may not be identity threatening for higher body-weight men. Overall, the present research illuminates a rejection-expectation pathway by which weight stigma undermines higher body-weight women’s health. PMID:26752792

  1. Higher order statistical analysis of /x/ in male speech.

    PubMed

    Orr, M C; Lithgow, B

    2005-03-01

    This paper presents a study of kurtosis analysis for the sound /x/ in male speech; /x/ is the sound of the 'o' at the end of words such as 'ago'. The sound analysed for this paper came from the Australian National Database of Spoken Language, more specifically from male speaker 17. The /x/ was isolated and extracted from the database by the author in a quiet booth using standard multimedia software. A 5 millisecond window was used for the analysis, as it was previously shown by the author to be the most appropriate size for speech phoneme analysis. The significance of the research presented here is shown in the results, where a majority of coefficients had a platykurtic value (kurtosis between 0 and 3), as opposed to the previously held leptokurtic (kurtosis > 3) belief.
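
    For readers who want to reproduce this kind of windowed kurtosis measurement, a minimal sketch is given below; the 5 ms window follows the abstract, while everything else (function name, SciPy-based estimator) is an illustrative assumption rather than the author's code.

    ```python
    # Hedged sketch: Pearson kurtosis over consecutive 5 ms windows of a
    # phoneme segment (normal distribution = 3 under this convention).
    import numpy as np
    from scipy.stats import kurtosis

    def windowed_kurtosis(x, sr, win_ms=5.0):
        """Return Pearson kurtosis for consecutive non-overlapping windows."""
        win = max(1, int(sr * win_ms / 1000.0))
        n = len(x) // win
        frames = x[:n * win].reshape(n, win)
        return kurtosis(frames, axis=1, fisher=False)   # fisher=False -> normal = 3

    # k = windowed_kurtosis(phoneme_samples, sr)
    # Values between 0 and 3 would be read as platykurtic and values > 3 as
    # leptokurtic, following the convention used in the abstract.
    ```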

  2. Bilingualism and Children's Use of Paralinguistic Cues to Interpret Emotion in Speech

    ERIC Educational Resources Information Center

    Yow, W. Quin; Markman, Ellen M.

    2011-01-01

    Preschoolers tend to rely on what speakers say rather than how they sound when interpreting a speaker's emotion while adults rely instead on tone of voice. However, children who have a greater need to attend to speakers' communicative requirements, such as bilingual children, may be more adept in using paralinguistic cues (e.g. tone of voice) when…

  3. Social-Emotional Challenges Experienced by Students Who Function with Mild and Moderate Hearing Loss in Educational Settings

    ERIC Educational Resources Information Center

    Dalton, C. J.

    2011-01-01

    Mild or moderate hearing loss (MMHL) is a communication disability that impacts speech and language development and academic performance. Students with MMHL also have threats to their social-emotional well-being and self-identity formation, and are at risk for psychosocial deficits related to cognitive fatigue, isolation, and bullying. While the…

  4. Intonation Features of the Expression of Emotions in Spanish: Preliminary Study for a Prosody Assessment Procedure

    ERIC Educational Resources Information Center

    Martinez-Castilla, Pastora; Peppe, Susan

    2008-01-01

    This study aimed to find out what intonation features reliably represent the emotions of "liking" as opposed to "disliking" in the Spanish language, with a view to designing a prosody assessment procedure for use with children with speech and language disorders. 18 intonationally different prosodic realisations (tokens) of one word (limon) were…

  5. The varieties of speech to young children.

    PubMed

    Huttenlocher, Janellen; Vasilyeva, Marina; Waterfall, Heidi R; Vevea, Jack L; Hedges, Larry V

    2007-09-01

    This article examines caregiver speech to young children. The authors obtained several measures of the speech used to children during early language development (14-30 months). For all measures, they found substantial variation across individuals and subgroups. Speech patterns vary with caregiver education, and the differences are maintained over time. While there are distinct levels of complexity for different caregivers, there is a common pattern of increase across age within the range that characterizes each educational group. Thus, caregiver speech exhibits both long-standing patterns of linguistic behavior and adjustment for the interlocutor. This information about the variability of speech by individual caregivers provides a framework for systematic study of the role of input in language acquisition. PsycINFO Database Record (c) 2007 APA, all rights reserved

  6. In search of the emotional face: anger versus happiness superiority in visual search.

    PubMed

    Savage, Ruth A; Lipp, Ottmar V; Craig, Belinda M; Becker, Stefanie I; Horstmann, Gernot

    2013-08-01

    Previous research has provided inconsistent results regarding visual search for emotional faces, yielding evidence for either anger superiority (i.e., more efficient search for angry faces) or happiness superiority effects (i.e., more efficient search for happy faces), suggesting that these results do not reflect on emotional expression, but on emotion (un-)related low-level perceptual features. The present study investigated possible factors mediating anger/happiness superiority effects; specifically search strategy (fixed vs. variable target search; Experiment 1), stimulus choice (Nimstim database vs. Ekman & Friesen database; Experiments 1 and 2), and emotional intensity (Experiment 3 and 3a). Angry faces were found faster than happy faces regardless of search strategy using faces from the Nimstim database (Experiment 1). By contrast, a happiness superiority effect was evident in Experiment 2 when using faces from the Ekman and Friesen database. Experiment 3 employed angry, happy, and exuberant expressions (Nimstim database) and yielded anger and happiness superiority effects, respectively, highlighting the importance of the choice of stimulus materials. Ratings of the stimulus materials collected in Experiment 3a indicate that differences in perceived emotional intensity, pleasantness, or arousal do not account for differences in search efficiency. Across three studies, the current investigation indicates that prior reports of anger or happiness superiority effects in visual search are likely to reflect on low-level visual features associated with the stimulus materials used, rather than on emotion. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  7. An Analysis of The Parameters Used In Speech ABR Assessment Protocols.

    PubMed

    Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca

    2018-04-01

    The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.

  8. Lions, tigers, and bears, oh sh!t: Semantics versus tabooness in speech production.

    PubMed

    White, Katherine K; Abrams, Lise; Koehler, Sarah M; Collins, Richard J

    2017-04-01

    While both semantic and highly emotional (i.e., taboo) words can interfere with speech production, different theoretical mechanisms have been proposed to explain why interference occurs. Two experiments investigated these theoretical approaches by comparing the magnitude of these two types of interference and the stages at which they occur during picture naming. Participants named target pictures superimposed with semantic, taboo, or unrelated distractor words that were presented at three different stimulus-onset asynchronies (SOA): -150 ms, 0 ms, or +150 ms. In addition, the duration of distractor presentation was manipulated across experiments, with distractors appearing for the duration of the picture (Experiment 1) or for 350 ms (Experiment 2). Taboo distractors interfered more than semantic distractors, i.e., slowed target naming times, at all SOAs. While distractor duration had no effect on type of interference at -150 or 0 SOAs, briefly presented distractors eliminated semantic interference but not taboo interference at +150 SOA. Discussion focuses on how existing speech production theories can explain interference from emotional distractors and the unique role that attention may play in taboo interference.

  9. Low-Arousal Speech Noise Improves Performance in N-Back Task: An ERP Study

    PubMed Central

    Zhang, Dandan; Jin, Yi; Luo, Yuejia

    2013-01-01

    The relationship between noise and human performance is a crucial topic in ergonomic research. However, the brain dynamics of the emotional arousal effects of background noises are still unclear. The current study employed meaningless speech noises in an n-back working memory task to explore the changes in event-related potentials (ERPs) elicited by noises with a low arousal level vs. a high arousal level. We found that memory performance in the low arousal condition was improved compared with the silent and high arousal conditions; participants responded more quickly and had larger P2 and P3 amplitudes in the low arousal condition, while performance and ERP components showed no significant difference between the high arousal and silent conditions. These findings suggested that the emotional arousal dimension of background noises had a significant influence on human working memory performance, and that this effect was independent of the acoustic characteristics of the noises (e.g., intensity) and the meaning of the speech materials. The current findings improve our understanding of background noise effects on human performance and lay the ground for the investigation of patients with attention deficits. PMID:24204607

  10. A Joint Prosodic Origin of Language and Music

    PubMed Central

    Brown, Steven

    2017-01-01

    Vocal theories of the origin of language rarely make a case for the precursor functions that underlay the evolution of speech. The vocal expression of emotion is unquestionably the best candidate for such a precursor, although most evolutionary models of both language and speech ignore emotion and prosody altogether. I present here a model for a joint prosodic precursor of language and music in which ritualized group-level vocalizations served as the ancestral state. This precursor combined not only affective and intonational aspects of prosody, but also holistic and combinatorial mechanisms of phrase generation. From this common stage, there was a bifurcation to form language and music as separate, though homologous, specializations. This separation of language and music was accompanied by their (re)unification in songs with words. PMID:29163276

  11. Speech Enhancement based on the Dominant Classification Between Speech and Noise Using Feature Data in Spectrogram of Observation Signal

    NASA Astrophysics Data System (ADS)

    Nomura, Yukihiro; Lu, Jianming; Sekiya, Hiroo; Yahagi, Takashi

    This paper presents a speech enhancement method based on classifying, in each frequency band, whether speech or noise is dominant. A new classification scheme between speech-dominant and noise-dominant bands is proposed that uses the standard deviation of the spectrum of the observed signal in each band. Two oversubtraction factors are introduced, one for speech-dominant and one for noise-dominant bands, and spectral subtraction is carried out after the classification. The proposed method is tested on several noise types from the Noisex-92 database. Evaluation by segmental SNR, the Itakura-Saito distance measure, inspection of spectrograms and listening tests shows that the proposed system effectively reduces background noise. Moreover, the enhanced speech generated by the system contains less musical noise and distortion than that of conventional systems.
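
    A minimal sketch of the general scheme is given below, assuming an STFT front end; the dominance test, thresholds and the two oversubtraction factors are placeholder choices meant only to illustrate the structure of the method, not to reproduce the paper's system.

    ```python
    # Illustrative sketch: classify each band as speech- or noise-dominant from
    # the spread (std) of its spectral magnitudes over time, then apply spectral
    # subtraction with a class-dependent oversubtraction factor.
    import numpy as np
    from scipy.signal import stft, istft

    def enhance(noisy, sr, noise_mag, alpha_speech=2.0, alpha_noise=4.0,
                std_thresh=1.0, floor=0.02):
        f, t, X = stft(noisy, fs=sr, nperseg=512)
        mag, phase = np.abs(X), np.angle(X)
        # High temporal variability of a band's magnitude is taken here as a
        # cue that speech dominates that band (placeholder criterion).
        speech_dominant = mag.std(axis=1) > std_thresh * noise_mag
        alpha = np.where(speech_dominant, alpha_speech, alpha_noise)[:, None]
        clean_mag = np.maximum(mag - alpha * noise_mag[:, None], floor * mag)
        _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=512)
        return enhanced

    # noise_mag: per-band mean noise magnitude estimated from a speech-free
    # segment, e.g. np.abs(stft(noise_clip, fs=sr, nperseg=512)[2]).mean(axis=1)
    ```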

  12. Parental depressive symptoms, children’s emotional and behavioural problems, and parents’ expressed emotion—Critical and positive comments

    PubMed Central

    Parry, Elizabeth; Nath, Selina; Kallitsoglou, Angeliki; Russell, Ginny

    2017-01-01

    This longitudinal study examined whether mothers’ and fathers’ depressive symptoms predict, independently and interactively, children’s emotional and behavioural problems. It also examined bi-directional associations between parents’ expressed emotion constituents (parents’ child-directed positive and critical comments) and children’s emotional and behavioural problems. At time 1, the sample consisted of 160 families in which 50 mothers and 40 fathers had depression according to the Structured Clinical Interview for DSM-IV. Children’s mean age at Time 1 was 3.9 years (SD = 0.8). Families (n = 106) were followed up approximately 16 months later (Time 2). Expressed emotion constituents were assessed using the Preschool Five Minute Speech Sample. In total, 144 mothers and 158 fathers at Time 1 and 93 mothers and 105 fathers at Time 2 provided speech samples. Fathers’ depressive symptoms were concurrently associated with more child emotional problems when mothers had higher levels of depressive symptoms. When controlling for important confounders (children’s gender, baseline problems, mothers’ depressive symptoms and parents’ education and age), fathers’ depressive symptoms independently predicted higher levels of emotional and behavioural problems in their children over time. There was limited evidence for a bi-directional relationship between fathers’ positive comments and change in children’s behavioural problems over time. Unexpectedly, there were no bi-directional associations between parents’ critical comments and children’s outcomes. We conclude that the study provides evidence to support a whole family approach to prevention and intervention strategies for children’s mental health and parental depression. PMID:29045440

  13. Adaptation to Vocal Expressions Reveals Multistep Perception of Auditory Emotion

    PubMed Central

    Maurage, Pierre; Rouger, Julien; Latinus, Marianne; Belin, Pascal

    2014-01-01

    The human voice carries speech as well as important nonlinguistic signals that influence our social interactions. Among these cues that impact our behavior and communication with other people is the perceived emotional state of the speaker. A theoretical framework for the neural processing stages of emotional prosody has suggested that auditory emotion is perceived in multiple steps (Schirmer and Kotz, 2006) involving low-level auditory analysis and integration of the acoustic information followed by higher-level cognition. Empirical evidence for this multistep processing chain, however, is still sparse. We examined this question using functional magnetic resonance imaging and a continuous carry-over design (Aguirre, 2007) to measure brain activity while volunteers listened to non-speech-affective vocalizations morphed on a continuum between anger and fear. Analyses dissociated neuronal adaptation effects induced by similarity in perceived emotional content between consecutive stimuli from those induced by their acoustic similarity. We found that bilateral voice-sensitive auditory regions as well as right amygdala coded the physical difference between consecutive stimuli. In contrast, activity in bilateral anterior insulae, medial superior frontal cortex, precuneus, and subcortical regions such as bilateral hippocampi depended predominantly on the perceptual difference between morphs. Our results suggest that the processing of vocal affect recognition is a multistep process involving largely distinct neural networks. Amygdala and auditory areas predominantly code emotion-related acoustic information while more anterior insular and prefrontal regions respond to the abstract, cognitive representation of vocal affect. PMID:24920615

  14. Social Anxiety-Linked Attention Bias to Threat Is Indirectly Related to Post-Event Processing Via Subjective Emotional Reactivity to Social Stress.

    PubMed

    Çek, Demet; Sánchez, Alvaro; Timpano, Kiara R

    2016-05-01

    Attention bias to threat (e.g., disgust faces) is a cognitive vulnerability factor for social anxiety occurring in early stages of information processing. Few studies have investigated the relationship between social anxiety and attention biases, in conjunction with emotional and cognitive responses to a social stressor. Elucidating these links would shed light on maintenance factors of social anxiety and could help identify malleable treatment targets. This study examined the associations between social anxiety level, attention bias to disgust (AB-disgust), subjective emotional and physiological reactivity to a social stressor, and subsequent post-event processing (PEP). We tested a mediational model where social anxiety level indirectly predicted subsequent PEP via its association with AB-disgust and immediate subjective emotional reactivity to social stress. Fifty-five undergraduates (45% female) completed a passive viewing task. Eye movements were tracked during the presentation of social stimuli (e.g., disgust faces) and used to calculate AB-disgust. Next, participants gave an impromptu speech in front of a video camera and watched a neutral video, followed by the completion of a PEP measure. Although there was no association between AB-disgust and physiological reactivity to the stressor, AB-disgust was significantly associated with greater subjective emotional reactivity from baseline to the speech. Analyses supported a partial mediation model where AB-disgust and subjective emotional reactivity to a social stressor partially accounted for the link between social anxiety levels and PEP. Copyright © 2016. Published by Elsevier Ltd.

  15. Adaptation to vocal expressions reveals multistep perception of auditory emotion.

    PubMed

    Bestelmeyer, Patricia E G; Maurage, Pierre; Rouger, Julien; Latinus, Marianne; Belin, Pascal

    2014-06-11

    The human voice carries speech as well as important nonlinguistic signals that influence our social interactions. Among these cues that impact our behavior and communication with other people is the perceived emotional state of the speaker. A theoretical framework for the neural processing stages of emotional prosody has suggested that auditory emotion is perceived in multiple steps (Schirmer and Kotz, 2006) involving low-level auditory analysis and integration of the acoustic information followed by higher-level cognition. Empirical evidence for this multistep processing chain, however, is still sparse. We examined this question using functional magnetic resonance imaging and a continuous carry-over design (Aguirre, 2007) to measure brain activity while volunteers listened to non-speech-affective vocalizations morphed on a continuum between anger and fear. Analyses dissociated neuronal adaptation effects induced by similarity in perceived emotional content between consecutive stimuli from those induced by their acoustic similarity. We found that bilateral voice-sensitive auditory regions as well as right amygdala coded the physical difference between consecutive stimuli. In contrast, activity in bilateral anterior insulae, medial superior frontal cortex, precuneus, and subcortical regions such as bilateral hippocampi depended predominantly on the perceptual difference between morphs. Our results suggest that the processing of vocal affect recognition is a multistep process involving largely distinct neural networks. Amygdala and auditory areas predominantly code emotion-related acoustic information while more anterior insular and prefrontal regions respond to the abstract, cognitive representation of vocal affect. Copyright © 2014 Bestelmeyer et al.

  16. Effect of Parkinson Disease on Emotion Perception Using the Persian Affective Voices Test.

    PubMed

    Saffarian, Arezoo; Shavaki, Yunes Amiri; Shahidi, Gholam Ali; Jafari, Zahra

    2018-05-04

    Emotion perception plays a major role in proper communication with people in different social interactions. Nonverbal affect bursts can be used to evaluate vocal emotion perception. The present study was a preliminary step to establishing the psychometric properties of the Persian version of the Montreal Affective Voices (MAV) test, as well as to investigate the effect of Parkinson disease (PD) on vocal emotion perception. The short, emotional sound made by pronouncing the vowel "a" in Persian was recorded by 22 actors and actresses to develop the Persian version of the MAV, the Persian Affective Voices (PAV), for emotions of happiness, sadness, pleasure, pain, anger, disgust, fear, surprise, and neutrality. The results of the recordings of five of the actresses and five of the actors who obtained the highest score were used to generate the test. For convergent validity assessment, the correlation between the PAV and a speech prosody comprehension test was examined using a gender- and age-matched control group. To investigate the effect of the PD on emotion perception, the PAV test was performed on 28 patients with mild PD between ages 50 and 70 years. The PAV showed a high internal consistency (Cronbach's α = 0.80). A significant positive correlation was observed between the PAV and the speech prosody comprehension test. The test-retest reliability also showed the high repeatability of the PAV (intraclass correlation coefficient = 0.815, P ≤ 0.001). A significant difference was observed between the patients with PD and the controls in all subtests. The PAV test is a useful psychometric tool for examining vocal emotion perception that can be used in both behavioral and neuroimaging studies. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  17. Issues in forensic voice.

    PubMed

    Hollien, Harry; Huntley Bahr, Ruth; Harnsberger, James D

    2014-03-01

    The following article provides a general review of an area that can be referred to as Forensic Voice. Its goals will be outlined and that discussion will be followed by a description of its major elements. Considered are (1) the processing and analysis of spoken utterances, (2) distorted speech, (3) enhancement of speech intelligibility (re: surveillance and other recordings), (4) transcripts, (5) authentication of recordings, (6) speaker identification, and (7) the detection of deception, intoxication, and emotions in speech. Stress in speech and the psychological stress evaluation systems (that some individuals attempt to use as lie detectors) also will be considered. Points of entry will be suggested for individuals with the kinds of backgrounds possessed by professionals already working in the voice area. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  18. Voice Technologies in Libraries: A Look into the Future.

    ERIC Educational Resources Information Center

    Lange, Holley R., Ed.; And Others

    1991-01-01

    Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…

  19. Dual diathesis-stressor model of emotional and linguistic contributions to developmental stuttering.

    PubMed

    Walden, Tedra A; Frankel, Carl B; Buhr, Anthony P; Johnson, Kia N; Conture, Edward G; Karrass, Jan M

    2012-05-01

    This study assessed emotional and speech-language contributions to childhood stuttering. A dual diathesis-stressor framework guided this study, in which both linguistic requirements and skills, and emotion and its regulation, are hypothesized to contribute to stuttering. The language diathesis consists of expressive and receptive language skills. The emotion diathesis consists of proclivities to emotional reactivity and regulation of emotion, and the emotion stressor consists of experimentally manipulated emotional inductions prior to narrative speaking tasks. Preschool-age children who do and do not stutter were exposed to three emotion-producing overheard conversations-neutral, positive, and angry. Emotion and emotion-regulatory behaviors were coded while participants listened to each conversation and while telling a story after each overheard conversation. Instances of stuttering during each story were counted. Although there was no main effect of conversation type, results indicated that stuttering in preschool-age children is influenced by emotion and language diatheses, as well as coping strategies and situational emotional stressors. Findings support the dual diathesis-stressor model of stuttering.

  20. Dual Diathesis-Stressor Model of Emotional and Linguistic Contributions to Developmental Stuttering

    PubMed Central

    Frankel, Carl B.; Buhr, Anthony P.; Johnson, Kia N.; Conture, Edward G.; Karrass, Jan M.

    2013-01-01

    This study assessed emotional and speech-language contributions to childhood stuttering. A dual diathesis-stressor framework guided this study, in which both linguistic requirements and skills, and emotion and its regulation, are hypothesized to contribute to stuttering. The language diathesis consists of expressive and receptive language skills. The emotion diathesis consists of proclivities to emotional reactivity and regulation of emotion, and the emotion stressor consists of experimentally manipulated emotional inductions prior to narrative speaking tasks. Preschool-age children who do and do not stutter were exposed to three emotion-producing overheard conversations—neutral, positive, and angry. Emotion and emotion-regulatory behaviors were coded while participants listened to each conversation and while telling a story after each overheard conversation. Instances of stuttering during each story were counted. Although there was no main effect of conversation type, results indicated that stuttering in preschool-age children is influenced by emotion and language diatheses, as well as coping strategies and situational emotional stressors. Findings support the dual diathesis-stressor model of stuttering. PMID:22016200

  1. Understanding the abstract role of speech in communication at 12 months.

    PubMed

    Martin, Alia; Onishi, Kristine H; Vouloumanos, Athena

    2012-04-01

    Adult humans recognize that even unfamiliar speech can communicate information between third parties, demonstrating an ability to separate communicative function from linguistic content. We examined whether 12-month-old infants understand that speech can communicate before they understand the meanings of specific words. Specifically, we test the understanding that speech permits the transfer of information about a Communicator's target object to a Recipient. Initially, the Communicator selectively grasped one of two objects. In test, the Communicator could no longer reach the objects. She then turned to the Recipient and produced speech (a nonsense word) or non-speech (coughing). Infants looked longer when the Recipient selected the non-target than the target object when the Communicator had produced speech but not coughing (Experiment 1). Looking time patterns differed from the speech condition when the Recipient rather than the Communicator produced the speech (Experiment 2), and when the Communicator produced a positive emotional vocalization (Experiment 3), but did not differ when the Recipient had previously received information about the target by watching the Communicator's selective grasping (Experiment 4). Thus infants understand the information-transferring properties of speech and recognize some of the conditions under which others' information states can be updated. These results suggest that infants possess an abstract understanding of the communicative function of speech, providing an important potential mechanism for language and knowledge acquisition. Copyright © 2011 Elsevier B.V. All rights reserved.

  2. Meeting Their Needs: Provision of Services to the Severely Emotionally Disturbed and Autistic. Conference Proceedings (Memphis, Tennessee, April 27-28, 1983).

    ERIC Educational Resources Information Center

    Memphis State Univ., TN. Coll. of Education.

    The document contains the proceedings of a 1983 Tennessee conference on "Provision of Services to the Severely and Emotionally Disturbed and Autistic." Areas covered were identified as priority needs by Tennessee educators and emphasize the practical rather than the theoretical aspects of providing services. After the text of the keynote speech,…

  3. Computational Modeling of Emotions and Affect in Social-Cultural Interaction

    DTIC Science & Technology

    2013-10-02

    acoustic and textual information sources. Second, a cross-lingual study was performed that shed light on how human perception and automatic recognition...speech is produced, a speaker's pitch and intonational pattern, and word usage. Better feature representation and advanced approaches were used to...recognition performance, and improved our understanding of language/cultural impact on human perception of emotion and automatic classification.

  4. School performance and wellbeing of children with CI in different communicative-educational environments.

    PubMed

    Langereis, Margreet; Vermeulen, Anneke

    2015-06-01

    This study aimed to evaluate the long-term effects of CI on the auditory, language, educational and social-emotional development of deaf children in different educational-communicative settings. The outcomes of 58 children with profound hearing loss and normal non-verbal cognition were analyzed after 60 months of CI use. At testing, the children were enrolled in three different educational settings: mainstream education, where spoken language is used; hard-of-hearing education, where sign-supported spoken language is used; and bilingual deaf education, with Sign Language of the Netherlands and Sign Supported Dutch. Children were assessed on auditory speech perception, receptive language, educational attainment and wellbeing. The auditory speech perception of children with CI in mainstream education enables them to acquire language and educational levels that are comparable to those of their normal hearing peers. Although the children in mainstream and hard-of-hearing settings show similar speech perception abilities, language development in children in hard-of-hearing settings lags significantly behind. Speech perception, language and educational attainments of children in deaf education remained extremely poor. Furthermore, more children in mainstream and hard-of-hearing environments are resilient than in deaf educational settings. Regression analyses showed an important influence of educational setting. Children with CI who are placed in early intervention environments that facilitate auditory development are able to achieve good auditory speech perception, language and educational levels in the long term. Most parents of these children report no social-emotional concerns. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  5. Hearing Voices and Seeing Things

    MedlinePlus

    ... are serious and severely interfere with a child's thinking and functioning. Children who are psychotic often appear ... and agitated. They also may have disorganized speech, thinking, emotional reactions, and behavior, sometimes accompanied by hallucinations ...

  6. Assessment of auditory and psychosocial handicap associated with unilateral hearing loss among Indian patients.

    PubMed

    Augustine, Ann Mary; Chrysolyte, Shipra B; Thenmozhi, K; Rupa, V

    2013-04-01

    In order to assess psychosocial and auditory handicap in Indian patients with unilateral sensorineural hearing loss (USNHL), a prospective study was conducted on 50 adults with USNHL in the ENT Outpatient clinic of a tertiary care centre. The hearing handicap inventory for adults (HHIA) as well as speech in noise and sound localization tests were administered to patients with USNHL. An equal number of age-matched, normal controls also underwent the speech and sound localization tests. The results showed that HHIA scores ranged from 0 to 60 (mean 20.7). Most patients (84.8 %) had either mild to moderate or no handicap. Emotional subscale scores were higher than social subscale scores (p = 0.01). When the effect of sociodemographic factors on HHIA scores was analysed, educated individuals were found to have higher social subscale scores (p = 0.04). Age, sex, side and duration of hearing loss, occupation and income did not affect HHIA scores. Speech in noise and sound localization were significantly poorer in cases compared to controls (p < 0.001). About 75 % of patients refused a rehabilitative device. We conclude that USNHL in Indian adults does not usually produce severe handicap. When present, the handicap is more emotional than social. USNHL significantly affects sound localization and speech in noise. Yet, affected patients seldom seek a rehabilitative device.

  7. Social power and recognition of emotional prosody: High power is associated with lower recognition accuracy than low power.

    PubMed

    Uskul, Ayse K; Paulmann, Silke; Weick, Mario

    2016-02-01

    Listeners have to pay close attention to a speaker's tone of voice (prosody) during daily conversations. This is particularly important when trying to infer the emotional state of the speaker. Although a growing body of research has explored how emotions are processed from speech in general, little is known about how psychosocial factors such as social power can shape the perception of vocal emotional attributes. Thus, the present studies explored how social power affects emotional prosody recognition. In a correlational study (Study 1) and an experimental study (Study 2), we show that high power is associated with lower accuracy in emotional prosody recognition than low power. These results, for the first time, suggest that individuals experiencing high or low power perceive emotional tone of voice differently. (c) 2016 APA, all rights reserved.

  8. Applying the Verona coding definitions of emotional sequences (VR-CoDES) to code medical students' written responses to written case scenarios: Some methodological and practical considerations.

    PubMed

    Ortwein, Heiderose; Benz, Alexander; Carl, Petra; Huwendiek, Sören; Pander, Tanja; Kiessling, Claudia

    2017-02-01

    To investigate whether the Verona Coding Definitions of Emotional Sequences to code health providers' responses (VR-CoDES-P) can be used for assessment of medical students' responses to patients' cues and concerns provided in written case vignettes. Student responses in direct speech to patient cues and concerns were analysed in 21 different case scenarios using VR-CoDES-P. A total of 977 student responses were available for coding, and 857 responses were codable with the VR-CoDES-P. In 74.6% of responses, the students used either a "reducing space" statement only or a "providing space" statement immediately followed by a "reducing space" statement. Overall, the most frequent response was explicit information advice (ERIa), followed by content exploring (EPCEx) and content acknowledgement (EPCAc). VR-CoDES-P were applicable to written responses of medical students when they were phrased in direct speech. The application of VR-CoDES-P is reliable and feasible when using the differentiation of "providing" and "reducing space" responses. Communication strategies described by students in non-direct speech were difficult to code and produced many missing codes. VR-CoDES-P are useful for analysis of medical students' written responses when focusing on emotional issues. Students need precise instructions for their response in the given test format. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. Perceptual Speech and Paralinguistic Skills of Adolescents with Williams Syndrome

    ERIC Educational Resources Information Center

    Hargrove, Patricia M.; Pittelko, Stephen; Fillingane, Evan; Rustman, Emily; Lund, Bonnie

    2013-01-01

    The purpose of this research was to compare selected speech and paralinguistic skills of speakers with Williams syndrome (WS) and typically developing peers and to demonstrate the feasibility of providing preexisting databases to students to facilitate graduate research. In a series of three studies, conversational samples of 12 adolescents with…

  10. The not face: A grammaticalization of facial expressions of emotion.

    PubMed

    Benitez-Quiroz, C Fabian; Wilbur, Ronnie B; Martinez, Aleix M

    2016-05-01

    Facial expressions of emotion are thought to have evolved from the development of facial muscles used in sensory regulation and later adapted to express moral judgment. Negative moral judgment includes the expressions of anger, disgust and contempt. Here, we study the hypothesis that these facial expressions of negative moral judgment have further evolved into a facial expression of negation regularly used as a grammatical marker in human language. Specifically, we show that people from different cultures expressing negation use the same facial muscles as those employed to express negative moral judgment. We then show that this nonverbal signal is used as a co-articulator in speech and that, in American Sign Language, it has been grammaticalized as a non-manual marker. Furthermore, this facial expression of negation exhibits the theta oscillation (3-8 Hz) universally seen in syllable and mouthing production in speech and signing. These results provide evidence for the hypothesis that some components of human language have evolved from facial expressions of emotion, and suggest an evolutionary route for the emergence of grammatical markers. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. The Not Face: A grammaticalization of facial expressions of emotion

    PubMed Central

    Benitez-Quiroz, C. Fabian; Wilbur, Ronnie B.; Martinez, Aleix M.

    2016-01-01

    Facial expressions of emotion are thought to have evolved from the development of facial muscles used in sensory regulation and later adapted to express moral judgment. Negative moral judgment includes the expressions of anger, disgust and contempt. Here, we study the hypothesis that these facial expressions of negative moral judgment have further evolved into a facial expression of negation regularly used as a grammatical marker in human language. Specifically, we show that people from different cultures expressing negation use the same facial muscles as those employed to express negative moral judgment. We then show that this nonverbal signal is used as a co-articulator in speech and that, in American Sign Language, it has been grammaticalized as a non-manual marker. Furthermore, this facial expression of negation exhibits the theta oscillation (3–8 Hz) universally seen in syllable and mouthing production in speech and signing. These results provide evidence for the hypothesis that some components of human language have evolved from facial expressions of emotion, and suggest an evolutionary route for the emergence of grammatical markers. PMID:26872248

  12. How Stuttering Develops: The Multifactorial Dynamic Pathways Theory

    PubMed Central

    Weber, Christine

    2017-01-01

    Purpose We advanced a multifactorial, dynamic account of the complex, nonlinear interactions of motor, linguistic, and emotional factors contributing to the development of stuttering. Our purpose here is to update our account as the multifactorial dynamic pathways theory. Method We review evidence related to how stuttering develops, including genetic/epigenetic factors; motor, linguistic, and emotional features; and advances in neuroimaging studies. We update evidence for our earlier claim: Although stuttering ultimately reflects impairment in speech sensorimotor processes, its course over the life span is strongly conditioned by linguistic and emotional factors. Results Our current account places primary emphasis on the dynamic developmental context in which stuttering emerges and follows its course during the preschool years. Rapid changes in many neurobehavioral systems are ongoing, and critical interactions among these systems likely play a major role in determining persistence of or recovery from stuttering. Conclusion Stuttering, or childhood onset fluency disorder (Diagnostic and Statistical Manual of Mental Disorders, 5th edition; American Psychiatric Association [APA], 2013), is a neurodevelopmental disorder that begins when neural networks supporting speech, language, and emotional functions are rapidly developing. The multifactorial dynamic pathways theory motivates experimental and clinical work to determine the specific factors that contribute to each child's pathway to the diagnosis of stuttering and those most likely to promote recovery. PMID:28837728

  13. How Stuttering Develops: The Multifactorial Dynamic Pathways Theory.

    PubMed

    Smith, Anne; Weber, Christine

    2017-09-18

    We advanced a multifactorial, dynamic account of the complex, nonlinear interactions of motor, linguistic, and emotional factors contributing to the development of stuttering. Our purpose here is to update our account as the multifactorial dynamic pathways theory. We review evidence related to how stuttering develops, including genetic/epigenetic factors; motor, linguistic, and emotional features; and advances in neuroimaging studies. We update evidence for our earlier claim: Although stuttering ultimately reflects impairment in speech sensorimotor processes, its course over the life span is strongly conditioned by linguistic and emotional factors. Our current account places primary emphasis on the dynamic developmental context in which stuttering emerges and follows its course during the preschool years. Rapid changes in many neurobehavioral systems are ongoing, and critical interactions among these systems likely play a major role in determining persistence of or recovery from stuttering. Stuttering, or childhood onset fluency disorder (Diagnostic and Statistical Manual of Mental Disorders, 5th edition; American Psychiatric Association [APA], 2013), is a neurodevelopmental disorder that begins when neural networks supporting speech, language, and emotional functions are rapidly developing. The multifactorial dynamic pathways theory motivates experimental and clinical work to determine the specific factors that contribute to each child's pathway to the diagnosis of stuttering and those most likely to promote recovery.

  14. Speech therapy after thyroidectomy

    PubMed Central

    Wu, Che-Wei

    2017-01-01

    Common complaints of patients who have received thyroidectomy include dysphonia (voice dysfunction) and dysphagia (difficulty swallowing). One cause of these surgical outcomes is recurrent laryngeal nerve paralysis. Many studies have discussed the effectiveness of speech therapy (e.g., voice therapy and dysphagia therapy) for improving dysphonia and dysphagia, but not specifically in patients who have received thyroidectomy. Therefore, the aim of this paper was to discuss issues regarding speech therapy, such as voice therapy and dysphagia therapy, for patients after thyroidectomy. Another aim was to review the literature on speech therapy for patients with recurrent laryngeal nerve paralysis after thyroidectomy. Databases used for the literature review in this study included PubMed, MEDLINE, Academic Search Premier, ERIC, CINAHL Plus, and EBSCO. The articles retrieved by database searches were classified and screened for relevance by using EndNote. Of the 936 articles retrieved, 18 discussed “voice assessment and thyroidectomy”, 3 discussed “voice therapy and thyroidectomy”, and 11 discussed “surgical interventions for voice restoration after thyroidectomy”. Only 3 studies discussed topics related to “swallowing function assessment/treatment and thyroidectomy”. Although many studies have investigated voice changes and assessment methods in thyroidectomy patients, few recent studies have investigated speech therapy after thyroidectomy. Additionally, some studies have addressed dysphagia after thyroidectomy, but few have discussed assessment and treatment of dysphagia after thyroidectomy. PMID:29142841

  15. "Having the heart to be evaluated": The differential effects of fears of positive and negative evaluation on emotional and cardiovascular responses to social threat.

    PubMed

    Weeks, Justin W; Zoccola, Peggy M

    2015-12-01

    Accumulating evidence supports fear of evaluation in general as important in social anxiety, including fear of positive evaluation (FPE) and fear of negative evaluation (FNE). The present study examined state responses to an impromptu speech task with a sample of 81 undergraduates. This study is the first to compare and contrast physiological responses associated with FPE and FNE, and to examine both FPE- and FNE-related changes in state anxiety/affect in response to perceived social evaluation during a speech. FPE uniquely predicted (relative to FNE/depression) increases in mean heart rate during the speech; in contrast, neither FNE nor depression related to changes in heart rate. Both FPE and FNE related uniquely to increases in negative affect and state anxiety during the speech. Furthermore, pre-speech state anxiety mediated the relationship between trait FPE and diminished positive affect during the speech. Implications for the theoretical conceptualization and treatment of social anxiety are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Rhythm as a Coordinating Device: Entrainment With Disordered Speech

    PubMed Central

    Borrie, Stephanie A.; Liss, Julie M.

    2014-01-01

    Purpose The rhythmic entrainment (coordination) of behavior during human interaction is a powerful phenomenon, considered essential for successful communication, supporting social and emotional connection, and facilitating sense-making and information exchange. Disruption in entrainment likely occurs in conversations involving those with speech and language impairment, but its contribution to communication disorders has not been defined. As a first step to exploring this phenomenon in clinical populations, the present investigation examined the influence of disordered speech on the speech production properties of healthy interactants. Method Twenty-nine neurologically healthy interactants participated in a quasi-conversational paradigm, in which they read sentences (response) in response to hearing prerecorded sentences (exposure) from speakers with dysarthria (n = 4) and healthy controls (n = 4). Recordings of read sentences prior to the task were also collected (habitual). Results Findings revealed that interactants modified their speaking rate and pitch variation to align more closely with the disordered speech. Production shifts in these rhythmic properties, however, remained significantly different from corresponding properties in dysarthric speech. Conclusion Entrainment offers a new avenue for exploring speech and language impairment, addressing a communication process not currently explained by existing frameworks. This article offers direction for advancing this line of inquiry. PMID:24686410

  17. Behaviorally-based couple therapies reduce emotional arousal during couple conflict.

    PubMed

    Baucom, Brian R; Sheng, Elisa; Christensen, Andrew; Georgiou, Panayiotis G; Narayanan, Shrikanth S; Atkins, David C

    2015-09-01

    Emotional arousal during relationship conflict is a major target for intervention in couple therapies. The current study examines changes in conflict-related emotional arousal in 104 couples who participated in a randomized clinical trial of two behaviorally-based couple therapies. Emotional arousal is measured using the mean fundamental frequency of spouses' speech, and changes in emotional arousal from pre- to post-therapy are examined using multilevel models. Overall emotional arousal, the rate of increase in emotional arousal at the beginning of conflict, and the duration of emotional arousal declined for all couples. Reductions in overall arousal were stronger for TBCT wives than for IBCT wives but not significantly different for IBCT and TBCT husbands. Reductions in the rate of initial arousal were larger for TBCT couples than IBCT couples. Reductions in duration were larger for IBCT couples than TBCT couples. These findings suggest that both therapies can reduce emotional arousal, but that the two therapies create different kinds of change in emotional arousal. Copyright © 2015 Elsevier Ltd. All rights reserved.
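
    As a purely illustrative aside (not the authors' measurement pipeline), mean fundamental frequency can be estimated from a recording with an off-the-shelf pitch tracker; the sketch below assumes librosa is available, and the file name is hypothetical.

      # Minimal sketch: mean fundamental frequency (F0) of a speech recording,
      # often used as a proxy for vocal emotional arousal. Not the authors'
      # pipeline; requires librosa >= 0.8, and the file name is hypothetical.
      import numpy as np
      import librosa

      def mean_f0(wav_path, fmin=75.0, fmax=400.0):
          """Return mean F0 in Hz over voiced frames of a mono recording."""
          y, sr = librosa.load(wav_path, sr=16000, mono=True)
          f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
          voiced = f0[~np.isnan(f0)]  # keep voiced frames only
          return float(np.mean(voiced)) if voiced.size else float("nan")

      print(mean_f0("spouse_conflict_segment.wav"))  # hypothetical file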

  18. A systematic review of treatment intensity in speech disorders.

    PubMed

    Kaipa, Ramesh; Peterson, Abigail Marie

    2016-12-01

    Treatment intensity (sometimes referred to as "practice amount") has been well-investigated in learning non-speech tasks, but its role in treating speech disorders has not been largely analysed. This study reviewed the literature regarding treatment intensity in speech disorders. A systematic search was conducted in four databases using appropriate search terms. Seven articles from a total of 580 met the inclusion criteria. The speech disorders investigated included speech sound disorders, dysarthria, acquired apraxia of speech and childhood apraxia of speech. All seven studies were evaluated for their methodological quality, research phase and evidence level. Evidence level of reviewed studies ranged from moderate to strong. With regard to the research phase, only one study was considered to be phase III research, which corresponds to the controlled trial phase. The remaining studies were considered to be phase II research, which corresponds to the phase where magnitude of therapeutic effect is assessed. Results suggested that higher treatment intensity was favourable over lower treatment intensity of specific treatment technique(s) for treating childhood apraxia of speech and speech sound (phonological) disorders. Future research should incorporate randomised-controlled designs to establish optimal treatment intensity that is specific to each of the speech disorders.

  19. 45 CFR 2490.103 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ..., such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic or asymptomatic), and drug addiction and alcoholism. (2...

  20. 45 CFR 2490.103 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ..., such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic or asymptomatic), and drug addiction and alcoholism. (2...

  1. [Swallowing and Voice Disorders in Cancer Patients].

    PubMed

    Tanuma, Akira

    2015-07-01

    Dysphagia sometimes occurs in patients with head and neck cancer, particularly in those undergoing surgery and radiotherapy for lingual, pharyngeal, and laryngeal cancer. It also occurs in patients with esophageal cancer and brain tumor. Patients who undergo glossectomy usually show impairment of the oral phase of swallowing, whereas those with pharyngeal, laryngeal, and esophageal cancer show impairment of the pharyngeal phase of swallowing. Videofluoroscopic examination of swallowing provides important information necessary for rehabilitation of swallowing in these patients. Appropriate swallowing exercises and compensatory strategies can be decided based on the findings of the evaluation. Palatal augmentation prostheses are sometimes used for rehabilitation in patients undergoing glossectomy. Patients who undergo total laryngectomy or total pharyngolaryngoesophagectomy should receive speech therapy to enable them to use alaryngeal speech methods, including electrolarynx, esophageal speech, or speech via tracheoesophageal puncture. Regaining swallowing function and speech can improve a patient's emotional health and quality of life. Therefore, it is important to manage swallowing and voice disorders appropriately.

  2. Hemodynamics of speech production: An fNIRS investigation of children who stutter.

    PubMed

    Walsh, B; Tian, F; Tourville, J A; Yücel, M A; Kuczek, T; Bostian, A J

    2017-06-22

    Stuttering affects nearly 1% of the population worldwide and often has life-altering negative consequences, including poorer mental health and emotional well-being, and reduced educational and employment achievements. Over two decades of neuroimaging research reveals clear anatomical and physiological differences in the speech neural networks of adults who stutter. However, there have been few neurophysiological investigations of speech production in children who stutter. Using functional near-infrared spectroscopy (fNIRS), we examined hemodynamic responses over neural regions integral to fluent speech production including inferior frontal gyrus, premotor cortex, and superior temporal gyrus during a picture description task. Thirty-two children (16 stuttering and 16 controls) aged 7-11 years participated in the study. We found distinctly different speech-related hemodynamic responses in the group of children who stutter compared to the control group. Whereas controls showed significant activation over left dorsal inferior frontal gyrus and left premotor cortex, children who stutter exhibited deactivation over these left hemisphere regions. This investigation of neural activation during natural, connected speech production in children who stutter demonstrates that in childhood stuttering, atypical functional organization for speech production is present and suggests promise for the use of fNIRS during natural speech production in future research with typical and atypical child populations.

  3. Is talking to an automated teller machine natural and fun?

    PubMed

    Chan, F Y; Khalid, H M

    Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluated the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim was to investigate how natural and fun talking to a speech ATM can be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. An ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.

  4. Speech in 10-Year-Olds Born With Cleft Lip and Palate: What Do Peers Say?

    PubMed

    Nyberg, Jill; Havstam, Christina

    2016-09-01

    The aim of this study was to explore how 10-year-olds describe, in their own words, speech and communicative participation in children born with unilateral cleft lip and palate, whether they perceive signs of velopharyngeal insufficiency (VPI) and articulation errors of different degrees, and if so, which terminology they use. Methods/Participants: Nineteen 10-year-olds participated in three focus group interviews where they listened to 10 to 12 speech samples with different types of cleft speech characteristics assessed by speech and language pathologists (SLPs) and described what they heard. The interviews were transcribed and analyzed with qualitative content analysis. The analysis resulted in three interlinked categories encompassing different aspects of speech, personality, and social implications: descriptions of speech, thoughts on causes and consequences, and emotional reactions and associations. Each category contains four subcategories exemplified with quotes from the children's statements. More pronounced signs of VPI were perceived but referred to in terms relevant to 10-year-olds. Articulatory difficulties, even minor ones, were noted. Peers reflected on the risk of teasing and bullying and on how children with impaired speech might experience their situation. The SLPs and peers did not agree on minor signs of VPI, but they were unanimous in their analysis of clinically normal and more severely impaired speech. Articulatory impairments may be more important to treat than minor signs of VPI, based on what peers say.

  5. The impact of threat and cognitive stress on speech motor control in people who stutter.

    PubMed

    Lieshout, Pascal van; Ben-David, Boaz; Lipski, Melinda; Namasivayam, Aravind

    2014-06-01

    In the present study, an Emotional Stroop and a Classical Stroop task were used to separate the effect of threat content and cognitive stress from the phonetic features of words on motor preparation and execution processes. A group of 10 people who stutter (PWS) and 10 matched people who do not stutter (PNS) repeated colour names for threat content words and neutral words, as well as for traditional Stroop stimuli. Data collection included speech acoustics and movement data from the upper lip and lower lip using 3D EMA. PWS in both tasks were slower to respond and showed smaller upper lip movement ranges than PNS. For the Emotional Stroop task only, PWS were found to show larger inter-lip phase differences compared to PNS. General threat words were executed with faster lower lip movements (larger range and shorter duration) in both groups, but only PWS showed a change in upper lip movements. For stutter-specific threat words, both groups showed a more variable lip coordination pattern, but only PWS showed a delay in reaction time compared to neutral words. Individual stuttered words showed no effects. Both groups showed a classical Stroop interference effect in reaction time but no changes in motor variables. This study shows differential motor responses in PWS compared to controls for specific threat words. Cognitive stress was not found to affect stuttering individuals differently from controls, nor was its impact found to spread to motor execution processes. After reading this article, the reader will be able to: (1) discuss the importance of understanding how threat content influences speech motor control in people who stutter and non-stuttering speakers; (2) discuss the need to use tasks like the Emotional Stroop and Regular Stroop to separate phonetic (word-bound) based impact on fluency from other factors in people who stutter; and (3) describe the role of anxiety and cognitive stress on speech motor processes. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Threat Interference Biases Predict Socially Anxious Behavior: The Role of Inhibitory Control and Minute of Stressor.

    PubMed

    Gorlin, Eugenia I; Teachman, Bethany A

    2015-07-01

    The current study brings together two typically distinct lines of research. First, social anxiety is inconsistently associated with behavioral deficits in social performance, and the factors accounting for these deficits remain poorly understood. Second, research on selective processing of threat cues, termed cognitive biases, suggests these biases typically predict negative outcomes, but may sometimes be adaptive, depending on the context. Integrating these research areas, the current study examined whether conscious and/or unconscious threat interference biases (indexed by the unmasked and masked emotional Stroop) can explain unique variance, beyond self-reported anxiety measures, in behavioral avoidance and observer-rated anxious behavior during a public speaking task. Minute of speech and general inhibitory control (indexed by the color-word Stroop) were examined as within-subject and between-subject moderators, respectively. Highly socially anxious participants (N=135) completed the emotional and color-word Stroop blocks prior to completing a 4-minute videotaped speech task, which was later coded for anxious behaviors (e.g., speech dysfluency). Mixed-effects regression analyses revealed that general inhibitory control moderated the relationship between both conscious and unconscious threat interference bias and anxious behavior (though not avoidance), such that lower threat interference predicted higher levels of anxious behavior, but only among those with relatively weaker (versus stronger) inhibitory control. Minute of speech further moderated this relationship for unconscious (but not conscious) social-threat interference, such that lower social-threat interference predicted a steeper increase in anxious behaviors over the course of the speech (but only among those with weaker inhibitory control). Thus, both trait and state differences in inhibitory control resources may influence the behavioral impact of threat biases in social anxiety. Copyright © 2015. Published by Elsevier Ltd.

  7. Biologically inspired emotion recognition from speech

    NASA Astrophysics Data System (ADS)

    Caponetti, Laura; Buscicchio, Cosimo Alessandro; Castellano, Giovanna

    2011-12-01

    Emotion recognition has become a fundamental task in human-computer interaction systems. In this article, we propose an emotion recognition approach based on biologically inspired methods. Specifically, emotion classification is performed using a long short-term memory (LSTM) recurrent neural network which is able to recognize long-range dependencies between successive temporal patterns. We propose to represent data using features derived from two different models: mel-frequency cepstral coefficients (MFCC) and the Lyon cochlear model. In the experimental phase, results obtained from the LSTM network and the two different feature sets are compared, showing that features derived from the Lyon cochlear model give better recognition results in comparison with those obtained with the traditional MFCC representation.
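
    As a rough illustration of the kind of pipeline the abstract describes, the hedged sketch below extracts MFCC sequences with librosa and feeds them to a small Keras LSTM classifier. The Lyon cochlear-model features, the network size, and the label set (six classes here) are assumptions, not the authors' specification.

      # Hedged sketch of the MFCC + LSTM branch of a speech emotion classifier.
      # The Lyon cochlear-model features used in the paper are not shown, and
      # the network size and label set are illustrative assumptions.
      import numpy as np
      import librosa
      import tensorflow as tf

      NUM_EMOTIONS = 6     # assumed label set size
      N_MFCC = 13
      MAX_FRAMES = 300     # pad/truncate utterances to a fixed length

      def mfcc_sequence(wav_path):
          """Return a (MAX_FRAMES, N_MFCC) MFCC matrix for one utterance."""
          y, sr = librosa.load(wav_path, sr=16000, mono=True)
          mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T  # (frames, coeffs)
          out = np.zeros((MAX_FRAMES, N_MFCC), dtype=np.float32)
          out[: min(MAX_FRAMES, len(mfcc))] = mfcc[:MAX_FRAMES]
          return out

      def build_model():
          """Small LSTM over the MFCC sequence with a softmax over emotions."""
          model = tf.keras.Sequential([
              tf.keras.layers.Input(shape=(MAX_FRAMES, N_MFCC)),
              tf.keras.layers.Masking(mask_value=0.0),
              tf.keras.layers.LSTM(64),
              tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
          ])
          model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
          return model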

  8. Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition

    NASA Astrophysics Data System (ADS)

    Kim, Jonghwa; André, Elisabeth

    This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of a user's emotional state. For emotion recognition, little attention has been paid so far to physiological signals compared with audio-visual emotion channels such as facial expression or speech. All essential stages of an automatic recognition system using biosignals are discussed, from recording a physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to search for the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by emotion recognition results.
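
    To make the feature-extraction stage concrete, the minimal sketch below computes a handful of time- and frequency-domain statistics of the kind listed above for a single biosignal channel using NumPy and SciPy; the band limits and feature names are illustrative assumptions, not the paper's feature set.

      # Illustrative feature extraction for one biosignal channel (e.g., skin
      # conductance or respiration). The band limits and feature choices are
      # assumptions; the paper's full feature set is much larger.
      import numpy as np
      from scipy.signal import welch

      def basic_features(signal, fs):
          """Return a few time- and frequency-domain features as a dict."""
          sig = np.asarray(signal, dtype=float)
          freqs, psd = welch(sig, fs=fs, nperseg=min(len(sig), 256))
          low_band = psd[(freqs >= 0.1) & (freqs < 0.5)].sum()  # example band
          return {
              "mean": sig.mean(),
              "std": sig.std(),
              "rms": np.sqrt(np.mean(sig ** 2)),
              "mean_abs_first_diff": np.mean(np.abs(np.diff(sig))),
              "low_band_power": float(low_band),
          }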

  9. [Family characteristics of stuttering children].

    PubMed

    Simić-Ruzić, Budimirka; Jovanović, Aleksandar A

    2008-01-01

    Stuttering is a functional impairment of speech, which is manifested by conscious, but nonintentionally interrupted, disharmonic and disrhythmic fluctuation of sound varying in frequency and intensity. The aetiology of this disorder has been conceived within the frame of theoretical models that tend to connect genetic and epigenetic factors. The goal of the paper was to study the characteristics of family functioning in stuttering children in comparison with children without speech disorder; the findings confirmed the justification for introducing family-oriented therapeutic interventions into the therapy spectrum of child stuttering. Seventy-nine nuclear families of 3- to 6-year-old children were examined; of these, 39 families had stuttering children and 40 had children without speech disorder. The assessment of family characteristics was made using the Family Health Scale, an observer-rating scale which, based on a semistructured interview and operational criteria, measures six basic dimensions of family functioning: Emotional State, Communication, Borders, Alliances, Adaptability & Stability, and Family Skills. A total score calculated from the basic dimensions is considered a global index of family health. Families with stuttering children showed significantly lower scores than families with children without speech disorder in all the basic dimensions of family functioning, as well as in the total score on the Family Health Scale. Our research results have shown that stuttering children, in comparison with children without speech disorder, live in families with an unfavourable emotional atmosphere, impaired communication and worse control over situational and developmental difficulties, which affect children's development and well-being. In the light of previous research, the application of family therapy modified according to the child's needs is now considered indispensable in the therapeutic approach to stuttering children. The assessment of family characteristics, with special reference to the ability of parents to recognize the specific needs of children with speech disorder and to interact adequately, as well as the readiness of parents for therapeutic collaboration, are necessary elements in legal custody evaluations.

  10. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research.

    PubMed

    Toutios, Asterios; Narayanan, Shrikanth S

    2016-01-01

    Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.

  11. Population Health in Pediatric Speech and Language Disorders: Available Data Sources and a Research Agenda for the Field.

    PubMed

    Raghavan, Ramesh; Camarata, Stephen; White, Karl; Barbaresi, William; Parish, Susan; Krahn, Gloria

    2018-05-17

    The aim of the study was to provide an overview of population science as applied to speech and language disorders, illustrate data sources, and advance a research agenda on the epidemiology of these conditions. Computer-aided database searches were performed to identify key national surveys and other sources of data necessary to establish the incidence, prevalence, and course and outcome of speech and language disorders. This article also summarizes a research agenda that could enhance our understanding of the epidemiology of these disorders. Although the data yielded estimates of prevalence and incidence for speech and language disorders, existing sources of data are inadequate to establish reliable rates of incidence, prevalence, and outcomes for speech and language disorders at the population level. Greater support for inclusion of speech and language disorder-relevant questions is necessary in national health surveys to build the population science in the field.

  12. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

    PubMed Central

    TOUTIOS, ASTERIOS; NARAYANAN, SHRIKANTH S.

    2016-01-01

    Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development. PMID:27833745

  13. 45 CFR 2301.103 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic...

  14. 5 CFR 1636.103 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, HIV disease (whether symptomatic...

  15. 45 CFR 1803.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical or mental impairment” includes, but is not limited to, such diseases and conditions as orthopedic..., cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism...

  16. 43 CFR 17.503 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...; respiratory, including speech organs; cardiovascular; reproductive; digestive; genitourinary; hemic and... “physical, mental or sensory impairment” includes, but is not limited to, such diseases and conditions as... sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, drug addiction, and...

  17. 45 CFR 1214.103 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...

  18. 45 CFR 1214.103 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... impairment” includes, but is not limited to, such diseases and conditions as orthopedic, visual, speech, and hearing impairments, cerebral palsy, epilepsy, muscular dystrophy, multiple sclerosis, cancer, heart disease, diabetes, mental retardation, emotional illness, and drug addiction and alcoholism. (2) Major...

  19. Vocal contagion of emotions in non-human animals

    PubMed Central

    2018-01-01

    Communicating emotions to conspecifics (emotion expression) allows the regulation of social interactions (e.g. approach and avoidance). Moreover, when emotions are transmitted from one individual to the next, leading to state matching (emotional contagion), information transfer and coordination between group members are facilitated. Despite the high potential for vocalizations to influence the affective state of surrounding individuals, vocal contagion of emotions has been largely unexplored in non-human animals. In this paper, I review the evidence for discrimination of vocal expression of emotions, which is a necessary step for emotional contagion to occur. I then describe possible proximate mechanisms underlying vocal contagion of emotions, propose criteria to assess this phenomenon and review the existing evidence. The literature so far shows that non-human animals are able to discriminate and be affected by conspecific and also potentially heterospecific (e.g. human) vocal expression of emotions. Since humans heavily rely on vocalizations to communicate (speech), I suggest that studying vocal contagion of emotions in non-human animals can lead to a better understanding of the evolution of emotional contagion and empathy. PMID:29491174

  20. The research questions and methodological adequacy of clinical studies of the voice and larynx published in Brazilian and international journals.

    PubMed

    Vieira, Vanessa Pedrosa; De Biase, Noemi; Peccin, Maria Stella; Atallah, Alvaro Nagib

    2009-06-01

    To evaluate the methodological adequacy of voice and laryngeal study designs published in speech-language pathology and otorhinolaryngology journals indexed for the ISI Web of Knowledge (ISI Web) and the MEDLINE database. A cross-sectional study conducted at the Universidade Federal de São Paulo (Federal University of São Paulo). Two Brazilian speech-language pathology and otorhinolaryngology journals (Pró-Fono and Revista Brasileira de Otorrinolaringologia) and two international speech-language pathology and otorhinolaryngology journals (Journal of Voice, Laryngoscope), all dated between 2000 and 2004, were hand-searched by specialists. Subsequently, voice and larynx publications were separated, and a speech-language pathologist and otorhinolaryngologist classified 374 articles from the four journals according to objective and study design. The predominant objective contained in the articles was that of primary diagnostic evaluation (27%), and the most frequent study design was case series (33.7%). A mere 7.8% of the studies were designed adequately with respect to the stated objectives. There was no statistical difference in the methodological quality of studies indexed for the ISI Web and the MEDLINE database. The studies published in both national journals, indexed for the MEDLINE database, and international journals, indexed for the ISI Web, demonstrate weak methodology, with research poorly designed to meet the proposed objectives. There is much scientific work to be done in order to decrease uncertainty in the field analysed.

  1. Minimalistic toy robot to analyze a scenery of speaker-listener condition in autism.

    PubMed

    Giannopulu, Irini; Montreynaud, Valérie; Watanabe, Tomio

    2016-05-01

    Atypical neural architecture causes impairment in communication capabilities and reduces the ability to represent the referential statements of other people in children with autism. In a scenario of "speaker-listener" communication, we analyzed verbal and emotional expressions in neurotypical children (n = 20) and in children with autism (n = 20). The speaker was always a child, and the listener was a human or a minimalistic robot that reacted to speech only by nodding. Although both groups performed the task, everything happened as if the robot allowed children with autism to encode and conceptualize the exchange within the brain and to externalize it as unconscious emotion (heart rate) and conscious verbal speech (words). Such behaviour would indicate that minimalistic artificial environments such as toy robots could be considered a basis for neuronal organization and reorganization, with the potential to improve brain activity.

  2. Motivation and appraisal in perception of poorly specified speech.

    PubMed

    Lidestam, Björn; Beskow, Jonas

    2006-04-01

    Normal-hearing students (n = 72) performed sentence, consonant, and word identification in either A (auditory), V (visual), or AV (audiovisual) modality. The auditory signal had unfavourable speech-to-noise ratios. Talker (human vs. synthetic), topic (no cue vs. cue-words), and emotion (no cue vs. facially displayed vs. cue-words) were varied within groups. After the first block, effects of modality, face, topic, and emotion on initial appraisal and motivation were assessed. After the entire session, effects of modality on longer-term appraisal and motivation were assessed. The results from both assessments showed that V identification was more positively appraised than A identification. Correlations were tentatively interpreted as suggesting that evaluation of self-rated performance possibly depends on a subjective standard and is reflected in motivation (if below the subjective standard, AV group) or in appraisal (if above the subjective standard, A group). Suggestions for further research are presented.

  3. Intentional Voice Command Detection for Trigger-Free Speech Interface

    NASA Astrophysics Data System (ADS)

    Obuchi, Yasunari; Sumiyoshi, Takashi

    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
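
    The accept/reject decision described above can be pictured with the hedged scikit-learn sketch below, which trains a linear discriminant analysis classifier on per-utterance feature vectors and thresholds its posterior; the feature extraction, the corpus, and the threshold value are assumptions, not details from the paper.

      # Hedged sketch of an LDA-based intentional-voice-command decision.
      # Feature extraction and the corpus are assumed to exist elsewhere; the
      # 0.5 threshold is an illustrative choice, not the paper's setting.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def train_ivcd_classifier(features, labels):
          """features: (n_utterances, n_features); labels: 1 = command, 0 = other."""
          clf = LinearDiscriminantAnalysis()
          clf.fit(features, labels)
          return clf

      def is_intentional_command(clf, feature_vector, threshold=0.5):
          """Accept the utterance only if the command posterior exceeds the threshold."""
          prob = clf.predict_proba(np.atleast_2d(feature_vector))[0, 1]
          return prob >= threshold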

  4. Prosodic alignment in human-computer interaction

    NASA Astrophysics Data System (ADS)

    Suzuki, N.; Katagiri, Y.

    2007-06-01

    Androids that replicate humans in form also need to replicate them in behaviour to achieve a high level of believability or lifelikeness. We explore the minimal social cues that can induce in people the human tendency for social acceptance, or ethopoeia, toward artifacts, including androids. It has been observed that people exhibit a strong tendency to adjust to each other, through a number of speech and language features in human-human conversational interactions, to obtain communication efficiency and emotional engagement. We investigate in this paper the phenomena related to prosodic alignment in human-computer interactions, with particular focus on human-computer alignment of speech characteristics. We found that people exhibit unidirectional and spontaneous short-term alignment of loudness and response latency in their speech in response to computer-generated speech. We believe this phenomenon of prosodic alignment provides one of the key components for building social acceptance of androids.

  5. The biopsychosocial model of stress in adolescence: self-awareness of performance versus stress reactivity

    PubMed Central

    Rith-Najarian, Leslie R.; McLaughlin, Katie A.; Sheridan, Margaret A.; Nock, Matthew K.

    2014-01-01

    Extensive research among adults supports the biopsychosocial (BPS) model of challenge and threat, which describes relationships among stress appraisals, physiological stress reactivity, and performance; however, no previous studies have examined these relationships in adolescents. Perceptions of stressors as well as physiological reactivity to stress increase during adolescence, highlighting the importance of understanding the relationships among stress appraisals, physiological reactivity, and performance during this developmental period. In this study, 79 adolescent participants reported on stress appraisals before and after a Trier Social Stress Test in which they performed a speech task. Physiological stress reactivity was defined by changes in cardiac output and total peripheral resistance from a baseline rest period to the speech task, and performance on the speech was coded using an objective rating system. We observed in adolescents only two relationships found in past adult research on the BPS model variables: (1) pre-task stress appraisal predicted post-task stress appraisal and (2) performance predicted post-task stress appraisal. Physiological reactivity during the speech was unrelated to pre- and post-task stress appraisals and to performance. We conclude that the lack of association between post-task stress appraisal and physiological stress reactivity suggests that adolescents might have low self-awareness of physiological emotional arousal. Our findings further suggest that adolescent stress appraisals are based largely on their performance during stressful situations. Developmental implications of this potential lack of awareness of one’s physiological and emotional state during adolescence are discussed. PMID:24491123

  6. The biopsychosocial model of stress in adolescence: self-awareness of performance versus stress reactivity.

    PubMed

    Rith-Najarian, Leslie R; McLaughlin, Katie A; Sheridan, Margaret A; Nock, Matthew K

    2014-03-01

    Extensive research among adults supports the biopsychosocial (BPS) model of challenge and threat, which describes relationships among stress appraisals, physiological stress reactivity, and performance; however, no previous studies have examined these relationships in adolescents. Perceptions of stressors as well as physiological reactivity to stress increase during adolescence, highlighting the importance of understanding the relationships among stress appraisals, physiological reactivity, and performance during this developmental period. In this study, 79 adolescent participants reported on stress appraisals before and after a Trier Social Stress Test in which they performed a speech task. Physiological stress reactivity was defined by changes in cardiac output and total peripheral resistance from a baseline rest period to the speech task, and performance on the speech was coded using an objective rating system. We observed in adolescents only two relationships found in past adult research on the BPS model variables: (1) pre-task stress appraisal predicted post-task stress appraisal and (2) performance predicted post-task stress appraisal. Physiological reactivity during the speech was unrelated to pre- and post-task stress appraisals and to performance. We conclude that the lack of association between post-task stress appraisal and physiological stress reactivity suggests that adolescents might have low self-awareness of physiological emotional arousal. Our findings further suggest that adolescent stress appraisals are based largely on their performance during stressful situations. Developmental implications of this potential lack of awareness of one's physiological and emotional state during adolescence are discussed.

  7. Mild Developmental Foreign Accent Syndrome and Psychiatric Comorbidity: Altered White Matter Integrity in Speech and Emotion Regulation Networks

    PubMed Central

    Berthier, Marcelo L.; Roé-Vellvé, Núria; Moreno-Torres, Ignacio; Falcon, Carles; Thurnhofer-Hemsi, Karl; Paredes-Pacheco, José; Torres-Prioris, María J.; De-Torres, Irene; Alfaro, Francisco; Gutiérrez-Cardo, Antonio L.; Baquero, Miquel; Ruiz-Cruces, Rafael; Dávila, Guadalupe

    2016-01-01

    Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, and schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with structural brain anomalies. We suggest that the simultaneous involvement of speech and emotion regulation networks might result from disrupted neural organization during development, or compensatory or maladaptive plasticity. Future studies are required to examine whether the interplay between biological trait-like diathesis (shyness, neuroticism) and the stressful experience of living with mild DFAS lead to the development of internalizing psychiatric disorders. PMID:27555813

  8. A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record

    PubMed Central

    Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa

    2002-01-01

    Speech recognition as an input tool for electronic medical record (EMR) systems enables efficient data entry at the point of care. However, the recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of speech text against a medical vocabulary database. Using specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical terms in narrative text. Our preliminary evaluation showed 94% accuracy in mis-recognized medical vocabulary correction.
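
    As a rough sketch of dictionary-based correction: the paper matches syllable by syllable, whereas the stand-in below uses plain character-level edit distance, and the vocabulary list is a hypothetical toy example.

      # Sketch of dictionary-based correction of mis-recognized terms.
      # Character-level Levenshtein distance stands in for the paper's
      # syllable-by-syllable comparison; the vocabulary is a toy example.
      def edit_distance(a, b):
          """Classic Levenshtein distance via a one-row dynamic program."""
          dp = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              prev, dp[0] = dp[0], i
              for j, cb in enumerate(b, 1):
                  prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                           prev + (ca != cb))
          return dp[-1]

      def correct_token(token, vocabulary, max_dist=2):
          """Replace token with the closest vocabulary entry if close enough."""
          best = min(vocabulary, key=lambda term: edit_distance(token, term))
          return best if edit_distance(token, best) <= max_dist else token

      vocab = ["hypertension", "hyperlipidemia", "tachycardia"]  # hypothetical
      print(correct_token("hypertention", vocab))  # -> "hypertension"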

  9. Speech Volume Indexes Sex Differences in the Social-Emotional Effects of Alcohol

    PubMed Central

    Fairbairn, Catharine E.; Sayette, Michael A.; Amole, Marlissa C.; Dimoff, John D.; Cohn, Jeffrey F.; Girard, Jeffrey M.

    2015-01-01

    Men and women differ dramatically in their rates of alcohol use disorder (AUD), and researchers have long been interested in identifying mechanisms underlying male vulnerability to problem drinking. Surveys suggest that social processes underlie sex differences in drinking patterns, with men reporting greater social enhancement from alcohol than women, and all-male social drinking contexts being associated with particularly high rates of hazardous drinking. But experimental evidence for sex differences in social-emotional response to alcohol has heretofore been lacking. Research using larger sample sizes, a social context, and more sensitive measures of alcohol's rewarding effects may be necessary to better understand sex differences in the etiology of AUD. This study explored the acute effects of alcohol during social exchange on speech volume, an objective measure of social-emotional experience that was reliably captured at the group level. Social drinkers (360 male; 360 female) consumed alcohol (0.82 g/kg for males; 0.74 g/kg for females), placebo, or a no-alcohol control beverage in groups of three over 36 minutes. Within each of the three beverage conditions, equal numbers of groups consisted of all males, all females, 2 females and 1 male, and 1 female and 2 males. Speech volume was monitored continuously throughout the drink period, and group volume emerged as a robust correlate of self-report and facial indexes of social reward. Notably, alcohol-related increases in group volume were observed selectively in all-male groups but not in groups containing any females. Results point to social enhancement as a promising direction for research exploring factors underlying sex differences in problem drinking. PMID:26237323
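
    For intuition only, the hedged sketch below computes a frame-level loudness track (RMS energy in dB) from a recording with librosa; the study's actual volume measure, calibration, and file naming are not specified in this abstract.

      # Sketch: frame-level speech "volume" (RMS energy in dB) for a recording.
      # Not the study's measurement pipeline; the file name is hypothetical.
      import numpy as np
      import librosa

      def volume_track(wav_path, frame_length=2048, hop_length=512):
          """Return (times_in_seconds, rms_in_dB) for a mono recording."""
          y, sr = librosa.load(wav_path, sr=None, mono=True)
          rms = librosa.feature.rms(y=y, frame_length=frame_length,
                                    hop_length=hop_length)[0]
          times = librosa.frames_to_time(np.arange(len(rms)), sr=sr,
                                         hop_length=hop_length)
          rms_db = 20 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0)
          return times, rms_db

      times, loudness = volume_track("group_drinking_session.wav")  # hypothetical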

  10. Underconnectivity between voice-selective cortex and reward circuitry in children with autism.

    PubMed

    Abrams, Daniel A; Lynch, Charles J; Cheng, Katherine M; Phillips, Jennifer; Supekar, Kaustubh; Ryali, Srikanth; Uddin, Lucina Q; Menon, Vinod

    2013-07-16

    Individuals with autism spectrum disorders (ASDs) often show insensitivity to the human voice, a deficit that is thought to play a key role in communication deficits in this population. The social motivation theory of ASD predicts that impaired function of reward and emotional systems impedes children with ASD from actively engaging with speech. Here we explore this theory by investigating distributed brain systems underlying human voice perception in children with ASD. Using resting-state functional MRI data acquired from 20 children with ASD and 19 age- and intelligence quotient-matched typically developing children, we examined intrinsic functional connectivity of voice-selective bilateral posterior superior temporal sulcus (pSTS). Children with ASD showed a striking pattern of underconnectivity between left-hemisphere pSTS and distributed nodes of the dopaminergic reward pathway, including bilateral ventral tegmental areas and nucleus accumbens, left-hemisphere insula, orbitofrontal cortex, and ventromedial prefrontal cortex. Children with ASD also showed underconnectivity between right-hemisphere pSTS, a region known for processing speech prosody, and the orbitofrontal cortex and amygdala, brain regions critical for emotion-related associative learning. The degree of underconnectivity between voice-selective cortex and reward pathways predicted symptom severity for communication deficits in children with ASD. Our results suggest that weak connectivity of voice-selective cortex and brain structures involved in reward and emotion may impair the ability of children with ASD to experience speech as a pleasurable stimulus, thereby impacting language and social skill development in this population. Our study provides support for the social motivation theory of ASD.

  11. Human emotion detector based on genetic algorithm using lip features

    NASA Astrophysics Data System (ADS)

    Brown, Terrence; Fetanat, Gholamreza; Homaifar, Abdollah; Tsou, Brian; Mendoza-Schrock, Olga

    2010-04-01

    We predicted human emotion using a Genetic Algorithm (GA) based lip feature extractor from facial images to classify all seven universal emotions of fear, happiness, dislike, surprise, anger, sadness and neutrality. First, we isolated the mouth from the input images using special methods, such as Region of Interest (ROI) acquisition, grayscaling, histogram equalization, filtering, and edge detection. Next, the GA determined the optimal or near-optimal ellipse parameters that encircle the mouth and separate it into upper and lower lips. The two ellipses then went through fitness calculation and were followed by training using a database of Japanese women's faces expressing all seven emotions. Finally, our proposed algorithm was tested using a published database consisting of emotions from several persons. The final results were then presented in confusion matrices. Our results showed an accuracy that varies from 20% to 60% for each of the seven emotions. The errors were mainly due to inaccuracies in the classification, and also due to the different expressions in the given emotion database. Detailed analysis of these errors pointed to the limitation of detecting emotion based on lip features alone. Similar work [1] has been done in the literature for emotion detection in only one person; we have successfully extended our GA-based solution to include several subjects.
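
    To make the ellipse-search step concrete, here is a minimal sketch of a GA fitting ellipse parameters to a binary edge map of the mouth region. It is an assumed reconstruction, not the paper's implementation: the fitness function (counting edge pixels near the candidate boundary), the synthetic edge_map, and the GA operators (tournament selection, blend crossover, Gaussian mutation) are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed input: a binary edge map of the mouth ROI produced by the
# preprocessing steps named above (grayscaling, histogram equalization,
# filtering, edge detection). A synthetic lip-shaped ellipse is used here.
H, W = 60, 100
yy, xx = np.mgrid[0:H, 0:W]
edge_map = np.abs(((xx - 50) / 35.0) ** 2 + ((yy - 30) / 12.0) ** 2 - 1.0) < 0.08

def fitness(params):
    """Score a candidate ellipse by counting edge pixels near its boundary."""
    cx, cy, a, b = params
    if a < 5 or b < 3:
        return 0.0
    d = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2
    return float(np.count_nonzero((np.abs(d - 1.0) < 0.1) & edge_map))

def genetic_search(pop_size=60, generations=80, mutation_scale=2.0):
    """Minimal GA: elitism, two-way tournament selection, blend crossover,
    Gaussian mutation, with parameters clipped to the image bounds."""
    lo = np.array([0.0, 0.0, 5.0, 3.0])
    hi = np.array([float(W), float(H), W / 2.0, H / 2.0])
    pop = rng.uniform(lo, hi, size=(pop_size, 4))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        new_pop = [pop[np.argmax(scores)]]                 # keep the best (elitism)
        while len(new_pop) < pop_size:
            i, j = rng.integers(pop_size, size=2)          # tournament parent 1
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)          # tournament parent 2
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]
            w = rng.uniform(size=4)                        # blend crossover
            child = w * p1 + (1.0 - w) * p2
            child += rng.normal(scale=mutation_scale, size=4)
            new_pop.append(np.clip(child, lo, hi))
        pop = np.array(new_pop)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]

best = genetic_search()
print("best ellipse (cx, cy, a, b):", np.round(best, 1))
```

    In this toy setup the search should recover parameters close to the planted ellipse (cx=50, cy=30, a=35, b=12); a real system would run two such searches, one per lip contour.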

  12. Intact brain processing of musical emotions in autism spectrum disorder, but more cognitive load and arousal in happy vs. sad music.

    PubMed

    Gebauer, Line; Skewes, Joshua; Westphael, Gitte; Heaton, Pamela; Vuust, Peter

    2014-01-01

    Music is a potent source for eliciting emotions, but not everybody experiences emotions in the same way. Individuals with autism spectrum disorder (ASD) show difficulties with social and emotional cognition. Impairments in emotion recognition are widely studied in ASD, and have been associated with atypical brain activation in response to emotional expressions in faces and speech. Whether these impairments and atypical brain responses generalize to other domains, such as emotional processing of music, is less clear. Using functional magnetic resonance imaging, we investigated neural correlates of emotion recognition in music in high-functioning adults with ASD and neurotypical adults. Both groups engaged similar neural networks during processing of emotional music, and individuals with ASD rated emotional music comparably to the group of neurotypical individuals. However, in the ASD group, increased activity in response to happy compared to sad music was observed in dorsolateral prefrontal regions and in the rolandic operculum/insula, and we propose that this reflects increased cognitive processing and physiological arousal in response to emotional musical stimuli in this group.

  13. Processing of affective speech prosody is impaired in Asperger syndrome.

    PubMed

    Korpilahti, Pirjo; Jansson-Verkasalo, Eira; Mattila, Marja-Leena; Kuusikko, Sanna; Suominen, Kalervo; Rytky, Seppo; Pauls, David L; Moilanen, Irma

    2007-09-01

    Many people with the diagnosis of Asperger syndrome (AS) show poorly developed skills in understanding emotional messages. The present study addressed discrimination of speech prosody in children with AS at the neurophysiological level. Detection of affective prosody was investigated in one-word utterances as indexed by the N1 and the mismatch negativity (MMN) of auditory event-related potentials (ERPs). Data from fourteen boys with AS were compared with those from thirteen typically developed boys. These results suggest atypical neural responses to affective prosody in children with AS and their fathers, especially over the right hemisphere (RH), and that this impairment can already be seen at low-level stages of information processing. Our results provide evidence for familial patterns of abnormal auditory brain reactions to prosodic features of speech.

  14. Architectural Considerations for Classrooms for Exceptional Children.

    ERIC Educational Resources Information Center

    Texas Education Agency, Austin.

    Definitions are provided of the following exceptionalities: blind, partially sighted, physically handicapped, minimally brain injured, deaf, educable mentally retarded (primary, junior, and senior high levels), trainable mentally retarded, speech handicapped, and emotionally disturbed. Architectural guidelines specify classroom location, size,…

  15. Freedom of racist speech: Ego and expressive threats.

    PubMed

    White, Mark H; Crandall, Christian S

    2017-09-01

    Do claims of "free speech" provide cover for prejudice? We investigate whether this defense of racist or hate speech serves as a justification for prejudice. In a series of 8 studies (N = 1,624), we found that explicit racial prejudice is a reliable predictor of the "free speech defense" of racist expression. Participants endorsed free speech values for singing racist songs or posting racist comments on social media; people high in prejudice endorsed free speech more than people low in prejudice (meta-analytic r = .43). This endorsement was not principled: high levels of prejudice did not predict endorsement of free speech values when identical speech was directed at coworkers or the police. Participants low in explicit racial prejudice actively avoided endorsing free speech values in racialized conditions compared to nonracial conditions, but participants high in racial prejudice increased their endorsement of free speech values in racialized conditions. Three experiments failed to find evidence that defense of racist speech by the highly prejudiced was based in self-relevant or self-protective motives. Two experiments found evidence that the free speech argument protected participants' own freedom to express their attitudes; the defense of others' racist speech seems motivated more by threats to autonomy than threats to self-regard. These studies serve as an elaboration of the Justification-Suppression Model (Crandall & Eshleman, 2003) of prejudice expression. The justification of racist speech by endorsing fundamental political values can serve to buffer racial and hate speech from normative disapproval. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Musical anhedonia: selective loss of emotional experience in listening to music.

    PubMed

    Satoh, Masayuki; Nakase, Taizen; Nagata, Ken; Tomimoto, Hidekazu

    2011-10-01

    Recent case studies have suggested that emotion perception and emotional experience of music have independent cognitive processing. We report a patient who showed selective impairment of emotional experience only in listening to music, that is, musical anhedonia. A 71-year-old right-handed man developed an infarction in the right parietal lobe. He found himself unable to experience emotion in listening to music, even music to which he had listened with pleasure before the illness. In neuropsychological assessments, his intellectual, memory, and constructional abilities were normal. Speech audiometry and recognition of environmental sounds were within normal limits. Neuromusicological assessments revealed no abnormality in the perception of elementary components of music, or in expression and emotion perception of music. Brain MRI identified the infarct lesion in the right inferior parietal lobule. These findings suggest that emotional experience of music could be selectively impaired without any disturbance of other musical or neuropsychological abilities. The right parietal lobe might participate in emotional experience in listening to music.

  17. Hot Speech and Exploding Bombs: Autonomic Arousal During Emotion Classification of Prosodic Utterances and Affective Sounds

    PubMed Central

    Jürgens, Rebecca; Fischer, Julia; Schacht, Annekathrin

    2018-01-01

    Emotional expressions provide strong signals in social interactions and can function as emotion inducers in a perceiver. Although speech provides one of the most important channels for human communication, its physiological correlates, such as activations of the autonomic nervous system (ANS) while listening to spoken utterances, have received far less attention than in other domains of emotion processing. Our study aimed at filling this gap by investigating autonomic activation in response to spoken utterances that were embedded into larger semantic contexts. Emotional salience was manipulated by providing information on alleged speaker similarity. We compared these autonomic responses to activations triggered by affective sounds, such as exploding bombs and applause. These sounds had been rated and validated as being either positive, negative, or neutral. As physiological markers of ANS activity, we recorded skin conductance responses (SCRs) and changes of pupil size while participants classified both prosodic and sound stimuli according to their hedonic valence. As expected, affective sounds elicited increased arousal in the receiver, as reflected in increased SCR and pupil size. In contrast, SCRs to angry and joyful prosodic expressions did not differ from responses to neutral ones. Pupil size, however, was modulated by affective prosodic utterances, with increased dilations for angry and joyful compared to neutral prosody, although the similarity manipulation had no effect. These results indicate that cues provided by emotional prosody in spoken, semantically neutral utterances might be too subtle to trigger SCRs, although variation in pupil size indicated the salience of stimulus variation. Our findings further demonstrate a functional dissociation between pupil dilation and skin conductance that presumably originates from their differential innervation. PMID:29541045

  18. Resting-state networks associated with cognitive processing show more age-related decline than those associated with emotional processing.

    PubMed

    Nashiro, Kaoru; Sakaki, Michiko; Braskie, Meredith N; Mather, Mara

    2017-06-01

    Correlations in activity across disparate brain regions during rest reveal functional networks in the brain. Although previous studies largely agree that there is an age-related decline in the "default mode network," how age affects other resting-state networks, such as emotion-related networks, is still controversial. Here we used a dual-regression approach to investigate age-related alterations in resting-state networks. The results revealed age-related disruptions in functional connectivity in all 5 identified cognitive networks, namely the default mode network, cognitive-auditory, cognitive-speech (or speech-related somatosensory), and right and left frontoparietal networks, whereas such age effects were not observed in the 3 identified emotion networks. In addition, we observed age-related decline in functional connectivity in 3 visual and 3 motor/visuospatial networks. Older adults showed greater functional connectivity in regions outside 4 out of the 5 identified cognitive networks, consistent with the dedifferentiation effect previously observed in task-based functional magnetic resonance imaging studies. Both reduced within-network connectivity and increased out-of-network connectivity were correlated with poor cognitive performance, providing potential biomarkers for cognitive aging. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Doctors' voices in patients' narratives: coping with emotions in storytelling.

    PubMed

    Lucius-Hoene, Gabriele; Thiele, Ulrike; Breuning, Martina; Haug, Stephanie

    2012-09-01

    To understand doctors' impact on the emotional coping of patients, patients' stories about their encounters with doctors are used. These accounts reflect meaning-making processes and biographically contextualized experiences. We investigate how patients characterize their doctors by voicing them in their stories, thus assigning them functions in their coping process. 394 narrated scenes with reported speech of doctors were extracted from interviews with 26 patients with type 2 diabetes and 30 with chronic pain. Constructed speech acts were investigated by means of positioning and narrative analysis, and assigned to thematic categories by a bottom-up coding procedure. Patients use narratives as coping strategies when confronted with illness and their encounters with doctors by constructing them in a supportive and face-saving way. In correspondence with the variance of illness conditions, differing moral problems in dealing with doctors arise. Different evaluative stances towards the same events within interviews show that positionings are not fixed, but vary according to contexts and purposes. Our narrative approach complements the standardized and predominantly cognitive statements of questionnaires in research on doctor-patient relations with individualized emotional and biographical aspects of the patients' perspective. Doctors should be trained to become aware of their impact on patients' coping processes.

  20. [Ontogeny-specific interaction of psychophysiological mechanisms of emotional perception and educational achievement in students].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2003-01-01

    In order to explore the process of children's adaptation to the school environment, psychophysiological characteristics of the perception of emotional speech information and school progress were experimentally studied. Forty-six schoolchildren of three age groups (7-10, 11-13, and 14-17 years old) participated in the study. In each experimental session, a test sentence was presented to a subject through headphones with two emotional intonations (joy and anger) and without emotional expression. The subject had to recognize the type of emotion, and his/her answers were recorded. School progress was determined by year grades in Russian, a foreign language, and mathematics. Analysis of variance and linear regression analysis showed that ontogenetic features of the correlation between psychophysiological mechanisms of emotion recognition and school progress were gender- and subject-dependent. This correlation was stronger in 7-13-year-old children than in older children. This age boundary was passed by girls earlier than by boys.

  1. A Coding System with Independent Annotations of Gesture Forms and Functions during Verbal Communication: Development of a Database of Speech and GEsture (DoSaGE)

    PubMed Central

    Kong, Anthony Pak-Hin; Law, Sam-Po; Kwan, Connie Ching-Yin; Lai, Christy; Lam, Vivian

    2014-01-01

    Gestures are commonly used together with spoken language in human communication. One major limitation of gesture investigations in the existing literature lies in the fact that the coding of forms and functions of gestures has not been clearly differentiated. This paper first described a recently developed Database of Speech and GEsture (DoSaGE) based on independent annotation of gesture forms and functions among 119 neurologically unimpaired right-handed native speakers of Cantonese (divided into three age and two education levels), and presented findings of an investigation examining how gesture use was related to age and linguistic performance. Consideration of these two factors, for which normative data are currently very limited or lacking in the literature, is relevant and necessary when one evaluates gesture employment among individuals with and without language impairment. Three speech tasks, including monologue of a personally important event, sequential description, and story-telling, were used for elicitation. The EUDICO Linguistic ANnotator (ELAN) software was used to independently annotate each participant’s linguistic information of the transcript, forms of gestures used, and the function for each gesture. About one-third of the subjects did not use any co-verbal gestures. While the majority of gestures were non-content-carrying, which functioned mainly for reinforcing speech intonation or controlling speech flow, the content-carrying ones were used to enhance speech content. Furthermore, individuals who are younger or linguistically more proficient tended to use fewer gestures, suggesting that normal speakers gesture differently as a function of age and linguistic performance. PMID:25667563

  2. Social Anxiety, Affect, Cortisol Response and Performance on a Speech Task.

    PubMed

    Losiak, Wladyslaw; Blaut, Agata; Klosowska, Joanna; Slowik, Natalia

    2016-01-01

    Social anxiety is characterized by increased emotional reactivity to social stimuli, but results of studies focusing on affective reactions of socially anxious subjects in the situation of social exposition are inconclusive, especially in the case of endocrinological measures of affect. This study was designed to examine individual differences in endocrinological and affective reactions to social exposure as well as in performance on a speech task in a group of students (n = 44) comprising subjects with either high or low levels of social anxiety. Measures of salivary cortisol and positive and negative affect were taken before and after an impromptu speech. Self-ratings and observer ratings of performance were also obtained. Cortisol levels and negative affect increased in both groups after the speech task, and positive affect decreased; however, group × affect interactions were not significant. Assessments conducted after the speech task revealed that highly socially anxious participants had lower observer ratings of performance while cortisol increase and changes in self-reported affect were not related to performance. Socially anxious individuals do not differ from nonanxious individuals in affective reactions to social exposition, but reveal worse performance at a speech task. © 2015 S. Karger AG, Basel.

  3. Self-awareness deficits following loss of inner speech: Dr. Jill Bolte Taylor's case study.

    PubMed

    Morin, Alain

    2009-06-01

    In her 2006 book "My Stroke of Insight", Dr. Jill Bolte Taylor relates her experience of suffering a left-hemispheric stroke caused by a congenital arteriovenous malformation, which led to a loss of inner speech. Her phenomenological account strongly suggests that this impairment produced a global self-awareness deficit as well as more specific dysfunctions related to corporeal awareness, sense of individuality, retrieval of autobiographical memories, and self-conscious emotions. These are examined in detail and corroborated by numerous excerpts from Taylor's book.

  4. Preliminary Support for a Generalized Arousal Model of Political Conservatism

    PubMed Central

    Tritt, Shona M.; Inzlicht, Michael; Peterson, Jordan B.

    2013-01-01

    It is widely held that negative emotions such as threat, anxiety, and disgust represent the core psychological factors that enhance conservative political beliefs. We put forward an alternative hypothesis: that conservatism is fundamentally motivated by arousal, and that, in this context, the effect of negative emotion is due to engaging intensely arousing states. Here we show that study participants agreed more with right but not left-wing political speeches after being exposed to positive as well as negative emotion-inducing film-clips. No such effect emerged for neutral-content videos. A follow-up study replicated and extended this effect. These results are consistent with the idea that emotional arousal, in general, and not negative valence, specifically, may underlie political conservatism. PMID:24376687

  5. Preliminary support for a generalized arousal model of political conservatism.

    PubMed

    Tritt, Shona M; Inzlicht, Michael; Peterson, Jordan B

    2013-01-01

    It is widely held that negative emotions such as threat, anxiety, and disgust represent the core psychological factors that enhance conservative political beliefs. We put forward an alternative hypothesis: that conservatism is fundamentally motivated by arousal, and that, in this context, the effect of negative emotion is due to engaging intensely arousing states. Here we show that study participants agreed more with right but not left-wing political speeches after being exposed to positive as well as negative emotion-inducing film-clips. No such effect emerged for neutral-content videos. A follow-up study replicated and extended this effect. These results are consistent with the idea that emotional arousal, in general, and not negative valence, specifically, may underlie political conservatism.

  6. Reviewing the connection between speech and obstructive sleep apnea.

    PubMed

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

    2016-02-20

    Sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic analysis of speech could be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of a patient's condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to review these approaches carefully, also testing some reported results on our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only serve as a useful example of relevant issues when using machine learning for medical diagnosis, but will also help guide further research on the connection between speech and OSA.
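
    To make pitfall (2) concrete: feature selection must be fitted inside each cross-validation fold, never on the full data set beforehand. The scikit-learn sketch below illustrates that principle on synthetic data; the feature counts, the SVR settings, and the synthetic AHI target are assumptions for the example, not the authors' setup.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in: many acoustic features, few subjects, a continuous AHI target.
n_subjects, n_features = 120, 400
X = rng.normal(size=(n_subjects, n_features))
ahi = 10 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=8, size=n_subjects)

# Feature selection lives INSIDE the pipeline, so each fold selects features
# using only its own training subjects, avoiding optimistically biased scores.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression, k=20)),
    ("svr", SVR(kernel="rbf", C=10.0)),
])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, ahi, cv=cv, scoring="neg_mean_absolute_error")
print("MAE per fold:", np.round(-scores, 2))
```

    Running the same selection step once on all 120 subjects before cross-validation would typically report a noticeably lower error on this kind of data, which is exactly the false-discovery pattern the review warns about.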

  7. The Emotional Movie Database (EMDB): a self-report and psychophysiological study.

    PubMed

    Carvalho, Sandra; Leite, Jorge; Galdo-Álvarez, Santiago; Gonçalves, Oscar F

    2012-12-01

    Film clips are an important tool for evoking emotional responses in the laboratory. When compared with other emotionally potent visual stimuli (e.g., pictures), film clips seem to be more effective in eliciting emotions for longer periods of time at both the subjective and physiological levels. The main objective of the present study was to develop a new database of affective film clips without auditory content, based on a dimensional approach to emotional stimuli (valence, arousal and dominance). The study had three different phases: (1) the pre-selection and editing of 52 film clips, (2) the self-report rating of these film clips by a sample of 113 participants, and (3) psychophysiological assessment (skin conductance level (SCL) and heart rate (HR)) in 32 volunteers. Film clips from different categories were selected to elicit emotional states from different quadrants of affective space. The results also showed that sustained exposure to the affective film clips resulted in a pattern of SCL increase and HR deceleration in high-arousal conditions (i.e., horror and erotic conditions). The resulting emotional movie database can reliably be used in research requiring the presentation of non-auditory film clips with different ratings of valence, arousal and dominance.

  8. Cognitive processing specificity of anxious apprehension: impact on distress and performance during speech exposure.

    PubMed

    Philippot, Pierre; Vrielynck, Nathalie; Muller, Valérie

    2010-12-01

    The present study examined the impact of different modes of processing anxious apprehension on subsequent anxiety and performance in a stressful speech task. Participants were informed that they would have to give a speech on a difficult topic while being videotaped and evaluated on their performance. They were then randomly assigned to one of three conditions. In a specific processing condition, they were encouraged to explore in detail all the specific aspects (thoughts, emotions, sensations) they experienced while anticipating giving the speech; in a general processing condition, they had to focus on the generic aspects that they would typically experience during anxious anticipation; and in a control, no-processing condition, participants were distracted. Results revealed that at the end of the speech, participants in the specific processing condition reported less anxiety than those in the two other conditions. They were also evaluated by judges to have performed better than those in the control condition, who in turn did better than those in the general processing condition. Copyright © 2010. Published by Elsevier Ltd.

  9. Musical melody and speech intonation: singing a different tune.

    PubMed

    Zatorre, Robert J; Baum, Shari R

    2012-01-01

    Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.

  10. Situational influences on rhythmicity in speech, music, and their interaction

    PubMed Central

    Hawkins, Sarah

    2014-01-01

    Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. PMID:25385776

  11. Effects of emotional and perceptual-motor stress on a voice recognition system's accuracy: An applied investigation

    NASA Astrophysics Data System (ADS)

    Poock, G. K.; Martin, B. J.

    1984-02-01

    This was an applied investigation examining the ability of a speech recognition system to recognize speakers' inputs when the speakers were under different stress levels. Subjects were asked to speak to a voice recognition system under three conditions: (1) normal office environment, (2) emotional stress, and (3) perceptual-motor stress. Results indicate a definite relationship between voice recognition system performance and the type of low stress reference patterns used to achieve recognition.

  12. How our own speech rate influences our perception of others.

    PubMed

    Bosker, Hans Rutger

    2017-08-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  13. Evidence for cultural dialects in vocal emotion expression: acoustic classification within and across five nations.

    PubMed

    Laukka, Petri; Neiberg, Daniel; Elfenbein, Hillary Anger

    2014-06-01

    The possibility of cultural differences in the fundamental acoustic patterns used to express emotion through the voice is an unanswered question central to the larger debate about the universality versus cultural specificity of emotion. This study used emotionally inflected standard-content speech segments expressing 11 emotions produced by 100 professional actors from 5 English-speaking cultures. Machine learning simulations were employed to classify expressions based on their acoustic features, using conditions where training and testing were conducted on stimuli coming from either the same or different cultures. A wide range of emotions were classified with above-chance accuracy in cross-cultural conditions, suggesting vocal expressions share important characteristics across cultures. However, classification showed an in-group advantage with higher accuracy in within- versus cross-cultural conditions. This finding demonstrates cultural differences in expressive vocal style, and supports the dialect theory of emotions according to which greater recognition of expressions from in-group members results from greater familiarity with culturally specific expressive styles.
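
    The within- versus cross-cultural comparison amounts to training a classifier on expressions from one culture and testing it on another. The sketch below illustrates that evaluation scheme with scikit-learn on synthetic acoustic features; the cultures, feature dimensions, and classifier settings are placeholders, not the study's data or models.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n_emotions, n_feats = 11, 20
emotion_centers = rng.normal(size=(n_emotions, n_feats))
cultures = ["australia", "india", "kenya", "singapore", "usa"]

def fake_corpus(shift, per_emotion=30):
    """Synthetic acoustic features: shared emotion centers plus a culture offset."""
    X = np.vstack([center + shift + 0.8 * rng.normal(size=(per_emotion, n_feats))
                   for center in emotion_centers])
    y = np.repeat(np.arange(n_emotions), per_emotion)
    return X, y

corpora = {c: fake_corpus(0.4 * rng.normal(size=n_feats)) for c in cultures}
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0))

# Within-culture condition: train and test on different halves of one culture.
Xw, yw = corpora["usa"]
clf.fit(Xw[::2], yw[::2])
print("within-culture accuracy:", round(clf.score(Xw[1::2], yw[1::2]), 2))

# Cross-cultural condition: train on one culture, test on another.
clf.fit(*corpora["usa"])
print("cross-culture accuracy :", round(clf.score(*corpora["kenya"]), 2))
```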

  14. Transitioning from analog to digital audio recording in childhood speech sound disorders.

    PubMed

    Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L

    2005-06-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practise.

  15. Transitioning from analog to digital audio recording in childhood speech sound disorders

    PubMed Central

    Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

    2014-01-01

    Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practise. PMID:16019779

  16. Hearing and Speech Sciences in Educational Environment Mapping in Brazil: education, work and professional experience.

    PubMed

    Celeste, Letícia Corrêa; Zanoni, Graziela; Queiroga, Bianca; Alves, Luciana Mendonça

    2017-03-09

    To map the profile of Brazilian Speech Therapists who report working in Educational Speech Therapy, with regard to aspects related to training, practice and professional experience. This was a retrospective study based on secondary analysis of the Federal Council of Hearing and Speech Sciences database, covering the questionnaires of professionals who reported working in the educational environment. 312 questionnaires were completed, 93.3% of them by women aged 30-39 years. Most Speech Therapists continued their studies, opting mostly for specialization. Almost 50% of respondents have worked for less than six years in the specialty, most significantly in the public service (especially municipal) and in the private sector. The profile of the Speech Therapists active in the Educational area in Brazil is a predominantly female professional who values continuing her studies after graduation, looking mostly for specialization in Audiology and Orofacial Motricity. The majority have up to 10 years of work experience, divided mainly between public (municipal) and private schools. The work of Speech Therapists in the Educational area concentrates on Elementary and Primary school, with varied workloads.

  17. Parental Reactions to Cleft Palate Children.

    ERIC Educational Resources Information Center

    Vanpoelvoorde, Leah; Shaughnessy, Michael F.

    1991-01-01

    This paper reviews parents' emotional reactions following the birth of a cleft lip/palate child. It examines when parents were told of the deformity and discusses the duties of the speech-language pathologist and the psychologist in counseling the parents and the child. (Author/JDD)

  18. Books Can Break Attitudinal Barriers Toward the Handicapped.

    ERIC Educational Resources Information Center

    Bauer, Carolyn J.

    1985-01-01

    Lists books dealing with the more prevalent handicaps of mainstreamed children: visual handicaps, speech handicaps, emotional disturbances, learning disabilities, auditory handicaps, intellectual impairments, and orthopedic handicaps. Recommends books for use from preschool to level three to expose children early and influence their attitudes…

  19. Affective Aprosodia from a Medial Frontal Stroke

    ERIC Educational Resources Information Center

    Heilman, Kenneth M.; Leon, Susan A.; Rosenbek, John C.

    2004-01-01

    Background and objectives: Whereas injury to the left hemisphere induces aphasia, injury to the right hemisphere's perisylvian region induces an impairment of emotional speech prosody (affective aprosodia). Left-sided medial frontal lesions are associated with reduced verbal fluency with relatively intact comprehension and repetition…

  20. Emotional recognition of dynamic facial expressions before and after cochlear implantation in adults with progressive deafness.

    PubMed

    Ambert-Dahan, Emmanuèle; Giraud, Anne-Lise; Mecheri, Halima; Sterkers, Olivier; Mosnier, Isabelle; Samson, Séverine

    2017-10-01

    Visual processing has been extensively explored in deaf subjects in the context of verbal communication, through the assessment of speech reading and sign language abilities. However, little is known about visual emotional processing in adult progressive deafness, and after cochlear implantation. The goal of our study was thus to assess the influence of acquired post-lingual progressive deafness on the recognition of dynamic facial emotions that were selected to express canonical fear, happiness, sadness, and anger. A total of 23 adults with post-lingual deafness, separated into two groups assessed either before (n = 10) or after (n = 13) cochlear implantation (CI), and 13 normal-hearing (NH) individuals participated in the current study. Participants were asked to rate the expression of the four cardinal emotions, and to evaluate both their emotional valence (unpleasant-pleasant) and arousal potential (relaxing-stimulating). We found that patients with deafness were impaired in the recognition of sad faces, and that patients equipped with a CI were additionally impaired in the recognition of happiness and fear (but not anger). Relative to controls, all patients with deafness showed a deficit in perceiving arousal expressed in faces, while valence ratings remained unaffected. The current results show for the first time that acquired and progressive deafness is associated with a reduction of emotional sensitivity to visual stimuli. This negative impact of progressive deafness on the perception of dynamic facial cues for emotion recognition contrasts with the proficiency of deaf subjects with and without CIs in processing visual speech cues (Rouger et al., 2007; Strelnikov et al., 2009; Lazard and Giraud, 2017). Altogether, these results suggest a trade-off between the processing of linguistic and non-linguistic visual stimuli. Copyright © 2017. Published by Elsevier B.V.

  1. Telephone-quality pathological speech classification using empirical mode decomposition.

    PubMed

    Kaleem, M F; Ghoraani, B; Guergachi, A; Krishnan, S

    2011-01-01

    This paper presents a computationally simple and effective methodology based on empirical mode decomposition (EMD) for the classification of telephone-quality normal and pathological speech signals. EMD is used to decompose continuous normal and pathological speech signals into intrinsic mode functions, which are analyzed to extract physically meaningful and unique temporal and spectral features. Using continuous speech samples from a database of 51 normal and 161 pathological speakers, which has been modified to simulate telephone-quality speech under different levels of noise, a linear classifier is used with the feature vector thus obtained to achieve a high classification accuracy, thereby demonstrating the effectiveness of the methodology. The classification accuracy reported in this paper (89.7% at a signal-to-noise ratio of 30 dB) is a significant improvement over previously reported results for the same task, and demonstrates the utility of our methodology for cost-effective remote voice pathology assessment over telephone channels.
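
    A minimal sketch of the overall pipeline (decompose, extract per-mode features, classify with a linear model) is given below. The EMD step is replaced by a clearly labeled placeholder band-splitter so the example runs without extra dependencies, and the energy and dominant-frequency features and the synthetic signals are illustrative assumptions, not the paper's feature set or database.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def compute_imfs(signal):
    """Hypothetical placeholder for empirical mode decomposition.

    A real implementation would use a sifting routine (or an EMD library);
    here the signal is simply band-split so the example stays self-contained.
    """
    spectrum = np.fft.rfft(signal)
    bands = np.array_split(np.arange(spectrum.size), 4)
    imfs = []
    for band in bands:
        masked = np.zeros_like(spectrum)
        masked[band] = spectrum[band]
        imfs.append(np.fft.irfft(masked, n=signal.size))
    return imfs

def imf_features(signal, fs=8000):
    """Energy and dominant frequency of each mode -- one simple feature choice."""
    feats = []
    for imf in compute_imfs(signal):
        energy = float(np.sum(imf ** 2))
        freqs = np.fft.rfftfreq(imf.size, d=1.0 / fs)
        dominant = float(freqs[np.argmax(np.abs(np.fft.rfft(imf)))])
        feats.extend([energy, dominant])
    return feats

# Synthetic stand-in for telephone-quality normal vs. pathological samples:
# pathological voices get stronger frequency jitter.
rng = np.random.default_rng(1)
def fake_sample(pathological):
    t = np.arange(0, 0.5, 1.0 / 8000)
    jitter = 0.02 if pathological else 0.002
    f0 = 120 * (1 + jitter * rng.standard_normal(t.size).cumsum() / 50)
    return np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(t.size)

X = np.array([imf_features(fake_sample(p)) for p in ([0] * 40 + [1] * 40)])
y = np.array([0] * 40 + [1] * 40)
print("CV accuracy:", cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```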

  2. V2S: Voice to Sign Language Translation System for Malaysian Deaf People

    NASA Astrophysics Data System (ADS)

    Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

    The process of learning and understanding sign language may be cumbersome to some, and therefore this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken, regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on some generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech with the stored templates and finally displays the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
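
    Template-based recognition of this kind is commonly implemented by comparing an incoming feature sequence against each stored template, for example with dynamic time warping (DTW). The paper does not specify its matching algorithm, so the sketch below is an assumed DTW-based variant, with random stand-in feature matrices in place of real spectral parameter sets.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences
    (frames x feature-dim), using Euclidean frame distances."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def recognize(features, templates):
    """Return the word whose stored template is closest to the input features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Stand-in templates: in the real system these would be the spectral parameter
# sets stored in the database during training.
rng = np.random.default_rng(3)
templates = {word: rng.normal(size=(30, 12)) for word in ["hello", "thanks", "help"]}

# A noisy repetition of "thanks" should still match its own template.
query = templates["thanks"] + 0.1 * rng.normal(size=(30, 12))
print(recognize(query, templates))
```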

  3. Perceptual learning of speech under optimal and adverse conditions.

    PubMed

    Zhang, Xujin; Samuel, Arthur G

    2014-02-01

    Humans have a remarkable ability to understand spoken language despite the large amount of variability in speech. Previous research has shown that listeners can use lexical information to guide their interpretation of atypical sounds in speech (Norris, McQueen, & Cutler, 2003). This kind of lexically induced perceptual learning enables people to adjust to the variations in utterances due to talker-specific characteristics, such as individual identity and dialect. The current study investigated perceptual learning in two optimal conditions: conversational speech (Experiment 1) versus clear speech (Experiment 2), and three adverse conditions: noise (Experiment 3a) versus two cognitive loads (Experiments 4a and 4b). Perceptual learning occurred in the two optimal conditions and in the two cognitive load conditions, but not in the noise condition. Furthermore, perceptual learning occurred only in the first of two sessions for each participant, and only for atypical /s/ sounds and not for atypical /f/ sounds. This pattern of learning and nonlearning reflects a balance between flexibility and stability that the speech system must have to deal with speech variability in the diverse conditions that speech is encountered. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  4. Learning from human cadaveric prosections: Examining anxiety in speech therapy students.

    PubMed

    Criado-Álvarez, Juan Jose; González González, Jaime; Romo Barrientos, Carmen; Ubeda-Bañon, Isabel; Saiz-Sanchez, Daniel; Flores-Cuadrado, Alicia; Albertos-Marco, Juan Carlos; Martinez-Marcos, Alino; Mohedano-Moriano, Alicia

    2017-09-01

    Human anatomy education often utilizes the essential practices of cadaver dissection and examination of prosected specimens. However, these exposures to human cadavers and confronting death can be stressful and anxiety-inducing for students. This study aims to understand the attitudes, reactions, fears, and states of anxiety that speech therapy students experience in the dissection room. To that end, a before-and-after cross-sectional analysis was conducted with speech therapy students undertaking a dissection course for the first time. An anonymous questionnaire was administered before and after the exercise to understand students' feelings and emotions. State-Trait Anxiety Inventory questionnaires (STAI-S and STAI-T) were used to evaluate anxiety levels. The results of the study revealed that baseline anxiety levels measured using the STAI-T remained stable and unchanged during the dissection room experience (P > 0.05). Levels of emotional anxiety measured using the STAI-S decreased, from 15.3 to 11.1 points (P < 0.05). In the initial phase of the study, before any contact with the dissection room environment, 17% of students experienced anxiety, and this rate remained unchanged by the end of the session (P > 0.05). A total of 63.4% of students described having thoughts about life and death. After the session, 100% of students recommended the dissection exercise, giving it a mean score of 9.1/10 points. Anatomy is an important subject for students in the health sciences, and dissection and prosection exercises frequently involve a series of uncomfortable and stressful experiences. Experiences in the dissection room may challenge some students' emotional equilibria. However, students consider the exercise to be very useful in their education and recommend it. Anat Sci Educ 10: 487-494. © 2017 American Association of Anatomists.

  5. Emotion Recognition of Weblog Sentences Based on an Ensemble Algorithm of Multi-label Classification and Word Emotions

    NASA Astrophysics Data System (ADS)

    Li, Ji; Ren, Fuji

    Weblogs have greatly changed the communication ways of mankind. Affective analysis of blog posts is found valuable for many applications such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification cannot satisfy the higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. An ensemble algorithm, RAKEL, is used to recognize dominant emotions from the writer's perspective. Our emotion feature, using a detailed intensity representation for word emotions, outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuation, disjunctive connectives, modification relations and negation into the features. This achieves 13.51% and 12.49% increases in Micro-averaged F1 and Macro-averaged F1, respectively, compared to the traditional lexicon-based feature. Results show that a multi-dimensional emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label setting.
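
    RAKEL (random k-labelsets) trains a label-powerset classifier on each of several random subsets of the label space and combines their votes. The sketch below is a compact, assumption-laden re-implementation of that idea on synthetic multi-label data; it is not the authors' feature set or corpus, and maintained implementations exist in libraries such as scikit-multilearn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_multilabel_classification

def train_rakel(X, Y, k=3, n_models=10, rng=None):
    """Train a RAkEL-style ensemble: each member is a label-powerset
    classifier over a random k-subset of the label space."""
    rng = rng or np.random.default_rng(0)
    n_labels = Y.shape[1]
    ensemble = []
    for _ in range(n_models):
        labelset = rng.choice(n_labels, size=k, replace=False)
        # Map each row's label combination on this subset to a single class id.
        combos, y_multi = np.unique(Y[:, labelset], axis=0, return_inverse=True)
        clf = LogisticRegression(max_iter=1000).fit(X, y_multi)
        ensemble.append((labelset, combos, clf))
    return ensemble

def predict_rakel(ensemble, X, n_labels, threshold=0.5):
    votes = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for labelset, combos, clf in ensemble:
        pred_combo = combos[clf.predict(X)]      # decode class id -> label subset
        votes[:, labelset] += pred_combo
        counts[labelset] += 1
    counts[counts == 0] = 1                      # labels never sampled get no votes
    return (votes / counts > threshold).astype(int)

# Synthetic multi-label data standing in for the 8-dimension emotion annotations.
X, Y = make_multilabel_classification(n_samples=400, n_features=30,
                                      n_classes=8, n_labels=2, random_state=1)
ensemble = train_rakel(X[:300], Y[:300])
pred = predict_rakel(ensemble, X[300:], n_labels=8)
micro_f1 = 2 * (pred & Y[300:]).sum() / (pred.sum() + Y[300:].sum())
print("Micro-averaged F1:", round(float(micro_f1), 3))
```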

  6. Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

    DTIC Science & Technology

    2012-12-01

    Approximately 30 hours of speech were collected for each of three languages—English, Mandarin, and Russian (300 passages per language). Each passage has been carefully transcribed at the … manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. Another phase of the work was to explore … YouTube.

  7. Robust fundamental frequency estimation in sustained vowels: Detailed algorithmic comparisons and information fusion with adaptive Kalman filtering

    PubMed Central

    Tsanas, Athanasios; Zañartu, Matías; Little, Max A.; Fox, Cynthia; Ramig, Lorraine O.; Clifford, Gari D.

    2014-01-01

    There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required. PMID:24815269
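
    The fusion step can be pictured as a one-dimensional Kalman filter whose measurement noise is re-weighted frame by frame from per-algorithm quality measures. The sketch below is a simplified, assumed variant (random-walk F0 model, measurement variance taken as the inverse of a quality score), not the authors' exact formulation.

```python
import numpy as np

def fuse_f0(estimates, qualities, process_var=4.0):
    """Fuse per-frame F0 estimates from several algorithms with a 1-D Kalman filter.

    estimates : (n_frames, n_algorithms) F0 candidates in Hz
    qualities : same shape; higher = more trustworthy, converted to
                measurement variances as 1/quality (an assumed mapping).
    """
    n_frames, _ = estimates.shape
    x = float(np.median(estimates[0]))   # initial state: median of the first frame
    p = 100.0                            # initial state variance
    fused = np.empty(n_frames)
    for t in range(n_frames):
        p += process_var                              # predict (random-walk model)
        for z, q in zip(estimates[t], qualities[t]):  # sequential measurement updates
            r = 1.0 / max(q, 1e-6)                    # adaptive measurement variance
            gain = p / (p + r)
            x += gain * (z - x)
            p *= (1.0 - gain)
        fused[t] = x
    return fused

# Demo: three noisy trackers follow a 120 -> 140 Hz glide; one is unreliable.
rng = np.random.default_rng(7)
true_f0 = np.linspace(120, 140, 200)
est = np.stack([true_f0 + rng.normal(0, 2, 200),
                true_f0 + rng.normal(0, 3, 200),
                true_f0 + rng.normal(0, 15, 200)], axis=1)
qual = np.tile([1.0, 0.8, 0.05], (200, 1))
fused = fuse_f0(est, qual)
print("mean abs error (Hz):", round(float(np.mean(np.abs(fused - true_f0))), 2))
```

    Down-weighting the unreliable third tracker keeps the fused track close to the true glide, which is the intuition behind quality-driven fusion of multiple F0 estimators.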

  8. Library Instruction in Communication Disorders: Which Databases Should Be Prioritized?

    ERIC Educational Resources Information Center

    Grabowsky, Adelia

    2015-01-01

    The field of communication disorders encompasses the health science disciplines of both speech-­language pathology and audiology. Pertinent literature for communication disorders can be found in a number of databases. Librarians providing information literacy instruction may not have the time to cover more than a few resources. This study develops…

  9. Are precues effective in proactively controlling taboo interference during speech production?

    PubMed

    White, Katherine K; Abrams, Lise; Hsi, Lisa R; Watkins, Emily C

    2018-02-07

    This research investigated whether precues engage proactive control to reduce emotional interference during speech production. A picture-word interference task required participants to name target pictures accompanied by taboo, negative, or neutral distractors. Proactive control was manipulated by presenting precues that signalled the type of distractor that would appear on the next trial. Experiment 1 included one block of trials with precues and one without, whereas Experiment 2 mixed precued and uncued trials. Consistent with previous research, picture naming was slowed in both experiments when distractors were taboo or negative compared to neutral, with the greatest slowing effect when distractors were taboo. Evidence that precues engaged proactive control to reduce interference from taboo (but not negative) distractors was found in Experiment 1. In contrast, mixing precued trials in Experiment 2 resulted in no taboo cueing benefit. These results suggest that item-level proactive control can be engaged under certain conditions to reduce taboo interference during speech production, findings that help to refine a role for cognitive control of distraction during speech production.

  10. Playing Music for a Smarter Ear: Cognitive, Perceptual and Neurobiological Evidence

    PubMed Central

    Strait, Dana; Kraus, Nina

    2012-01-01

    Human hearing depends on a combination of cognitive and sensory processes that function by means of an interactive circuitry of bottom-up and top-down neural pathways, extending from the cochlea to the cortex and back again. Given that similar neural pathways are recruited to process sounds related to both music and language, it is not surprising that the auditory expertise gained over years of consistent music practice fine-tunes the human auditory system in a comprehensive fashion, strengthening neurobiological and cognitive underpinnings of both music and speech processing. In this review we argue not only that common neural mechanisms for speech and music exist, but that experience in music leads to enhancements in sensory and cognitive contributors to speech processing. Of specific interest is the potential for music training to bolster neural mechanisms that undergird language-related skills, such as reading and hearing speech in background noise, which are critical to academic progress, emotional health, and vocational success. PMID:22993456

  11. Self-Reflection and the Inner Voice: Activation of the Left Inferior Frontal Gyrus During Perceptual and Conceptual Self-Referential Thinking

    PubMed Central

    Morin, Alain; Hamper, Breanne

    2012-01-01

    Inner speech involvement in self-reflection was examined by reviewing 130 studies assessing brain activation during self-referential processing in key self-domains: agency, self-recognition, emotions, personality traits, autobiographical memory, and miscellaneous (e.g., prospection, judgments). The left inferior frontal gyrus (LIFG) has been shown to be reliably recruited during inner speech production. The percentage of studies reporting LIFG activity for each self-dimension was calculated. Fifty five percent of all studies reviewed indicated LIFG (and presumably inner speech) activity during self-reflection tasks; on average LIFG activation is observed 16% of the time during completion of non-self tasks (e.g., attention, perception). The highest LIFG activation rate was observed during retrieval of autobiographical information. The LIFG was significantly more recruited during conceptual tasks (e.g., prospection, traits) than during perceptual tasks (agency and self-recognition). This constitutes additional evidence supporting the idea of a participation of inner speech in self-related thinking. PMID:23049653

  12. Self-reflection and the inner voice: activation of the left inferior frontal gyrus during perceptual and conceptual self-referential thinking.

    PubMed

    Morin, Alain; Hamper, Breanne

    2012-01-01

    Inner speech involvement in self-reflection was examined by reviewing 130 studies assessing brain activation during self-referential processing in key self-domains: agency, self-recognition, emotions, personality traits, autobiographical memory, and miscellaneous (e.g., prospection, judgments). The left inferior frontal gyrus (LIFG) has been shown to be reliably recruited during inner speech production. The percentage of studies reporting LIFG activity for each self-dimension was calculated. Fifty-five percent of all studies reviewed indicated LIFG (and presumably inner speech) activity during self-reflection tasks; on average, LIFG activation is observed 16% of the time during completion of non-self tasks (e.g., attention, perception). The highest LIFG activation rate was observed during retrieval of autobiographical information. The LIFG was significantly more recruited during conceptual tasks (e.g., prospection, traits) than during perceptual tasks (agency and self-recognition). This constitutes additional evidence supporting the participation of inner speech in self-related thinking.

  13. Evidence, Goals, and Outcomes in Stuttering Treatment: Applications With an Adolescent Who Stutters.

    PubMed

    Marcotte, Anne K

    2018-01-09

    The purpose of this clinical focus article is to summarize 1 possible process that a clinician might follow in designing and conducting a treatment program with John, a 14-year-old male individual who stutters. The available research evidence, practitioner experience, and consideration of individual preferences are combined to address goals, treatment procedures, and outcomes for John. The stuttering treatment research literature includes multiple well-designed reviews and individual studies that have shown the effectiveness of prolonged speech (and smooth speech and related variations) for improving stuttered speech and for improving social, emotional, cognitive, and related variables in adolescents who stutter. Based on that evidence, and incorporating the additional elements of practitioner experience and client preferences, this clinical focus article suggests that John would be likely to benefit from a treatment program based on prolonged speech. The basic structure of 1 possible such program is also described, with an emphasis on the goals and outcomes that John could be expected to achieve.

  14. Relations Between Self-reported Executive Functioning and Speech Perception Skills in Adult Cochlear Implant Users.

    PubMed

    Moberly, Aaron C; Patel, Tirth R; Castellanos, Irina

    2018-02-01

    It was hypothesized that, as a result of their hearing loss, adults with cochlear implants (CIs) would self-report poorer executive functioning (EF) skills than normal-hearing (NH) peers, and that these EF skills would be associated with performance on speech recognition tasks. EF refers to a group of higher-order neurocognitive skills responsible for behavioral and emotional regulation during goal-directed activity, and EF has been found to be poorer in children with CIs than in their NH age-matched peers. Moreover, there is increasing evidence that neurocognitive skills, including some EF skills, contribute to the ability to recognize speech through a CI. Thirty postlingually deafened adults with CIs and 42 age-matched NH adults were enrolled. Participants and their spouses or significant others (informants) completed well-validated self-reports or informant-reports of EF, the Behavior Rating Inventory of Executive Function - Adult (BRIEF-A). CI users' speech recognition skills were assessed in quiet using several measures of sentence recognition. NH peers were tested for recognition of noise-vocoded versions of the same speech stimuli. CI users self-reported difficulty on EF tasks of shifting and task monitoring. In CI users, measures of speech recognition correlated with several self-reported EF skills. The present findings provide further evidence that neurocognitive factors, including specific EF skills, may decline in association with hearing loss, and that some of these EF skills contribute to speech processing under degraded listening conditions.

  15. BP reactivity to public speaking in stage 1 hypertension: influence of different task scenarios.

    PubMed

    Palatini, Paolo; Bratti, Paolo; Palomba, Daniela; Bonso, Elisa; Saladini, Francesca; Benetti, Elisabetta; Casiglia, Edoardo

    2011-10-01

    To investigate the blood pressure (BP) reaction to public speaking performed according to different emotionally distressing scenarios in stage 1 hypertension, we assessed 64 hypertensive and 30 normotensive subjects. They performed three speech tasks with neutral, anger and anxiety scenarios. BP was assessed with the Finometer beat-to-beat non-invasive recording system throughout the test procedure. For all types of speech, the systolic BP response was greater in the hypertensive than the normotensive subjects (all p < 0.001). At repeated-measures analysis of covariance (R-M ANCOVA), a significant group-by-time interaction was found for all scenarios (p ≤ 0.001). For the diastolic BP response, the between-group difference was significant for the task with anxiety scenario (p < 0.05). At R-M ANCOVA, a group-by-time interaction of borderline statistical significance was found for the speech with anxiety content (p = 0.053) but not for the speeches with neutral or anger content. Within the hypertensive group, the diastolic BP increments during the speeches with anxiety and anger scenarios were greater than those during the speech with neutral scenario (both p < 0.001). These data indicate that reactivity to public speaking is increased in stage 1 hypertension. A speech with an anxiety or anger scenario elicits a greater diastolic BP reaction than tasks with neutral content.

  16. Mapping and Manipulating Facial Expression

    ERIC Educational Resources Information Center

    Theobald, Barry-John; Matthews, Iain; Mangini, Michael; Spies, Jeffrey R.; Brick, Timothy R.; Cohn, Jeffrey F.; Boker, Steven M.

    2009-01-01

    Nonverbal visual cues accompany speech to supplement the meaning of spoken words, signify emotional state, indicate position in discourse, and provide back-channel feedback. This visual information includes head movements, facial expressions and body gestures. In this article we describe techniques for manipulating both verbal and nonverbal facial…

  17. Exceptional Pupils. Special Education Bulletin Number 1.

    ERIC Educational Resources Information Center

    Indiana State Dept. of Public Instruction, Indianapolis. Div. of Special Education.

    An introduction to exceptional children precedes a discussion of each of the following areas of exceptionality: giftedness, mental retardation, physical handicaps and special health problems, blindness and partial vision, aural handicaps, speech handicaps, emotional disturbance, and learning disabilities. Each chapter is followed by a bibliography…

  18. PRISE Reporter. Volume 12, 1980-81.

    ERIC Educational Resources Information Center

    PRISE Reporter, 1981

    1981-01-01

    The document consists of six issues of the "PRISE (Pennsylvania Resources and Information Center for Special Education) Reporter" which cover issues and happenings in the education of the mentally retarded, learning disabled, emotionally disturbed, physically handicapped, visually handicapped, and speech/hearing impaired. Lead articles include the…

  19. Prosody production networks are modulated by sensory cues and social context.

    PubMed

    Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus

    2018-03-05

    The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were (a) free, (b) auditorily cued, (c) visually cued, or (d) accompanied by interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and, in the case of visual stimuli, visual cortex. Responses were larger in the right-hemisphere pSTG and in the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions, in particular for the interpersonal evaluation of vocal emotions.

  20. Perceiving emotion: towards a realistic understanding of the task.

    PubMed

    Cowie, Roddy

    2009-12-12

    A decade ago, perceiving emotion was generally equated with taking a sample (a still photograph or a few seconds of speech) that unquestionably signified an archetypal emotional state, and attaching the appropriate label. Computational research has shifted that paradigm in multiple ways. Concern with realism is key. Emotion generally colours ongoing action and interaction: describing that colouring is a different problem from categorizing brief episodes of relatively pure emotion. Multiple challenges flow from that. Describing emotional colouring is a challenge in itself. One approach is to use everyday categories describing states that are partly emotional and partly cognitive. Another approach is to use dimensions. Both approaches need ways to deal with gradual changes over time and mixed emotions. Attaching target descriptions to a sample poses problems of both procedure and validation. Cues are likely to be distributed both in time and across modalities, and key decisions may depend heavily on context. The usefulness of acted data is limited because it tends not to reproduce these features. By engaging with these challenging issues, research is not only achieving impressive results, but also offering a much deeper understanding of the problem.

  1. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
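
    As context for the template-matching strategy named above, the sketch below shows a plain dynamic-time-warping comparison between a test utterance and stored digit templates. The feature representation (frames of MFCC-like vectors) and the template dictionary are illustrative assumptions, not details taken from the thesis.

```python
# Hedged sketch: DTW distance between feature sequences and nearest-template digit recognition.
import numpy as np

def dtw_distance(template: np.ndarray, test: np.ndarray) -> float:
    """Accumulated DTW cost between two feature sequences (frames x feature dims)."""
    n, m = len(template), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - test[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],                # insertion
                                 cost[i, j - 1],                # deletion
                                 cost[i - 1, j - 1])            # match
    return cost[n, m]

def recognize(test: np.ndarray, templates: dict) -> str:
    """Return the digit label whose stored template gives the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test))
```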

  2. Differential effects of speech situations on mothers' and fathers' infant-directed and dog-directed speech: An acoustic analysis.

    PubMed

    Gergely, Anna; Faragó, Tamás; Galambos, Ágoston; Topál, József

    2017-10-23

    There is growing evidence that dog-directed and infant-directed speech have similar acoustic characteristics, like high overall pitch, wide pitch range, and attention-getting devices. However, it is still unclear whether dog- and infant-directed speech have gender- or context-dependent acoustic features. In the present study, we collected comparable infant-, dog-, and adult-directed speech samples (IDS, DDS, and ADS) in four different speech situations (Storytelling, Task solving, Teaching, and Fixed sentences situations); we obtained the samples from parents whose infants were younger than 30 months of age and who also had a pet dog at home. We found that ADS was different from IDS and DDS, independently of the speakers' gender and the given situation. Higher overall pitch in DDS than in IDS during free situations was also found. Our results show that both parents hyperarticulate their vowels when talking to children but not when addressing dogs: this result is consistent with the goal of hyperspeech in language tutoring. Mothers, however, exaggerate their vowels for their infants under 18 months more than fathers do. Our findings suggest that IDS and DDS have context-dependent features and support the notion that people adapt their prosodic features to the acoustic preferences and emotional needs of their audience.
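
    For readers who want to reproduce the kind of pitch measures compared here (overall pitch and pitch range per speech sample), a minimal sketch is given below. It assumes librosa for F0 tracking and uses invented file names; the study's actual acoustic-analysis toolchain is not specified in the abstract.

```python
# Hedged sketch: mean F0 and F0 range for one recording, assuming librosa.
import numpy as np
import librosa

def pitch_stats(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y,
                                      fmin=librosa.note_to_hz("C2"),
                                      fmax=librosa.note_to_hz("C7"),
                                      sr=sr)
    f0 = f0[voiced_flag & ~np.isnan(f0)]          # keep voiced, valid frames only
    return {"mean_f0_hz": float(np.mean(f0)),
            "f0_range_hz": float(np.max(f0) - np.min(f0))}

# Example (hypothetical file names): compare infant- vs dog-directed samples
# print(pitch_stats("ids_sample.wav"), pitch_stats("dds_sample.wav"))
```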

  3. Cepstral domain modification of audio signals for data embedding: preliminary results

    NASA Astrophysics Data System (ADS)

    Gopalan, Kaliappan

    2004-06-01

    A method of embedding data in an audio signal using cepstral domain modification is described. Based on successful embedding in the spectral points of perceptually masked regions in each frame of speech, the technique was first extended to embedding in the log spectral domain. This extension resulted in approximately 62 bits/s of embedding with less than 2 percent bit error rate (BER) for a clean cover speech (from the TIMIT database), and about 2.5 percent for a noisy speech (from an air traffic controller database), when all frames - including silence and transitions between voiced and unvoiced segments - were used. Bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices - corresponding to the vocal tract region - was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s - using one bit per frame regardless of frame energy or type of speech - initial results showed a BER of less than 1.5 percent for a payload capacity of 208 embedded bits using the clean cover speech. A BER of less than 1.3 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, BER increased to over 10 percent. With quantization causing no significant problem, the technique warrants further studies with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked - analogous to embedding in frequency-masked regions - may yield imperceptible stego audio with low BER.
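
    A minimal sketch of the band-mean idea described above ("altering the mean cepstral values of two ranges of indices") is given below, one bit per frame. The index ranges, the step size delta, and the simple real-cepstrum round trip are assumptions for illustration; the paper's exact frame handling and detection rule may differ.

```python
# Hedged sketch of one-bit-per-frame embedding by shifting the means of two
# cepstral index bands in opposite directions; not the paper's exact method.
import numpy as np

def _real_cepstrum(frame: np.ndarray):
    spectrum = np.fft.fft(frame)
    cep = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    return cep, np.angle(spectrum)

def embed_bit(frame, bit, band_a=slice(10, 20), band_b=slice(20, 30), delta=0.05):
    """Push mean(band_a) above mean(band_b) for bit=1, below it for bit=0."""
    cep, phase = _real_cepstrum(frame)
    sign = 1.0 if bit else -1.0
    cep[band_a] += sign * delta
    cep[band_b] -= sign * delta
    new_log_mag = np.fft.fft(cep).real            # back to a real log-magnitude spectrum
    return np.fft.ifft(np.exp(new_log_mag) * np.exp(1j * phase)).real

def detect_bit(frame, band_a=slice(10, 20), band_b=slice(20, 30)) -> int:
    cep, _ = _real_cepstrum(frame)
    return int(cep[band_a].mean() > cep[band_b].mean())
```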

  4. The effect of simultaneous text on the recall of noise-degraded speech.

    PubMed

    Grossman, Irina; Rajan, Ramesh

    2017-05-01

    Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. A consideration of the effects of language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however, the effects of text on speech processing were analogous. Experiment 2 considered whether the benefit provided by matching text was maintained when the congruency of the text and speech became more ambiguous, through the addition of partially mismatching text-speech sentence pairs that differed only on their final keyword and the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text for speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Comparing Visible and Invisible Social Support: Non-evaluative Support Buffers Cardiovascular Responses to Stress.

    PubMed

    Kirsch, Julie A; Lehman, Barbara J

    2015-12-01

    Previous research suggests that in contrast to invisible social support, visible social support produces exaggerated negative emotional responses. Drawing on work by Bolger and colleagues, this study disentangled social support visibility from negative social evaluation in an examination of the effects of social support on negative emotions and cardiovascular responses. As part of an anticipatory speech task, 73 female participants were randomly assigned to receive no social support, invisible social support, non-confounded visible social support, or visible social support as delivered in a 2007 study by Bolger and Amarel. Twelve readings, each for systolic blood pressure, diastolic blood pressure, and heart rate, were taken at 5-min intervals throughout the periods of baseline, reactivity and recovery. Cardiovascular outcomes were tested by incorporating a series of theoretically driven planned contrasts into tests of stress reactivity conducted through piecewise growth curve modelling. Linear and quadratic trends established cardiovascular reactivity to the task. Further, in comparison to the control and replication conditions, the non-confounded visible and invisible social support conditions attenuated cardiovascular reactivity over time. Pre- and post-speech negative emotional responses were not affected by the social support manipulations. These results suggest that appropriately delivered visible social support may be as beneficial as invisible social support. Copyright © 2014 John Wiley & Sons, Ltd.

  6. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, Kenneth C.

    1994-01-01

    We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is videotaped speaking an arbitrary text that contains expression of the full list of desired database phonemes. The subject is videotaped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as the one used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.
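
    As a rough illustration of the keyframe-and-transition step in the pipeline described above, the sketch below cross-dissolves between phoneme keyframe images according to the phoneme timing. Real morphing also warps geometry between mouth shapes; that, along with the phoneme alignment itself, is outside this sketch, and the data formats shown are assumptions, not the system's actual interfaces.

```python
# Hedged sketch: generate intermediate frames between phoneme keyframes by
# simple cross-dissolve (a stand-in for true image morphing).
import numpy as np

def synthesize_frames(phones, keyframes, fps=30):
    """phones: list of (phoneme, start_s, end_s); keyframes: dict phoneme -> HxWx3 uint8 image."""
    frames = []
    for (p0, s0, e0), (p1, s1, e1) in zip(phones[:-1], phones[1:]):
        img0 = keyframes[p0].astype(float)
        img1 = keyframes[p1].astype(float)
        n = max(1, int((s1 - s0) * fps))          # frames until the next phoneme starts
        for k in range(n):
            alpha = k / n                          # blend weight toward the next keyframe
            frames.append(((1 - alpha) * img0 + alpha * img1).astype(np.uint8))
    return frames
```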

  7. Parental numeric language input to Mandarin Chinese and English speaking preschool children.

    PubMed

    Chang, Alicia; Sandhofer, Catherine M; Adelchanow, Lauren; Rottman, Benjamin

    2011-03-01

    The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that Mandarin-speaking parents talked about number more frequently than English-speaking parents. Further, the ways in which parents talked about number terms in the two languages was more supportive of a cardinal interpretation in Mandarin than in English. We discuss these results in terms of their implications for numerical understanding and later mathematical performance.

  8. Beyond Early Intervention: Providing Support to Public School Personnel

    ERIC Educational Resources Information Center

    Wilson, Kathryn

    2006-01-01

    At age 3, children with hearing loss transition from Part C early intervention to Part B public school services. These children represent a heterogeneous population when considering factors such as communication approaches; speech, language, auditory and cognitive skills; social-emotional and motor development; parental involvement; hearing…

  9. Coaching Athletes with Hidden Disabilities: Recommendations and Strategies for Coaching Education

    ERIC Educational Resources Information Center

    Vargas, Tiffanye; Flores, Margaret; Beyer, Robbi

    2012-01-01

    Hidden disabilities (HD) are those disabilities not readily apparent to the naked eye including specific learning disabilities, attention deficit hyperactivity disorder, emotional behavioral disorders, mild intellectual disabilities, and speech or language disabilities. Young athletes with HD may have difficulty listening to and following…

  10. Perception of Sung Speech in Bimodal Cochlear Implant Users.

    PubMed

    Crew, Joseph D; Galvin, John J; Fu, Qian-Jie

    2016-11-11

    Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. © The Author(s) 2016.

  11. Multiresolution analysis (discrete wavelet transform) through Daubechies family for emotion recognition in speech.

    NASA Astrophysics Data System (ADS)

    Campo, D.; Quintero, O. L.; Bastidas, M.

    2016-04-01

    We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), allowing the decomposition of the initial audio signal into sets of coefficients on which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are enough to analyze and extract emotional content in audio signals, presenting a high accuracy rate in the classification of emotional states without the need to use other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic emotions in humans: boredom, disgust, happiness, anxiety, anger and sadness; neutrality was also included, for a total of seven states to identify.
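
    A minimal sketch of the decomposition-plus-statistics stage is shown below using PyWavelets. The specific wavelet, decomposition depth, and the statistics computed per band are assumptions for illustration; the study evaluated several Daubechies wavelets and fed the resulting features to an ANN classifier.

```python
# Hedged sketch: multilevel DWT with a Daubechies wavelet, then simple
# per-band statistics as a feature vector for emotion classification.
import numpy as np
import pywt

def wavelet_features(signal: np.ndarray, wavelet: str = "db6", level: int = 5) -> np.ndarray:
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA_n, cD_n, ..., cD_1]
    feats = []
    for band in coeffs:
        feats.extend([band.mean(),
                      band.std(),
                      np.sum(band ** 2),                  # band energy
                      np.mean(np.abs(band))])
    return np.array(feats)                                # pass to an ANN or other classifier
```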

  12. The Influence of Native Language on Auditory-Perceptual Evaluation of Vocal Samples Completed by Brazilian and Canadian SLPs.

    PubMed

    Chaves, Cristiane Ribeiro; Campbell, Melanie; Côrtes Gama, Ana Cristina

    2017-03-01

    This study aimed to determine the influence of native language on the auditory-perceptual assessment of voice, as completed by Brazilian and Anglo-Canadian listeners using Brazilian vocal samples and the grade, roughness, breathiness, asthenia, strain (GRBAS) scale. This is an analytical, observational, comparative, and transversal study conducted at the Speech Language Pathology Department of the Federal University of Minas Gerais in Brazil, and at the Communication Sciences and Disorders Department of the University of Alberta in Canada. The GRBAS scale, connected speech, and a sustained vowel were used in this study. The vocal samples were drawn randomly from a database of recorded speech of Brazilian adults, some with healthy voices and some with voice disorders. The database is housed at the Federal University of Minas Gerais. Forty-six samples of connected speech (recitation of days of the week), produced by 35 women and 11 men, and 46 samples of the sustained vowel /a/, produced by 37 women and 9 men, were used in this study. The listeners were divided into two groups of three speech therapists, according to nationality: Brazilian or Anglo-Canadian. The groups were matched according to the years of professional experience of participants. The weighted kappa, with 95% confidence intervals, was used to calculate the intra- and inter-rater agreements. An analysis of the intra-rater agreement showed that Brazilians and Canadians had similar results in the auditory-perceptual evaluation of the sustained vowel and connected speech. The results of the inter-rater agreement for connected speech and the sustained vowel indicated that Brazilians and Canadians had, respectively, moderate agreement on the overall severity (0.57 and 0.50), breathiness (0.45 and 0.45), and asthenia (0.50 and 0.46); poor correlation on roughness (0.19 and 0.007); and weak correlation on strain for connected speech (0.22) and moderate correlation for the sustained vowel (0.50). In general, auditory-perceptual evaluation is not influenced by the native language on most dimensions of the perceptual parameters of the GRBAS scale. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
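
    For readers unfamiliar with the agreement statistic used here, the snippet below computes a weighted kappa between two raters with scikit-learn. The quadratic weighting and the toy GRBAS-style ratings are illustrative assumptions; they are not the study's data or necessarily its exact weighting scheme.

```python
# Hedged sketch: weighted kappa between two raters on ordinal 0-3 scores.
from sklearn.metrics import cohen_kappa_score

rater_brazil = [0, 1, 2, 1, 3, 2, 0, 1]   # hypothetical "grade" scores
rater_canada = [0, 1, 1, 1, 3, 2, 1, 1]

kappa = cohen_kappa_score(rater_brazil, rater_canada, weights="quadratic")
print(f"weighted kappa = {kappa:.2f}")
```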

  13. Situational influences on rhythmicity in speech, music, and their interaction.

    PubMed

    Hawkins, Sarah

    2014-12-19

    Brain processes underlying the production and perception of rhythm indicate considerable flexibility in how physical signals are interpreted. This paper explores how that flexibility might play out in rhythmicity in speech and music. There is much in common across the two domains, but there are also significant differences. Interpretations are explored that reconcile some of the differences, particularly with respect to how functional properties modify the rhythmicity of speech, within limits imposed by its structural constraints. Functional and structural differences mean that music is typically more rhythmic than speech, and that speech will be more rhythmic when the emotions are more strongly engaged, or intended to be engaged. The influence of rhythmicity on attention is acknowledged, and it is suggested that local increases in rhythmicity occur at times when attention is required to coordinate joint action, whether in talking or music-making. Evidence is presented which suggests that while these short phases of heightened rhythmical behaviour are crucial to the success of transitions in communicative interaction, their modality is immaterial: they all function to enhance precise temporal prediction and hence tightly coordinated joint action. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  14. Extensions to the Speech Disorders Classification System (SDCS)

    PubMed Central

    Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.

    2010-01-01

    This report describes three extensions to a classification system for pediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three subtypes of motor speech disorders. Part II describes the Madison Speech Assessment Protocol (MSAP), an approximately two-hour battery of 25 measures that includes 15 speech tests and tasks. Part III describes the Competence, Precision, and Stability Analytics (CPSA) framework, a current set of approximately 90 perceptual- and acoustic-based indices of speech, prosody, and voice used to quantify and classify subtypes of Speech Sound Disorders (SSD). A companion paper, Shriberg, Fourakis, et al. (2010) provides reliability estimates for the perceptual and acoustic data reduction methods used in the SDCS. The agreement estimates in the companion paper support the reliability of SDCS methods and illustrate the complementary roles of perceptual and acoustic methods in diagnostic analyses of SSD of unknown origin. Examples of research using the extensions to the SDCS described in the present report include diagnostic findings for a sample of youth with motor speech disorders associated with galactosemia (Shriberg, Potter, & Strand, 2010) and a test of the hypothesis of apraxia of speech in a group of children with autism spectrum disorders (Shriberg, Paul, Black, & van Santen, 2010). All SDCS methods and reference databases running in the PEPPER (Programs to Examine Phonetic and Phonologic Evaluation Records; [Shriberg, Allen, McSweeny, & Wilson, 2001]) environment will be disseminated without cost when complete. PMID:20831378

  15. Expressed emotion displayed by the mothers of inhibited and uninhibited preschool-aged children.

    PubMed

    Raishevich, Natoshia; Kennedy, Susan J; Rapee, Ronald M

    2010-01-01

    In the current study, the Five Minute Speech Sample was used to assess the association between parent attitudes and children's behavioral inhibition in mothers of 120 behaviorally inhibited (BI) and 37 behaviorally uninhibited preschool-aged children. Mothers of BI children demonstrated significantly higher levels of emotional over-involvement (EOI) and self-sacrificing/overprotective behavior (SS/OP). However, there was no significant relationship between inhibition status and maternal criticism. Multiple regression also indicated that child temperament, but not maternal anxiety, was a significant predictor of both EOI and SS/OP.

  16. Quality of life improvement after pressure equalization tube placement in Down syndrome: A prospective study.

    PubMed

    Labby, Alex; Mace, Jess C; Buncke, Michelle; MacArthur, Carol J

    2016-09-01

    To evaluate quality-of-life changes after bilateral pressure equalization tube placement with or without adenoidectomy for the treatment of chronic otitis media with effusion or recurrent acute otitis media in a pediatric Down syndrome population compared to controls. Prospective case-control observational study. The OM Outcome Survey (OMO-22) was administered to both patients with Down syndrome and controls before bilateral tube placement with or without adenoidectomy and at an average of 6-7 months postoperatively. Thirty-one patients with Down syndrome and 34 controls were recruited. Both pre-operative and post-operative between-group and within-group score comparisons were conducted for the Physical, Hearing/Balance, Speech, Emotional, and Social domains of the OMO-22. Both groups experienced improvement of mean symptom scores post-operatively. Patients with Down syndrome reported significant post-operative improvement in mean Physical and Hearing domain item scores while control patients reported significant improvement in Physical, Hearing, and Emotional domain item scores. All four symptom scores in the Speech domain, both pre-operatively and post-operatively, were significantly worse for Down syndrome patients compared to controls (p ≤ 0.008). Surgical placement of pressure equalizing tubes results in significant quality of life improvements in patients with Down syndrome and controls. Problems related to speech and balance are reported at a higher rate and persist despite intervention in the Down syndrome population. It is possible that longer follow up periods and/or more sensitive tools are required to measure speech improvements in the Down syndrome population after pressure equalizing tube placement ± adenoidectomy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. The Chinese Facial Emotion Recognition Database (CFERD): a computer-generated 3-D paradigm to measure the recognition of facial emotional expressions at different intensities.

    PubMed

    Huang, Charles Lung-Cheng; Hsiao, Sigmund; Hwu, Hai-Gwo; Howng, Shen-Long

    2012-12-30

    The Chinese Facial Emotion Recognition Database (CFERD), a computer-generated three-dimensional (3D) paradigm, was developed to measure the recognition of facial emotional expressions at different intensities. The stimuli consisted of 3D colour photographic images of six basic facial emotional expressions (happiness, sadness, disgust, fear, anger and surprise) and neutral faces of the Chinese. The purpose of the present study is to describe the development and validation of the CFERD with nonclinical healthy participants (N=100; 50 men; age ranging between 18 and 50 years), and to generate a normative data set. The results showed that the sensitivity index d' [d' = Z(hit rate) - Z(false alarm rate), where Z(p), p ∈ [0, 1], is the inverse of the cumulative Gaussian distribution function] …
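
    The sensitivity index defined in the abstract can be computed directly from hit and false-alarm rates, with Z implemented as the inverse cumulative Gaussian (scipy's norm.ppf). The example rates below are illustrative only, not values from the study.

```python
# d' = Z(hit rate) - Z(false alarm rate), with Z the inverse cumulative Gaussian.
from scipy.stats import norm

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(d_prime(0.85, 0.15))   # illustrative rates; prints approximately 2.07
```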

  18. Emotion socialization in anxious youth: Parenting buffers emotional reactivity to peer negative events

    PubMed Central

    Oppenheimer, Caroline W.; Ladouceur, Cecile D.; Waller, Jennifer M.; Ryan, Neal D.; Allen, Kristy Benoit; Sheeber, Lisa; Forbes, Erika E; Dahl, Ronald E.; Silk, Jennifer S.

    2016-01-01

    Anxious youth exhibit heightened emotional reactivity, particularly to social-evaluative threat, such as peer evaluation and feedback, compared to non-anxious youth. Moreover, normative developmental changes during the transition into adolescence may exacerbate emotional reactivity to peer negative events, particularly for anxious youth. Therefore, it is important to investigate factors that may buffer emotional reactivity within peer contexts among anxious youth. The current study examined the role of parenting behaviors in child emotional reactivity to peer and non-peer negative events among 86 anxious youth in middle childhood to adolescence (Mean age = 11.29, 54% girls). Parenting behavior and affect was observed during a social-evaluative laboratory speech task for youth, and ecological momentary assessment (EMA) methods were used to examine youth emotional reactivity to typical daily negative events within peer and non-peer contexts. Results showed that parent positive behaviors, and low levels of parent anxious affect, during the stressful laboratory task for youth buffered youth negative emotional reactivity to real-world negative peer events, but not non-peer events. Findings inform our understanding of parenting influences on anxious youth's emotional reactivity to developmentally salient negative events during the transition into adolescence. PMID:26783026

  19. Emotion Socialization in Anxious Youth: Parenting Buffers Emotional Reactivity to Peer Negative Events.

    PubMed

    Oppenheimer, Caroline W; Ladouceur, Cecile D; Waller, Jennifer M; Ryan, Neal D; Allen, Kristy Benoit; Sheeber, Lisa; Forbes, Erika E; Dahl, Ronald E; Silk, Jennifer S

    2016-10-01

    Anxious youth exhibit heightened emotional reactivity, particularly to social-evaluative threat, such as peer evaluation and feedback, compared to non-anxious youth. Moreover, normative developmental changes during the transition into adolescence may exacerbate emotional reactivity to peer negative events, particularly for anxious youth. Therefore, it is important to investigate factors that may buffer emotional reactivity within peer contexts among anxious youth. The current study examined the role of parenting behaviors in child emotional reactivity to peer and non-peer negative events among 86 anxious youth in middle childhood to adolescence (Mean age = 11.29, 54 % girls). Parenting behavior and affect was observed during a social-evaluative laboratory speech task for youth, and ecological momentary assessment (EMA) methods were used to examine youth emotional reactivity to typical daily negative events within peer and non-peer contexts. Results showed that parent positive behaviors, and low levels of parent anxious affect, during the stressful laboratory task for youth buffered youth negative emotional reactivity to real-world negative peer events, but not non-peer events. Findings inform our understanding of parenting influences on anxious youth's emotional reactivity to developmentally salient negative events during the transition into adolescence.

  20. Expressed emotion in mothers of boys with gender identity disorder.

    PubMed

    Owen-Anderson, Allison F H; Bradley, Susan J; Zucker, Kenneth J

    2010-01-01

    The authors examined the construct of expressed emotion in mothers of 20 boys with gender identity disorder (GID), 20 clinical control boys with externalizing disorders (ECC), 20 community control boys (NCB), and 20 community control girls (NCG). The mean age of the children was 6.86 years (SD = 1.46, range = 4-8 years). The authors predicted that the mothers of boys with GID would demonstrate (a) higher percentages of expressed emotion, criticism, and emotional overinvolvement compared with normal controls; and (b) higher percentages of only emotional overinvolvement compared with mothers of boys with externalizing difficulties. They used the Five-Minute Speech Sample (Magana-Amato, A., 1986) to assess maternal expressed emotion. A significantly greater percentage of mothers in both clinical groups were classified as high expressed emotion than mothers in the NCB group. When the authors compared the GID group with all other groups combined, they found that the mothers of boys with GID were classified as having higher levels of a combination of both high or borderline emotional overinvolvement and low criticism than were mothers in the other 3 groups. The authors discuss expressed emotion as a maternal characteristic in the genesis and perpetuation of GID in boys.

  1. Communicating with Virtual Humans.

    ERIC Educational Resources Information Center

    Thalmann, Nadia Magnenat

    The face is a small part of a human, but it plays an essential role in communication. An open hybrid system for facial animation is presented. It encapsulates a considerable amount of information regarding facial models, movements, expressions, emotions, and speech. The complex description of facial animation can be handled better by assigning…

  2. Responses of Teachers and Non-Teachers Regarding Placement of Exceptional Children. Final Report.

    ERIC Educational Resources Information Center

    Phelps, William R.

    Evaluated were opinions of 10 teachers and 10 nonteachers on educational placement of children with such handicaps as orthopedic, visual, and speech problems. Analysis of questionnaires yielded the following information: the major differences centered on placement for the emotionally disturbed, hearing impaired, orthopedically handicapped, and…

  3. Prosody and Formulaic Language in Treatment-Resistant Depression: Effects of Deep Brain Stimulation

    ERIC Educational Resources Information Center

    Bridges, Kelly A.

    2014-01-01

    Communication, specifically the elements crucial for normal social interaction, can be significantly affected in psychiatric illness, especially depression. Of specific importance are prosody (an aspect of speech that carries emotional valence) and formulaic language (non-novel linguistic segments that are prevalent in naturalistic conversation).…

  4. Specificity of regional brain activity in anxiety types during emotion processing.

    PubMed

    Engels, Anna S; Heller, Wendy; Mohanty, Aprajita; Herrington, John D; Banich, Marie T; Webb, Andrew G; Miller, Gregory A

    2007-05-01

    The present study tested the hypothesis that anxious apprehension involves more left- than right-hemisphere activity and that anxious arousal is associated with the opposite pattern. Behavioral and fMRI responses to threat stimuli in an emotional Stroop task were examined in nonpatient groups reporting anxious apprehension, anxious arousal, or neither. Reaction times were longer for negative than for neutral words. As predicted, brain activation distinguished anxious groups in a left inferior frontal region associated with speech production and in a right-hemisphere inferior temporal area. Addressing a second hypothesis about left-frontal involvement in emotion, distinct left frontal regions were associated with anxious apprehension versus processing of positive information. Results support the proposed distinction between the two types of anxiety and resolve an inconsistency about the role of left-frontal activation in emotion and psychopathology.

  5. Cell-phone vs microphone recordings: Judging emotion in the voice.

    PubMed

    Green, Joshua J; Eigsti, Inge-Marie

    2017-09-01

    Emotional states can be conveyed by vocal cues such as pitch and intensity. Despite the ubiquity of cellular telephones, there is limited information on how vocal emotional states are perceived during cell-phone transmissions. Emotional utterances (neutral, happy, angry) were elicited from two female talkers and simultaneously recorded via microphone and cell-phone. Ten-step continua (neutral to happy, neutral to angry) were generated using the STRAIGHT algorithm. Analyses compared reaction time (RT) and emotion judgment as a function of recording type (microphone vs cell-phone). Logistic regression revealed no judgment differences between recording types, though there were interactions with emotion type. Multi-level model analyses indicated that RT data were best fit by a quadratic model, with slower RT at the middle of each continuum, suggesting greater ambiguity, and slower RT for cell-phone stimuli across blocks. While preliminary, results suggest that critical acoustic cues to emotion are largely retained in cell-phone transmissions, though with effects of recording source on RT, and support the methodological utility of collecting speech samples by phone.

  6. Evaluation on health-related quality of life in deaf children with cochlear implant in China.

    PubMed

    Liu, Hong; Liu, Hong-Xiang; Kang, Hou-Yong; Gu, Zheng; Hong, Su-Ling

    2016-09-01

    Previous studies have shown that deaf children benefit considerably from cochlear implants. These improvements are found in areas such as speech perception, speech production, and audiology-verbal performance. Despite the increasing prevalence of cochlear implants in China, few studies have reported on health-related quality of life in children with cochlear implants. The main objective of this study was to explore health-related quality of life in children with cochlear implants in South-west China. A retrospective observational study of 213 CI users in Southwest China between 2010 and 2013 was conducted. Participants were 213 individuals with bilateral severe-to-profound hearing loss who wore unilateral cochlear implants. The Nijmegen Cochlear Implant Questionnaire and Health Utility Index Mark III were used pre-implantation and 1 year post-implantation. Additionally, 1-year postoperative scores for Mandarin speech perception were compared with preoperative scores. Health-related quality of life improved post-operation, with scores on the Nijmegen Cochlear Implant Questionnaire improving significantly in all subdomains, and the Health Utility Index 3 showing a significant improvement in the utility score and the subdomains of "hearing," "speech," and "emotion". Additionally, a significant improvement in speech recognition scores was found. No significant correlation was found between the increase in quality of life and speech perception scores. Health-related quality of life and speech recognition in prelingually deaf children significantly improved post-operation. The lack of correlation between quality of life and speech perception suggests that when evaluating performance post-implantation in prelingually deaf children and adolescents, measures of both speech perception and quality of life should be used. Copyright © 2016. Published by Elsevier Ireland Ltd.

  7. Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.

    PubMed

    Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y; Saltzman, Elliot; Goldstein, Louis

    2010-09-13

    Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed "speech-inversion." This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
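
    To make the comparison concrete, here is a minimal sketch of one of the listed strategies, support vector regression, used as an acoustic-to-articulatory mapping. The synthetic random arrays merely stand in for paired acoustic features and tract-variable trajectories (e.g., from the Haskins model database); the dimensions and hyperparameters are assumptions, not values from the study.

```python
# Hedged sketch: SVR-based speech inversion from acoustic frames to
# multiple tract variables, on synthetic placeholder data.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13))    # acoustic features per frame (e.g., MFCC-like; assumed)
Y = rng.normal(size=(500, 8))     # tract-variable values per frame (assumed dimensionality)

inverter = MultiOutputRegressor(SVR(kernel="rbf", C=1.0, epsilon=0.1))
inverter.fit(X[:400], Y[:400])                 # train on the first 400 frames
pred = inverter.predict(X[400:])               # estimated tract variables for held-out frames
```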

  8. Experimental investigation of cognitive and affective empathy in borderline personality disorder: Effects of ambiguity in multimodal social information processing.

    PubMed

    Niedtfeld, Inga

    2017-07-01

    Borderline personality disorder (BPD) is characterized by affective instability and interpersonal problems. In the context of social interaction, impairments in empathy are proposed to result in inadequate social behavior. In contrast to findings of reduced cognitive empathy, some authors have suggested enhanced emotional empathy in BPD. It was investigated whether ambiguity leads to decreased cognitive or emotional empathy in BPD. Thirty-four patients with BPD and thirty-two healthy controls were presented with video clips in which emotional information was conveyed through prosody, facial expression, and speech content. Experimental conditions were designed to induce ambiguity by presenting neutral valence in one of these communication channels. Subjects were asked to indicate the actors' emotional valence, their decision confidence, and their own emotional state. BPD patients showed increased emotional empathy when neutral stories comprised nonverbally expressed emotions. In contrast, when all channels were emotional, patients showed lower emotional empathy than healthy controls. Regarding cognitive empathy, there were no significant differences between BPD patients and healthy control subjects in recognition accuracy, but decision confidence was reduced in BPD. These results suggest that patients with BPD show altered emotional empathy, experiencing higher rates of emotional contagion when emotions are expressed nonverbally. The latter may contribute to misunderstandings and inadequate social behavior. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  9. [Minimal emotional dysfunction and first impression formation in personality disorders].

    PubMed

    Linden, M; Vilain, M

    2011-01-01

    "Minimal cerebral dysfunctions" are isolated impairments of basic mental functions, which are elements of complex functions like speech. The best described are cognitive dysfunctions such as reading and writing problems, dyscalculia, attention deficits, but also motor dysfunctions such as problems with articulation, hyperactivity or impulsivity. Personality disorders can be characterized by isolated emotional dysfunctions in relation to emotional adequacy, intensity and responsivity. For example, paranoid personality disorders can be characterized by continuous and inadequate distrust, as a disorder of emotional adequacy. Schizoid personality disorders can be characterized by low expressive emotionality, as a disorder of effect intensity, or dissocial personality disorders can be characterized by emotional non-responsivity. Minimal emotional dysfunctions cause interactional misunderstandings because of the psychology of "first impression formation". Studies have shown that in 100 ms persons build up complex and lasting emotional judgements about other persons. Therefore, minimal emotional dysfunctions result in interactional problems and adjustment disorders and in corresponding cognitive schemata.From the concept of minimal emotional dysfunctions specific psychotherapeutic interventions in respect to the patient-therapist relationship, the diagnostic process, the clarification of emotions and reality testing, and especially an understanding of personality disorders as impairment and "selection, optimization, and compensation" as a way of coping can be derived.

  10. Phonologically-based biomarkers for major depressive disorder

    NASA Astrophysics Data System (ADS)

    Trevino, Andrea Carolina; Quatieri, Thomas Francis; Malyska, Nicolas

    2011-12-01

    Of increasing importance in the civilian and military population is the recognition of major depressive disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we introduce vocal biomarkers that are derived automatically from phonologically-based measures of speech rate. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a 6-week duration. We find that dissecting average measures of speech rate into phone-specific characteristics and, in particular, combined phone-duration measures uncovers stronger relationships between speech rate and depression severity than global measures previously reported for a speech-rate biomarker. Results of this study are supported by correlation of our measures with depression severity and classification of depression state with these vocal measures. Our approach provides a general framework for analyzing individual symptom categories through phonological units, and supports the premise that speaking rate can be an indicator of psychomotor retardation severity.
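
    The sketch below illustrates, under stated assumptions, how phone-level alignments could be turned into phone-specific duration measures and a global speaking-rate value of the kind discussed above. The alignment format (a list of phone labels with start and end times from a forced aligner) is assumed; it is not the authors' actual feature pipeline.

```python
# Hedged sketch: phone-specific mean durations plus a global speech-rate measure.
from collections import defaultdict

def phone_duration_features(alignment):
    """alignment: list of (phone_label, start_s, end_s) tuples from a forced aligner."""
    durations = defaultdict(list)
    for phone, start, end in alignment:
        durations[phone].append(end - start)
    total_speech = sum(end - start for _, start, end in alignment)
    feats = {f"mean_dur_{p}": sum(d) / len(d) for p, d in durations.items()}
    feats["phones_per_second"] = len(alignment) / total_speech   # global speaking rate
    return feats

# Example with a toy alignment:
# print(phone_duration_features([("AH", 0.00, 0.08), ("L", 0.08, 0.15), ("OW", 0.15, 0.31)]))
```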

  11. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives.

    PubMed

    Monson, Brian B; Lotto, Andrew J; Story, Brad H

    2012-09-01

    The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
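
    A simplified way to compute a long-term average spectrum and a summary high-frequency-energy level is sketched below with scipy; the band edges, Welch parameters, and the use of a single broad band instead of true third-octave filters are assumptions for illustration rather than the study's analysis settings.

```python
# Hedged sketch: long-term average spectrum via Welch's method, then the
# energy above 5 kHz summed into one band and expressed in dB.
import numpy as np
from scipy.signal import welch

def hfe_level_db(x: np.ndarray, sr: int, lo: float = 5000.0, hi: float = 8000.0) -> float:
    f, pxx = welch(x, fs=sr, nperseg=4096)     # long-term average power spectrum
    band = (f >= lo) & (f < hi)                # crude stand-in for third-octave bands
    return 10.0 * np.log10(np.sum(pxx[band]) + 1e-20)
```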

  12. Training a new generation of speech-language pathologists with competences in the management of literacy disorders and learning disabilities in Hong Kong.

    PubMed

    Yuen, Kevin C P

    2014-01-01

    One of the recent developments in the education of speech-language pathology is to include literacy disorders and learning disabilities as key training components in the training curriculum. Disorders in reading and writing are interwoven with disorders in speaking and listening, which should be managed holistically, particularly in children and adolescents. With extensive training in clinical linguistics, language disorders, and other theoretical knowledge and clinical skills, speech-language pathologists (SLPs) are the best equipped and most competent professionals to screen, identify, diagnose, and manage individuals with literacy disorders. To tackle the challenges of and the huge demand for services in literacy as well as language and learning disorders, the Hong Kong Institute of Education has recently developed the Master of Science Programme in Educational Speech-Language Pathology and Learning Disabilities, which is one of the very first speech-language pathology training programmes in Asia to blend training components of learning disabilities, literacy disorders, and social-emotional-behavioural-developmental disabilities into a developmentally and medically oriented speech-language pathology training programme. This new training programme aims to prepare a new generation of SLPs to be able to offer comprehensive support to individuals with speech, language, literacy, learning, communication, and swallowing disorders of different developmental or neurogenic origins, particularly to infants and adolescents as well as to their family and educational team. © 2015 S. Karger AG, Basel.

  13. Ultrasound applicability in Speech Language Pathology and Audiology.

    PubMed

    Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia

    2014-01-01

    To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, which evidence possibilities of the applicability of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallow. Studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" in the past 5 years were not found. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities of the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.

  14. Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech

    NASA Technical Reports Server (NTRS)

    Hieronymus, James; Aist, Greg; Dowding, John

    2006-01-01

    An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them, and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal, which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar that is specialized by Explanation Based Learning (EBL), so that it uses only rules seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system recognizes only utterances that reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop, and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system, which is being tested on the International Space Station.
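
    The gating logic summarized above lends itself to a compact illustration: accept an utterance only when the recognizer is sufficiently confident and the utterance reduces to a valid logical form, and reject everything else as out of domain. The Python sketch below is a hypothetical illustration of that idea; the helper functions, toy grammar, and confidence threshold are invented placeholders, not components of the Clarissa system.

    # Minimal sketch of in-domain vs. out-of-domain (OOD) gating, assuming
    # hypothetical helpers: parse_to_logical_form() returns a logical form or
    # None, and ngram_confidence() returns a recognizer confidence in [0, 1].
    from typing import Optional

    CONFIDENCE_THRESHOLD = 0.6  # hypothetical tuning parameter

    def parse_to_logical_form(utterance: str) -> Optional[dict]:
        """Stand-in for the grammar-based semantic parser: a real system would
        apply the EBL-specialized unification grammar and return a logical form
        only when the utterance reduces to one."""
        toy_grammar = {
            "open the hatch": {"action": "open", "object": "hatch"},
            "read step three": {"action": "read", "step": 3},
        }
        return toy_grammar.get(utterance.lower().strip())

    def ngram_confidence(utterance: str) -> float:
        """Stand-in for the class N-gram recognizer's confidence score."""
        return 0.9 if utterance else 0.0

    def accept_utterance(utterance: str) -> bool:
        """Accept only utterances that are confidently recognized and reduce
        to a valid logical form; reject everything else as OOD."""
        if ngram_confidence(utterance) < CONFIDENCE_THRESHOLD:
            return False
        return parse_to_logical_form(utterance) is not None

    for u in ["open the hatch", "what's for lunch today"]:
        print(u, "->", "accept" if accept_utterance(u) else "reject (OOD)")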

  15. Effects of music therapy in the treatment of children with delayed speech development - results of a pilot study.

    PubMed

    Gross, Wibke; Linden, Ulrike; Ostermann, Thomas

    2010-07-21

    Language development is one of the most significant processes of early childhood development. Children with delayed speech development are more at risk of acquiring other cognitive, social-emotional, and school-related problems. Music therapy appears to facilitate speech development in children, even within a short period of time. The aim of this pilot study is to explore the effects of music therapy in children with delayed speech development. A total of 18 children aged 3.5 to 6 years with delayed speech development took part in this observational study in which music therapy and no treatment were compared to demonstrate effectiveness. Individual music therapy was provided on an outpatient basis. An ABAB reversal design with alternations between music therapy and no treatment with an interval of approximately eight weeks between the blocks was chosen. Before and after each study period, a speech development test, a non-verbal intelligence test for children, and music therapy assessment scales were used to evaluate the speech development of the children. Compared to the baseline, we found a positive development in the study group after receiving music therapy. Both phonological capacity and the children's understanding of speech increased under treatment, as well as their cognitive structures, action patterns, and level of intelligence. Throughout the study period, developmental age converged with their biological age. Ratings according to the Nordoff-Robbins scales showed clinically significant changes in the children, namely in the areas of client-therapist relationship and communication. This study suggests that music therapy may have a measurable effect on the speech development of children through the treatment's interactions with fundamental aspects of speech development, including the ability to form and maintain relationships and prosodic abilities. Thus, music therapy may provide a basic and supportive therapy for children with delayed speech development. Further studies should be conducted to investigate the mechanisms of these interactions in greater depth. The trial is registered in the German clinical trials register; Trial-No.: DRKS00000343.

  16. Sensing emotion in voices: Negativity bias and gender differences in a validation study of the Oxford Vocal ('OxVoc') sounds database.

    PubMed

    Young, Katherine S; Parsons, Christine E; LeBeau, Richard T; Tabak, Benjamin A; Sewart, Amy R; Stein, Alan; Kringelbach, Morten L; Craske, Michelle G

    2017-08-01

    Emotional expressions are an essential element of human interactions. Recent work has increasingly recognized that emotional vocalizations can color and shape interactions between individuals. Here we present data on the psychometric properties of a recently developed database of authentic nonlinguistic emotional vocalizations from human adults and infants (the Oxford Vocal 'OxVoc' Sounds Database; Parsons, Young, Craske, Stein, & Kringelbach, 2014). In a large sample (n = 562), we demonstrate that adults can reliably categorize these sounds (as 'positive,' 'negative,' or 'sounds with no emotion'), and rate valence in these sounds consistently over time. In an extended sample (n = 945, including the initial n = 562), we also investigated a number of individual difference factors in relation to valence ratings of these vocalizations. Results demonstrated small but significant effects of (a) symptoms of depression and anxiety with more negative ratings of adult neutral vocalizations (R2 = .011 and R2 = .008, respectively) and (b) gender differences in perceived valence such that female listeners rated adult neutral vocalizations more positively and infant cry vocalizations more negatively than male listeners (R2 = .021, R2 = .010, respectively). Of note, we did not find evidence of negativity bias among other affective vocalizations or gender differences in perceived valence of adult laughter, adult cries, infant laughter, or infant neutral vocalizations. Together, these findings largely converge with factors previously shown to impact processing of emotional facial expressions, suggesting a modality-independent impact of depression, anxiety, and listener gender, particularly among vocalizations with more ambiguous valence. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
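
    The individual-difference effects above are reported as R-squared values from regressions of valence ratings on listener characteristics. The following sketch shows, on simulated data, how such an R-squared can be obtained with ordinary least squares; the variable names and effect size are hypothetical and do not reproduce the study's analysis.

    # Sketch of estimating the variance in valence ratings explained by a
    # listener-level predictor (e.g., a symptom score), as an OLS R^2.
    # All data are simulated; this is not the study's analysis code.
    import numpy as np

    rng = np.random.default_rng(0)
    n_listeners = 500
    symptom_score = rng.normal(size=n_listeners)                   # hypothetical predictor
    valence = -0.1 * symptom_score + rng.normal(size=n_listeners)  # mean rating per listener

    # Fit valence ~ intercept + symptom_score and report R^2.
    X = np.column_stack([np.ones(n_listeners), symptom_score])
    beta, *_ = np.linalg.lstsq(X, valence, rcond=None)
    residuals = valence - X @ beta
    r_squared = 1 - residuals.var() / valence.var()
    print(f"slope = {beta[1]:.3f}, R^2 = {r_squared:.3f}")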

  17. What an otolaryngologist should know about evaluation of a child referred for delay in speech development.

    PubMed

    Tonn, Christopher R; Grundfast, Kenneth M

    2014-03-01

    Otolaryngologists are asked to evaluate children whom a parent, physician, or someone else believes to be slow in developing speech. Therefore, an otolaryngologist should be familiar with milestones for normal speech development, the causes of delay in speech development, and the best ways to help ensure that children develop the ability to speak in a normal way. To provide information for otolaryngologists that is helpful in the evaluation and management of children perceived to be delayed in developing speech. Data were obtained via literature searches, online databases, textbooks, and the most recent national guidelines on topics including speech delay and language delay and the underlying disorders that can cause delay in developing speech. Emphasis was placed on epidemiology, pathophysiology, most common presentation, and treatment strategies. Most of the sources referenced were published within the past 5 years. Our article is a summary of major causes of speech delay based on reliable sources as listed herein. Speech delay can be the manifestation of a spectrum of disorders affecting the language comprehension and/or speech production pathways, ranging from disorders involving global developmental limitations to motor dysfunction to hearing loss. Determining the cause of a child's delay in speech production is a time-sensitive issue because a child loses valuable opportunities in intellectual development if his or her communication defect is not addressed and ameliorated with treatment. Knowing several key items about each disorder can help otolaryngologists direct families to the correct health care provider to maximize the child's learning potential and intellectual growth curve.

  18. Connecting multimodality in human communication

    PubMed Central

    Regenbogen, Christina; Habel, Ute; Kellermann, Thilo

    2013-01-01

    A successful reciprocal evaluation of social signals serves as a prerequisite for social coherence and empathy. In a previous fMRI study we investigated naturalistic communication situations by presenting video clips to our participants and recording their behavioral responses regarding empathy and its components. In two conditions, all three channels transported congruent emotional or neutral information, respectively. Three conditions selectively presented two emotional channels and one neutral channel and were thus bimodally emotional. We reported channel-specific emotional contributions in modality-related areas, elicited by dynamic video clips with varying combinations of emotionality in facial expressions, prosody, and speech content. However, to better understand the underlying mechanisms accompanying a naturalistically displayed human social interaction in some key regions that presumably serve as specific processing hubs for facial expressions, prosody, and speech content, we pursued a reanalysis of the data. Here, we focused on two different descriptions of temporal characteristics within three modality-related regions [right fusiform gyrus (FFG), left auditory cortex (AC), and left angular gyrus (AG)] and the left dorsomedial prefrontal cortex (dmPFC). By means of a finite impulse response (FIR) analysis within each of the three regions we examined the post-stimulus time-courses as a description of the temporal characteristics of the BOLD response during the video clips. Second, effective connectivity between these areas and the left dmPFC was analyzed using dynamic causal modeling (DCM) in order to describe condition-related modulatory influences on the coupling between these regions. The FIR analysis showed initially diminished activation in bimodally emotional conditions but stronger activation than that observed in neutral videos toward the end of the stimuli, possibly reflecting bottom-up processes that compensate for a lack of emotional information. The DCM analysis instead showed pronounced top-down control. Remarkably, all connections from the dmPFC to the three other regions were modulated by the experimental conditions. This observation is in line with the presumed role of the dmPFC in the allocation of attention. In contrast, all incoming connections to the AG were modulated, indicating its key role in integrating multimodal information and supporting comprehension. Notably, the input from the FFG to the AG was enhanced when facial expressions conveyed emotional information. These findings serve as preliminary results in understanding network dynamics in human emotional communication and empathy. PMID:24265613
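
    As a rough illustration of the finite impulse response (FIR) approach mentioned above, the sketch below builds an FIR design matrix (one regressor per post-stimulus time bin) and recovers a simulated post-stimulus time course by least squares. The scan count, onsets, and signal are invented for illustration; this is not the authors' analysis code, and the dynamic causal modeling step is not sketched here.

    # Sketch of a finite impulse response (FIR) estimate of a post-stimulus
    # BOLD time course: one regressor per post-onset time bin, fit by least
    # squares. Onsets and the signal are simulated for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n_scans, n_bins = 200, 10          # hypothetical scan count and FIR window
    onsets = np.arange(10, 190, 25)    # hypothetical stimulus onsets (in scans)

    # Build the FIR design matrix: column k is 1 at (onset + k) for every onset.
    X = np.zeros((n_scans, n_bins))
    for onset in onsets:
        for k in range(n_bins):
            if onset + k < n_scans:
                X[onset + k, k] = 1.0

    # Simulate a noisy response that peaks a few scans after onset.
    true_course = np.exp(-0.5 * ((np.arange(n_bins) - 4) / 1.5) ** 2)
    y = X @ true_course + 0.3 * rng.normal(size=n_scans)

    # Least-squares estimate of the per-bin response (the FIR time course).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("estimated post-stimulus time course:", np.round(beta, 2))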

  19. Connecting multimodality in human communication.

    PubMed

    Regenbogen, Christina; Habel, Ute; Kellermann, Thilo

    2013-01-01

    A successful reciprocal evaluation of social signals serves as a prerequisite for social coherence and empathy. In a previous fMRI study we investigated naturalistic communication situations by presenting video clips to our participants and recording their behavioral responses regarding empathy and its components. In two conditions, all three channels transported congruent emotional or neutral information, respectively. Three conditions selectively presented two emotional channels and one neutral channel and were thus bimodally emotional. We reported channel-specific emotional contributions in modality-related areas, elicited by dynamic video clips with varying combinations of emotionality in facial expressions, prosody, and speech content. However, to better understand the underlying mechanisms accompanying a naturalistically displayed human social interaction in some key regions that presumably serve as specific processing hubs for facial expressions, prosody, and speech content, we pursued a reanalysis of the data. Here, we focused on two different descriptions of temporal characteristics within three modality-related regions [right fusiform gyrus (FFG), left auditory cortex (AC), and left angular gyrus (AG)] and the left dorsomedial prefrontal cortex (dmPFC). By means of a finite impulse response (FIR) analysis within each of the three regions we examined the post-stimulus time-courses as a description of the temporal characteristics of the BOLD response during the video clips. Second, effective connectivity between these areas and the left dmPFC was analyzed using dynamic causal modeling (DCM) in order to describe condition-related modulatory influences on the coupling between these regions. The FIR analysis showed initially diminished activation in bimodally emotional conditions but stronger activation than that observed in neutral videos toward the end of the stimuli, possibly reflecting bottom-up processes that compensate for a lack of emotional information. The DCM analysis instead showed pronounced top-down control. Remarkably, all connections from the dmPFC to the three other regions were modulated by the experimental conditions. This observation is in line with the presumed role of the dmPFC in the allocation of attention. In contrast, all incoming connections to the AG were modulated, indicating its key role in integrating multimodal information and supporting comprehension. Notably, the input from the FFG to the AG was enhanced when facial expressions conveyed emotional information. These findings serve as preliminary results in understanding network dynamics in human emotional communication and empathy.

  20. Phi-square Lexical Competition Database (Phi-Lex): an online tool for quantifying auditory and visual lexical competition.

    PubMed

    Strand, Julia F

    2014-03-01

    A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex .
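
    To make the distinction between categorical and continuous competition metrics concrete, the sketch below contrasts a simple neighborhood-density count with a frequency-weighted, graded competition score computed over a toy lexicon. The lexicon, frequencies, and similarity function are invented for illustration and are not the actual phi-square computation used in Phi-Lex.

    # Sketch contrasting a categorical competition metric (neighborhood density:
    # the count of words differing by one segment) with a continuous,
    # frequency-weighted alternative. The toy lexicon and similarity function
    # are hypothetical; this is not the Phi-Lex metric itself.
    LEXICON = {"cat": 120, "bat": 40, "cap": 60, "cot": 30, "dog": 200}  # word: frequency

    def one_segment_apart(a: str, b: str) -> bool:
        """Treat letters as segments for illustration: same length, one mismatch."""
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    def neighborhood_density(target: str) -> int:
        """Categorical metric: how many lexical neighbors does the target have?"""
        return sum(one_segment_apart(target, w) for w in LEXICON if w != target)

    def continuous_competition(target: str) -> float:
        """Continuous metric: frequency-weighted similarity to every other word,
        with similarity falling off as the number of mismatched segments grows."""
        total = 0.0
        for w, freq in LEXICON.items():
            if w == target or len(w) != len(target):
                continue
            mismatches = sum(x != y for x, y in zip(target, w))
            total += freq * (1.0 / (1.0 + mismatches))
        return total

    print("density('cat') =", neighborhood_density("cat"))
    print("continuous('cat') =", round(continuous_competition("cat"), 1))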

  1. Analyzing crowdsourced ratings of speech-based take-over requests for automated driving.

    PubMed

    Bazilinskyy, P; de Winter, J C F

    2017-10-01

    Take-over requests in automated driving should fit the urgency of the traffic situation. The robustness of various published research findings on the valuations of speech-based warning messages is unclear. This research aimed to establish how people value speech-based take-over requests as a function of speech rate, background noise, spoken phrase, and speaker's gender and emotional tone. By means of crowdsourcing, 2669 participants from 95 countries listened to a random 10 out of 140 take-over requests, and rated each take-over request on urgency, commandingness, pleasantness, and ease of understanding. Our results replicate several published findings, in particular that an increase in speech rate results in a monotonic increase of perceived urgency. The female voice was easier to understand than a male voice when there was a high level of background noise, a finding that contradicts the literature. Moreover, a take-over request spoken with Indian accent was found to be easier to understand by participants from India than by participants from other countries. Our results replicate effects in the literature regarding speech-based warnings, and shed new light on effects of background noise, gender, and nationality. The results may have implications for the selection of appropriate take-over requests in automated driving. Additionally, our study demonstrates the promise of crowdsourcing for testing human factors and ergonomics theories with large sample sizes. Copyright © 2017 Elsevier Ltd. All rights reserved.
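
    Because each participant rated only a random subset of the 140 take-over requests, per-item scores have to be aggregated from a sparse rating matrix before relations such as the speech-rate/urgency link can be examined. The sketch below simulates such sparse ratings, averages them per item, and checks the monotonic trend with a Spearman rank correlation; all numbers are simulated, and this is only a simplified stand-in for the study's analysis.

    # Sketch of aggregating sparse crowdsourced ratings (each listener rates a
    # random subset of items) and testing whether perceived urgency rises
    # monotonically with speech rate. Ratings are simulated.
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(2)
    n_items, n_raters, ratings_per_rater = 140, 2669, 10
    speech_rate = rng.uniform(2.0, 8.0, size=n_items)  # hypothetical syllables/s

    item_ratings = defaultdict(list)
    for _ in range(n_raters):
        for item in rng.choice(n_items, size=ratings_per_rater, replace=False):
            # Simulated urgency rating: increases with speech rate, plus noise.
            item_ratings[item].append(speech_rate[item] + rng.normal(scale=1.5))

    mean_urgency = np.array([np.mean(item_ratings[i]) for i in range(n_items)])

    def rank(x):
        """Ordinal ranks (ties ignored), sufficient for simulated continuous data."""
        return np.argsort(np.argsort(x))

    # Spearman rho as the Pearson correlation of rank-transformed values.
    rho = np.corrcoef(rank(speech_rate), rank(mean_urgency))[0, 1]
    print(f"Spearman rho (speech rate vs. urgency) = {rho:.2f}")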

  2. [Verbal and gestural communication in interpersonal interaction with Alzheimer's disease patients].

    PubMed

    Schiaratura, Loris Tamara; Di Pastena, Angela; Askevis-Leherpeux, Françoise; Clément, Sylvain

    2015-03-01

    Communication can be defined as a verbal and non-verbal exchange of thoughts and emotions. While the verbal communication deficit in Alzheimer's disease is well documented, very little is known about gestural communication, especially in interpersonal situations. This study examines the production of gestures and its relations with verbal aspects of communication. Three patients suffering from moderately severe Alzheimer's disease were compared to three healthy adults. Each one was given a series of pictures and asked to explain which one she preferred and why. The interpersonal interaction was video recorded. Analyses concerned verbal production (quantity and quality) and gestures. Gestures were either non-representational (i.e., gestures of small amplitude punctuating speech or accentuating some parts of the utterance) or representational (i.e., referring to the object of the speech). Representational gestures were coded as iconic (depicting concrete aspects), metaphoric (depicting abstract meaning), or deictic (pointing toward an object). In comparison with healthy participants, patients revealed a decrease in quantity and quality of speech. Nevertheless, their production of gestures was always present. This pattern is in line with the conception that gestures and speech depend on different communication systems and looks inconsistent with the assumption of a parallel dissolution of gesture and speech. Moreover, analyzing the articulation between verbal and gestural dimensions suggests that representational gestures may compensate for speech deficits. It underlines the importance of gestures in maintaining interpersonal communication.

  3. Music and speech prosody: a common rhythm.

    PubMed

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  4. Data-driven analysis of functional brain interactions during free listening to music and speech.

    PubMed

    Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming

    2015-06-01

    Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired when participants were watching video streams or listening to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems, including attention, memory, auditory/language, emotion, and action networks, are among the most relevant brain systems involved in classic music, pop music, and speech differentiation. Our study provides an alternative approach to investigating the human brain's mechanisms in the comprehension of complex natural music and speech.
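
    The analysis idea described here, representing each listening segment by its pattern of functional interactions and then classifying music versus speech from those features, can be sketched as follows. The sketch uses simulated node time series, takes pairwise correlations as the "functional interactions," and runs a cross-validated logistic regression; the node counts, labels, and classifier choice are assumptions for illustration, not the authors' pipeline.

    # Sketch of a connectivity-based classification pipeline: compute pairwise
    # correlations among node time series for each listening segment, then
    # classify music vs. speech segments from those features. Data are simulated.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n_segments, n_nodes, n_timepoints = 60, 12, 100

    def connectivity_features(ts: np.ndarray) -> np.ndarray:
        """Vectorize the upper triangle of the node-by-node correlation matrix."""
        corr = np.corrcoef(ts)
        iu = np.triu_indices(ts.shape[0], k=1)
        return corr[iu]

    X, y = [], []
    for seg in range(n_segments):
        label = seg % 2                     # 0 = music, 1 = speech (simulated)
        ts = rng.normal(size=(n_nodes, n_timepoints))
        if label == 1:
            ts[1] += 0.8 * ts[0]            # inject a label-specific coupling
        X.append(connectivity_features(ts))
        y.append(label)

    scores = cross_val_score(LogisticRegression(max_iter=1000),
                             np.array(X), np.array(y), cv=5)
    print(f"cross-validated accuracy: {scores.mean():.2f}")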

  5. Music and speech prosody: a common rhythm

    PubMed Central

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  6. Perception of affective and linguistic prosody: an ALE meta-analysis of neuroimaging studies

    PubMed Central

    Brown, Steven

    2014-01-01

    Prosody refers to the melodic and rhythmic aspects of speech. Two forms of prosody are typically distinguished: ‘affective prosody’ refers to the expression of emotion in speech, whereas ‘linguistic prosody’ relates to the intonation of sentences, including the specification of focus within sentences and stress within polysyllabic words. While these two processes are united by their use of vocal pitch modulation, they are functionally distinct. In order to examine the localization and lateralization of speech prosody in the brain, we performed two voxel-based meta-analyses of neuroimaging studies of the perception of affective and linguistic prosody. There was substantial sharing of brain activations between analyses, particularly in right-hemisphere auditory areas. However, a major point of divergence was observed in the inferior frontal gyrus: affective prosody was more likely to activate Brodmann area 47, while linguistic prosody was more likely to activate the ventral part of area 44. PMID:23934416

  7. Consensus Paper: Cerebellum and Emotion.

    PubMed

    Adamaszek, M; D'Agata, F; Ferrucci, R; Habas, C; Keulen, S; Kirkby, K C; Leggio, M; Mariën, P; Molinari, M; Moulton, E; Orsi, L; Van Overwalle, F; Papadelis, C; Priori, A; Sacchetti, B; Schutter, D J; Styliadis, C; Verhoeven, J

    2017-04-01

    Over the past three decades, insights into the role of the cerebellum in emotional processing have substantially increased. Indeed, methodological refinements in cerebellar lesion studies and major technological advancements in the field of neuroscience are particularly responsible for an exponential growth of knowledge on the topic. It is timely to review the available data and to critically evaluate the current status of the role of the cerebellum in emotion and related domains. The main aim of this article is to present an overview of current facts and ongoing debates relating to clinical, neuroimaging, and neurophysiological findings on the role of the cerebellum in key aspects of emotion. Experts in the field of cerebellar research discuss the range of cerebellar contributions to emotion in nine topics. Topics include the role of the cerebellum in perception and recognition, forwarding and encoding of emotional information, and the experience and regulation of emotional states in relation to motor, cognitive, and social behaviors. In addition, perspectives including cerebellar involvement in emotional learning, pain, emotional aspects of speech, and neuropsychiatric aspects of the cerebellum in mood disorders are briefly discussed. Results of this consensus paper illustrate how theory and empirical research have converged to produce a composite picture of brain topography, physiology, and function that establishes the role of the cerebellum in many aspects of emotional processing.

  8. Early Postimplant Speech Perception and Language Skills Predict Long-Term Language and Neurocognitive Outcomes Following Pediatric Cochlear Implantation

    PubMed Central

    Kronenberger, William G.; Castellanos, Irina; Pisoni, David B.

    2017-01-01

    Purpose We sought to determine whether speech perception and language skills measured early after cochlear implantation in children who are deaf, and early postimplant growth in speech perception and language skills, predict long-term speech perception, language, and neurocognitive outcomes. Method Thirty-six long-term users of cochlear implants, implanted at an average age of 3.4 years, completed measures of speech perception, language, and executive functioning an average of 14.4 years postimplantation. Speech perception and language skills measured in the 1st and 2nd years postimplantation and open-set word recognition measured in the 3rd and 4th years postimplantation were obtained from a research database in order to assess predictive relations with long-term outcomes. Results Speech perception and language skills at 6 and 18 months postimplantation were correlated with long-term outcomes for language, verbal working memory, and parent-reported executive functioning. Open-set word recognition was correlated with early speech perception and language skills and long-term speech perception and language outcomes. Hierarchical regressions showed that early speech perception and language skills at 6 months postimplantation and growth in these skills from 6 to 18 months both accounted for substantial variance in long-term outcomes for language and verbal working memory that was not explained by conventional demographic and hearing factors. Conclusion Speech perception and language skills measured very early postimplantation, and early postimplant growth in speech perception and language, may be clinically relevant markers of long-term language and neurocognitive outcomes in users of cochlear implants. Supplemental materials https://doi.org/10.23641/asha.5216200 PMID:28724130
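
    The hierarchical-regression logic reported above, where early speech and language skills explain outcome variance beyond demographic and hearing factors, amounts to comparing the R-squared of nested regression models. The sketch below shows this comparison on simulated data; the covariates, predictor, and effect sizes are invented and do not reproduce the study's models.

    # Sketch of the hierarchical-regression logic: fit a baseline model with
    # demographic/hearing covariates, add the early speech/language predictor,
    # and report the increment in R^2. All data are simulated.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 36
    age_at_implant = rng.normal(3.4, 1.0, n)       # hypothetical covariate
    residual_hearing = rng.normal(size=n)          # hypothetical covariate
    early_language = rng.normal(size=n)            # predictor of interest
    outcome = 0.6 * early_language + 0.2 * residual_hearing + rng.normal(size=n)

    def r_squared(X: np.ndarray, y: np.ndarray) -> float:
        """OLS R^2 for a model with an intercept plus the given predictors."""
        X = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    base = r_squared(np.column_stack([age_at_implant, residual_hearing]), outcome)
    full = r_squared(np.column_stack([age_at_implant, residual_hearing, early_language]), outcome)
    print(f"baseline R^2 = {base:.2f}, full R^2 = {full:.2f}, delta = {full - base:.2f}")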

  9. Review of HaNDLE-on-QoL: a database of published papers that use questionnaires to report quality of life in patients with cancer of the head and neck.

    PubMed

    Wotherspoon, R J; Kanatas, A N; Rogers, S N

    2018-02-01

    HaNDLE-on-QoL (Head And Neck Database Listing Evidence on QoL) is a searchable database that comprises abstracts of papers that have used questionnaires to report on quality of life (QoL) in patients with cancer of the head and neck. It can be searched by title, first author, year of publication, words used in the abstract, site of cancer, study design, and questionnaires used. The aim of this paper was to summarise its contents. In May 2017 we searched the website using the criteria above. It contained 1498 papers (including 149 reviews), and the number is increasing each year. Most studies concerned a combination of subsites in the head and neck (n=871); 180 focused specifically on oral sites, and 109 on the larynx. The commonest topics were swallowing (n=353), speech (n=299), pain (n=292), emotions (n=226), and depression (n=193). Nearly all the papers concerned function or predictors of health-related QoL (HRQoL), but 98 were clinical or randomised controlled trials. The site included over 250 questionnaires of which the most common were the European Organisation for Research and Treatment of Cancer C30 (EORTC-C30, n=369), the EORTC-head and neck 35 (EORTC H&N35, n=353), and the University of Washington Quality of Life (UWQoL) (n=276). HaNDLE-on-QoL highlights the complexity of QoL after treatment and the diversity and range of the studies. It is a useful point of reference for those involved in clinical practice or research. Copyright © 2017 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

  10. The Influence of Negative Emotion on Cognitive and Emotional Control Remains Intact in Aging

    PubMed Central

    Zinchenko, Artyom; Obermeier, Christian; Kanske, Philipp; Schröger, Erich; Villringer, Arno; Kotz, Sonja A.

    2017-01-01

    Healthy aging is characterized by a gradual decline in cognitive control and inhibition of interferences, while emotional control is either preserved or facilitated. Emotional control regulates the processing of emotional conflicts such as in irony in speech, and cognitive control resolves conflict between non-affective tendencies. While negative emotion can trigger control processes and speed up resolution of both cognitive and emotional conflicts, we know little about how aging affects the interaction of emotion and control. In two EEG experiments, we compared the influence of negative emotion on cognitive and emotional conflict processing in groups of younger adults (mean age = 25.2 years) and older adults (69.4 years). Participants viewed short video clips and either categorized spoken vowels (cognitive conflict) or their emotional valence (emotional conflict), while the visual facial information was congruent or incongruent. Results show that negative emotion modulates both cognitive and emotional conflict processing in younger and older adults as indicated in reduced response times and/or enhanced event-related potentials (ERPs). In emotional conflict processing, we observed a valence-specific N100 ERP component in both age groups. In cognitive conflict processing, we observed an interaction of emotion by congruence in the N100 responses in both age groups, and a main effect of congruence in the P200 and N200. Thus, the influence of emotion on conflict processing remains intact in aging, despite a marked decline in cognitive control. Older adults may prioritize emotional wellbeing and preserve the role of emotion in cognitive and emotional control. PMID:29163132
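
    The ERP effects reported here (e.g., the N100 modulation) rest on the standard procedure of averaging time-locked EEG epochs per condition and comparing mean amplitudes in an early time window. The sketch below illustrates that procedure on simulated single-channel epochs; the sampling rate, window, and amplitudes are invented, and the code is not the study's analysis pipeline.

    # Sketch of the basic ERP logic: average time-locked EEG epochs per
    # condition and compare mean amplitude in an early window (around the
    # N100). Epochs are simulated single-channel data.
    import numpy as np

    rng = np.random.default_rng(5)
    sfreq = 250                                  # Hz (hypothetical sampling rate)
    times = np.arange(-0.1, 0.5, 1 / sfreq)
    n_trials = 80

    def simulate_epochs(n100_amplitude: float) -> np.ndarray:
        """Simulate trials with a negative deflection peaking near 100 ms."""
        component = n100_amplitude * np.exp(-0.5 * ((times - 0.1) / 0.02) ** 2)
        noise = rng.normal(scale=2.0, size=(n_trials, times.size))
        return component + noise

    congruent = simulate_epochs(n100_amplitude=-3.0)
    incongruent = simulate_epochs(n100_amplitude=-5.0)

    window = (times >= 0.08) & (times <= 0.12)   # N100 window, 80-120 ms
    erp_cong = congruent.mean(axis=0)
    erp_incong = incongruent.mean(axis=0)
    print(f"mean N100 amplitude: congruent {erp_cong[window].mean():.2f} uV, "
          f"incongruent {erp_incong[window].mean():.2f} uV")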

  11. The Influence of Negative Emotion on Cognitive and Emotional Control Remains Intact in Aging.

    PubMed

    Zinchenko, Artyom; Obermeier, Christian; Kanske, Philipp; Schröger, Erich; Villringer, Arno; Kotz, Sonja A

    2017-01-01

    Healthy aging is characterized by a gradual decline in cognitive control and inhibition of interferences, while emotional control is either preserved or facilitated. Emotional control regulates the processing of emotional conflicts such as in irony in speech, and cognitive control resolves conflict between non-affective tendencies. While negative emotion can trigger control processes and speed up resolution of both cognitive and emotional conflicts, we know little about how aging affects the interaction of emotion and control. In two EEG experiments, we compared the influence of negative emotion on cognitive and emotional conflict processing in groups of younger adults (mean age = 25.2 years) and older adults (69.4 years). Participants viewed short video clips and either categorized spoken vowels (cognitive conflict) or their emotional valence (emotional conflict), while the visual facial information was congruent or incongruent. Results show that negative emotion modulates both cognitive and emotional conflict processing in younger and older adults as indicated in reduced response times and/or enhanced event-related potentials (ERPs). In emotional conflict processing, we observed a valence-specific N100 ERP component in both age groups. In cognitive conflict processing, we observed an interaction of emotion by congruence in the N100 responses in both age groups, and a main effect of congruence in the P200 and N200. Thus, the influence of emotion on conflict processing remains intact in aging, despite a marked decline in cognitive control. Older adults may prioritize emotional wellbeing and preserve the role of emotion in cognitive and emotional control.

  12. Suppression and expression of emotion in social and interpersonal outcomes: A meta-analysis.

    PubMed

    Chervonsky, Elizabeth; Hunt, Caroline

    2017-06-01

    Emotion expression is critical for the communication of important social information, such as emotional states and behavioral intentions. However, people tend to vary in their level of emotional expression. This meta-analysis investigated the relationships between levels of emotion expression and suppression, and social and interpersonal outcomes. PsycINFO databases, as well as reference lists were searched. Forty-three papers from a total of 3,200 papers met inclusion criteria, allowing for 105 effect sizes to be calculated. Meta-analyses revealed that greater suppression of emotion was significantly associated with poorer social wellbeing, including more negative first impressions, lower social support, lower social satisfaction and quality, and poorer romantic relationship quality. Furthermore, the expression of positive and general/nonspecific emotion was related to better social outcomes, while the expression of anger was associated with poorer social wellbeing. Expression of negative emotion generally was also associated with poorer social outcomes, although this effect size was very small and consisted of mixed results. These findings highlight the importance of considering the role that regulation of emotional expression can play in the development of social dysfunction and interpersonal problems. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
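
    Meta-analytic summaries like those above typically pool effect sizes with inverse-variance weighting, adding a between-study variance term for a random-effects model. The sketch below applies the DerSimonian-Laird estimator to simulated effect sizes; the numbers are invented and do not correspond to the reviewed studies.

    # Sketch of random-effects pooling of effect sizes (DerSimonian-Laird),
    # the usual machinery behind meta-analytic estimates. Effect sizes and
    # within-study variances are simulated.
    import numpy as np

    rng = np.random.default_rng(6)
    k = 105
    effects = rng.normal(loc=-0.2, scale=0.15, size=k)   # e.g., Fisher-z correlations
    variances = rng.uniform(0.005, 0.05, size=k)         # within-study variances

    # Fixed-effect (inverse-variance) estimate.
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)

    # Between-study heterogeneity (tau^2) via DerSimonian-Laird.
    q = np.sum(w * (effects - fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects estimate re-weights each study by 1 / (variance + tau^2).
    w_star = 1.0 / (variances + tau2)
    random_effect = np.sum(w_star * effects) / np.sum(w_star)
    print(f"fixed = {fixed:.3f}, tau^2 = {tau2:.4f}, random-effects = {random_effect:.3f}")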

  13. Emotion regulation choice in an evaluative context: the moderating role of self-esteem.

    PubMed

    Shafir, Roni; Guarino, Tara; Lee, Ihno A; Sheppes, Gal

    2017-12-01

    Evaluative contexts can be stressful, but relatively little is known about how different individuals who vary in responses to self-evaluation make emotion regulatory choices to cope in these situations. To address this gap, participants who vary in self-esteem gave an impromptu speech, rated how they perceived they had performed on multiple evaluative dimensions, and subsequently chose between disengaging attention from emotional processing (distraction) and engaging with emotional processing via changing its meaning (reappraisal), while waiting to receive feedback regarding these evaluative dimensions. According to our framework, distraction can offer stronger short-term relief than reappraisal, but distraction is costly in the long run relative to reappraisal because it does not allow learning from evaluative feedback. We predicted and found that participants with lower (but not higher) self-esteem react defensively to threat of failure by seeking short-term relief via distraction over the long-term benefit of reappraisal, as perceived failure increases. Implications for the understanding of emotion regulation and self-esteem are discussed.

  14. "The Seventh Seal."

    ERIC Educational Resources Information Center

    Palmer, Peter M.

    1969-01-01

    The significance of Bergman's "Seventh Seal" lies not in the speeches nor in the actions of the central characters but rather in the film's form, its totality created by the emotive elements of imagery and sound together with the intellectual elements of actions and words. The scene-units are related to a central motif (the opening of…

  15. Development in Children's Interpretation of Pitch Cues to Emotions

    ERIC Educational Resources Information Center

    Quam, Carolyn; Swingley, Daniel

    2012-01-01

    Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to…

  16. Communication Interventions and Their Impact on Behaviour in the Young Child: A Systematic Review

    ERIC Educational Resources Information Center

    Law, James; Plunkett, Charlene C.; Stringer, Helen

    2012-01-01

    Speech, language and communication needs (SLCN) and social, emotional and behaviour difficulties (SEBD) commonly overlap, yet we know relatively little about the mechanism linking the two, specifically to what extent it is possible to reduce behaviour difficulties by targeted communication skills. The EPPI Centre systematic review methodology was…

  17. Toward an Understanding of the Emotional Nature of Stage Fright: A Three Factor Theory.

    ERIC Educational Resources Information Center

    Cahn, Dudley D.

    A comprehensive understanding of stage fright will better enable teachers and researchers to select the most appropriate "cure" and to determine those cases in which speech training will help reduce stage fright or other states of communication apprehension. Attempts to understand stage fright have focused on three psychological theories…

  18. Social Information Guides Infants' Selection of Foods

    ERIC Educational Resources Information Center

    Shutts, Kristin; Kinzler, Katherine D.; McKee, Caitlin B.; Spelke, Elizabeth S.

    2009-01-01

    Two experiments investigated the influence of socially conveyed emotions and speech on infants' choices among food. After watching films in which two unfamiliar actresses each spoke while eating a different kind of food, 12-month-old infants were allowed to choose between the two foods. In Experiment 1, infants selected a food endorsed by a…

  19. Early Intervention Practices for Children with Hearing Loss: Impact of Professional Development

    ERIC Educational Resources Information Center

    Martin-Prudent, Angi; Lartz, Maribeth; Borders, Christina; Meehan, Tracy

    2016-01-01

    Early identification and appropriate intervention services for children who are deaf or hard of hearing significantly increase the likelihood of better language, speech, and social-emotional development. However, current research suggests that there is a critical shortage of professionals trained to provide early intervention services to deaf and…

  20. An Annotated Bibliography of Some Recent Articles That Correlate with the Sewall Early Education Developmental Program (SEED).

    ERIC Educational Resources Information Center

    Jackson, Janice; Flamboe, Thomas C.

    The annotated bibliography contains approximately 110 references (1969-1976) of articles related to the Sewall Early Education Developmental Program. Entries are arranged alphabetically by author within the following seven topic areas: social emotional, gross motor, fine motor, adaptive reasoning, speech and language, feeding and dressing and…
