Methods for eliciting, annotating, and analyzing databases for child speech development.
Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F
2017-09-01
Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.
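As a rough quantitative illustration of the vocal-tract scaling issue raised in the abstract above, the sketch below uses the standard uniform-tube (quarter-wavelength) approximation to show how formant frequencies shift between a short child vocal tract and a longer adult male one; the specific lengths are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: uniform-tube (quarter-wavelength) resonance model,
# illustrating why a child's shorter vocal tract yields much higher formants.
# Vocal-tract lengths below are illustrative assumptions, not data from the paper.

C = 35000.0  # approximate speed of sound in warm, moist air, cm/s

def neutral_tube_formants(length_cm, n_formants=3):
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * C / (4.0 * length_cm) for n in range(1, n_formants + 1)]

for label, length in [("young child (~10 cm)", 10.0), ("adult male (~17.5 cm)", 17.5)]:
    f1, f2, f3 = neutral_tube_formants(length)
    print(f"{label}: F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```

Even this crude model predicts formant differences of hundreds of hertz between age groups, which is one reason acoustic models trained on adult speech transfer poorly to children.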
The EpiSLI Database: A Publicly Available Database on Speech and Language
ERIC Educational Resources Information Center
Tomblin, J. Bruce
2010-01-01
Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…
Pathological speech signal analysis and classification using empirical mode decomposition.
Kaleem, Muhammad; Ghoraani, Behnaz; Guergachi, Aziz; Krishnan, Sridhar
2013-07-01
Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for pathological speech diagnosis, and is an active area of research. A large part of this research is based on analysis of acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which can take into account important attributes of speech such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classification of continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions, which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features which can capture discriminative information in signals hidden at local time-scales. A total of six features are extracted, and a linear classifier is used with the feature vector to classify continuous speech portions obtained from a database consisting of 51 normal and 161 pathological speakers. A classification accuracy of 95.7 % is obtained, thus demonstrating the effectiveness of the methodology.
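The sketch below illustrates the general EMD-plus-linear-classifier idea summarized above; it is not the authors' exact feature set. It assumes the PyEMD package (installed as `EMD-signal`), SciPy, and scikit-learn, and the segment list and labels are hypothetical placeholders.

```python
# Minimal sketch (not the authors' exact pipeline): EMD-based features for
# normal vs. pathological speech with a linear classifier.
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def emd_features(x, fs, n_imfs=4):
    """Decompose a speech segment into IMFs and summarize each with a few
    temporal/spectral descriptors (relative energy, mean instantaneous frequency)."""
    imfs = EMD().emd(x)[:n_imfs]
    total_energy = np.sum(x ** 2) + 1e-12
    feats = []
    for imf in imfs:
        analytic = hilbert(imf)
        inst_phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)   # instantaneous frequency
        feats += [np.sum(imf ** 2) / total_energy,            # relative energy
                  float(np.mean(np.abs(inst_freq)))]          # mean |inst. freq.|
    feats += [0.0] * (2 * n_imfs - len(feats))                # pad if fewer IMFs found
    return np.array(feats)

# Hypothetical usage: `segments` is a list of 1-D numpy arrays, `labels` is 0/1.
# X = np.vstack([emd_features(seg, fs=25000) for seg in segments])
# print(cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean())
```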
Speech Databases of Typical Children and Children with SLI
Grill, Pavel; Tučková, Jana
2016-01-01
The extent of research on children’s speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children’s speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children’s speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at a surgery of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their utilization can be exoteric, specifically for linguistic research and pedagogical use as well as for studies of speech-signal processing. PMID:26963508
Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H
2017-05-01
A large population around the world has voice complications. Various approaches for subjective and objective evaluation have been suggested in the literature. The subjective approach depends strongly on the experience and area of expertise of the clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatically developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCCs was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The intra-database detection rate ranges from 72% to 95%, and the inter-database rate from 47% to 82%. The results indicate that conventional speech features are not correlated with voice quality, and hence are not reliable for pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
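For concreteness, a minimal sketch of the kind of MFCC-based detector evaluated above is shown below; it assumes librosa and scikit-learn, and the file list and labels are hypothetical placeholders rather than any of the three databases named in the study.

```python
# Minimal sketch of an MFCC-based voice pathology detector.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def mfcc_stats(path, n_mfcc=13):
    """Mean and standard deviation of MFCCs over a recording."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# files: list of wav paths; labels: 1 = disordered, 0 = healthy (placeholders)
# X = np.vstack([mfcc_stats(f) for f in files])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# print(cross_val_score(clf, X, labels, cv=5).mean())   # intra-database accuracy
```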
Emotion recognition from speech: tools and challenges
NASA Astrophysics Data System (ADS)
Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.
2015-05-01
Human emotion recognition from speech is studied frequently because of its importance in many applications, e.g. human-computer interaction. There is wide diversity and little agreement about the basic emotions or emotion-related states on the one hand, and about where the emotion-related information lies in the speech signal on the other. These diversities motivate our investigations into extracting meta-features using the PCA approach, or using a non-adaptive random projection (RP), which significantly reduce the high-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts the score-based LDC classifier. We demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we argue that emotion recognition from speech should not be treated as a classification problem. We demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more "natural" than the acted datasets in which subjects attempt to suppress all but one emotion.
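A minimal sketch of the meta-feature idea follows: reduce a large utterance-level feature vector with PCA or a non-adaptive random projection before a linear classifier. Scikit-learn's LDA stands in for the paper's score-based LDC, and the feature matrix X and labels y are assumed to exist.

```python
# Minimal sketch: dimensionality reduction (PCA or random projection) followed
# by a linear classifier for speech emotion recognition.
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def emotion_meta_pipeline(reducer="pca", n_components=50):
    red = (PCA(n_components=n_components) if reducer == "pca"
           else GaussianRandomProjection(n_components=n_components, random_state=0))
    return make_pipeline(red, LinearDiscriminantAnalysis())

# X: (n_utterances, n_features) array, y: emotion labels (placeholders)
# for reducer in ("pca", "rp"):
#     scores = cross_val_score(emotion_meta_pipeline(reducer), X, y, cv=5)
#     print(reducer, scores.mean())
```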
Bohland, Jason W; Myers, Emma M; Kim, Esther
2014-01-01
A number of heritable disorders impair the normal development of speech and language processes and occur in large numbers within the general population. While candidate genes and loci have been identified, the gap between genotype and phenotype is vast, limiting current understanding of the biology of normal and disordered processes. This gap exists not only in our scientific knowledge, but also in our research communities, where genetics researchers and speech, language, and cognitive scientists tend to operate independently. Here we describe a web-based, domain-specific, curated database that represents information about genotype-phenotype relations specific to speech and language disorders, as well as neuroimaging results demonstrating focal brain differences in relevant patients versus controls. Bringing these two distinct data types into a common database ( http://neurospeech.org/sldb ) is a first step toward bringing molecular level information into cognitive and computational theories of speech and language function. One bridge between these data types is provided by densely sampled profiles of gene expression in the brain, such as those provided by the Allen Brain Atlases. Here we present results from exploratory analyses of human brain gene expression profiles for genes implicated in speech and language disorders, which are annotated in our database. We then discuss how such datasets can be useful in the development of computational models that bridge levels of analysis, necessary to provide a mechanistic understanding of heritable language disorders. We further describe our general approach to information integration, discuss important caveats and considerations, and offer a specific but speculative example based on genes implicated in stuttering and basal ganglia function in speech motor control.
Keshtiari, Niloofar; Kuhlmann, Michael; Eslami, Moharram; Klann-Delius, Gisela
2015-03-01
Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian is officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified in five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized better than five times chance performance (71.4 %) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to be used as a reliable material source (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available for academic research purposes free of charge. To access the database, please contact the first author.
A systematic review of treatment intensity in speech disorders.
Kaipa, Ramesh; Peterson, Abigail Marie
2016-12-01
Treatment intensity (sometimes referred to as "practice amount") has been well-investigated in learning non-speech tasks, but its role in treating speech disorders has not been largely analysed. This study reviewed the literature regarding treatment intensity in speech disorders. A systematic search was conducted in four databases using appropriate search terms. Seven articles from a total of 580 met the inclusion criteria. The speech disorders investigated included speech sound disorders, dysarthria, acquired apraxia of speech and childhood apraxia of speech. All seven studies were evaluated for their methodological quality, research phase and evidence level. Evidence level of reviewed studies ranged from moderate to strong. With regard to the research phase, only one study was considered to be phase III research, which corresponds to the controlled trial phase. The remaining studies were considered to be phase II research, which corresponds to the phase where magnitude of therapeutic effect is assessed. Results suggested that higher treatment intensity was favourable over lower treatment intensity of specific treatment technique(s) for treating childhood apraxia of speech and speech sound (phonological) disorders. Future research should incorporate randomised-controlled designs to establish optimal treatment intensity that is specific to each of the speech disorders.
Perceptual learning of speech under optimal and adverse conditions.
Zhang, Xujin; Samuel, Arthur G
2014-02-01
Humans have a remarkable ability to understand spoken language despite the large amount of variability in speech. Previous research has shown that listeners can use lexical information to guide their interpretation of atypical sounds in speech (Norris, McQueen, & Cutler, 2003). This kind of lexically induced perceptual learning enables people to adjust to the variations in utterances due to talker-specific characteristics, such as individual identity and dialect. The current study investigated perceptual learning in two optimal conditions: conversational speech (Experiment 1) versus clear speech (Experiment 2), and three adverse conditions: noise (Experiment 3a) versus two cognitive loads (Experiments 4a and 4b). Perceptual learning occurred in the two optimal conditions and in the two cognitive load conditions, but not in the noise condition. Furthermore, perceptual learning occurred only in the first of two sessions for each participant, and only for atypical /s/ sounds and not for atypical /f/ sounds. This pattern of learning and nonlearning reflects a balance between flexibility and stability that the speech system must have to deal with speech variability in the diverse conditions that speech is encountered. PsycINFO Database Record (c) 2014 APA, all rights reserved.
The Reliability of Methodological Ratings for speechBITE Using the PEDro-P Scale
ERIC Educational Resources Information Center
Murray, Elizabeth; Power, Emma; Togher, Leanne; McCabe, Patricia; Munro, Natalie; Smith, Katherine
2013-01-01
Background: speechBITE (http://www.speechbite.com) is an online database established in order to help speech and language therapists gain faster access to relevant research that can be used in clinical decision-making. In addition to containing more than 3000 journal references, the database also provides methodological ratings on the PEDro-P (an…
Sound Classification in Hearing Aids Inspired by Auditory Scene Analysis
NASA Astrophysics Data System (ADS)
Büchler, Michael; Allegro, Silvia; Launer, Stefan; Dillier, Norbert
2005-12-01
A sound classification system for the automatic recognition of the acoustic environment in a hearing aid is discussed. The system distinguishes the four sound classes "clean speech," "speech in noise," "noise," and "music." A number of features that are inspired by auditory scene analysis are extracted from the sound signal. These features describe amplitude modulations, spectral profile, harmonicity, amplitude onsets, and rhythm. They are evaluated together with different pattern classifiers. Simple classifiers, such as rule-based and minimum-distance classifiers, are compared with more complex approaches, such as Bayes classifier, neural network, and hidden Markov model. Sounds from a large database are employed for both training and testing of the system. The achieved recognition rates are very high except for the class "speech in noise." Problems arise in the classification of compressed pop music, strongly reverberated speech, and tonal or fluctuating noises.
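The sketch below illustrates the classifier-comparison setup described above. The scikit-learn models stand in for the paper's classifiers (nearest-centroid for minimum-distance, Gaussian naive Bayes for the Bayes classifier, an MLP for the neural network), and X and y are assumed auditory-scene feature vectors and class labels.

```python
# Minimal sketch: comparing simple and more complex classifiers on
# segment-level acoustic features for the four sound classes.
from sklearn.neighbors import NearestCentroid
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

CLASSES = ["clean speech", "speech in noise", "noise", "music"]

classifiers = {
    "minimum-distance": NearestCentroid(),
    "Bayes": GaussianNB(),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
}

# for name, clf in classifiers.items():
#     model = make_pipeline(StandardScaler(), clf)
#     print(name, cross_val_score(model, X, y, cv=5).mean())
```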
Narayanan, Shrikanth; Toutios, Asterios; Ramanarayanan, Vikram; Lammert, Adam; Kim, Jangwon; Lee, Sungbok; Nayak, Krishna; Kim, Yoon-Chul; Zhu, Yinghua; Goldstein, Louis; Byrd, Dani; Bresch, Erik; Ghosh, Prasanta; Katsamanis, Athanasios; Proctor, Michael
2014-01-01
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. PMID:25190403
Quadcopter Control Using Speech Recognition
NASA Astrophysics Data System (ADS)
Malik, H.; Darma, S.; Soekirno, S.
2018-04-01
This research reports a comparison of the success rates of speech recognition systems using two types of databases, an existing database and a newly created one, implemented as motion control for a quadcopter. The speech recognition system used the Mel-frequency cepstral coefficient (MFCC) method for feature extraction and was trained using the recursive neural network (RNN) method. MFCC is one of the feature extraction methods most widely used for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method. The new database was created in Indonesian, and its success rate was compared with results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain the characteristic values. These characteristic values were then classified by the trained RNN, whose output was a command. The command became a control input to a single-board computer (SBC), whose output was the movement of the quadcopter. On the SBC, we used the Robot Operating System (ROS) as the kernel (operating system).
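A minimal sketch of the MFCC-to-network-to-command pipeline is shown below. A Keras SimpleRNN stands in for the paper's recursive neural network, and the command set, shapes, and file list are illustrative assumptions.

```python
# Minimal sketch: MFCC features fed to a small recurrent network that maps an
# utterance to one of a few quadcopter motion commands.
import numpy as np
import librosa
import tensorflow as tf

COMMANDS = ["up", "down", "left", "right", "hover"]   # hypothetical command set

def mfcc_sequence(path, n_mfcc=13, max_frames=100):
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T      # (frames, n_mfcc)
    m = m[:max_frames]
    return np.pad(m, ((0, max_frames - len(m)), (0, 0)))       # fixed-length input

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 13)),
    tf.keras.layers.SimpleRNN(64),
    tf.keras.layers.Dense(len(COMMANDS), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X = np.stack([mfcc_sequence(f) for f in wav_files]); y = label_indices
# model.fit(X, y, epochs=20, validation_split=0.2)
# predicted_command = COMMANDS[int(np.argmax(model.predict(X[:1])))]
```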
On the Development of Speech Resources for the Mixtec Language
2013-01-01
The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries which, although including examples about how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134
The Atlanta Motor Speech Disorders Corpus: Motivation, Development, and Utility.
Laures-Gore, Jacqueline; Russell, Scott; Patel, Rupal; Frankel, Michael
2016-01-01
This paper describes the design and collection of a comprehensive spoken language dataset from speakers with motor speech disorders in Atlanta, Ga., USA. This collaborative project aimed to gather a spoken database consisting of nonmainstream American English speakers residing in the Southeastern US in order to provide a more diverse perspective of motor speech disorders. Ninety-nine adults with an acquired neurogenic disorder resulting in a motor speech disorder were recruited. Stimuli include isolated vowels, single words, sentences with contrastive focus, sentences with emotional content and prosody, sentences with acoustic and perceptual sensitivity to motor speech disorders, as well as 'The Caterpillar' and 'The Grandfather' passages. Utility of this data in understanding the potential interplay of dialect and dysarthria was demonstrated with a subset of the speech samples existing in the database. The Atlanta Motor Speech Disorders Corpus will enrich our understanding of motor speech disorders through the examination of speech from a diverse group of speakers. © 2016 S. Karger AG, Basel.
Analysis of False Starts in Spontaneous Speech.
ERIC Educational Resources Information Center
O'Shaughnessy, Douglas
A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…
2004-09-01
Databases covered include the Translanguage English Database, the Australian National Database of Spoken Language, and the Strange Corpus, all of some relevance to speech technology research. On the Translanguage English Database: in a daring plan, Joseph Mariani, then at LIMSI-CNRS, proposed to… native speakers. The database is known as the 'Translanguage English Database' but is often referred to as the 'terrible English database.' About 28…
Reviewing the connection between speech and obstructive sleep apnea.
Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A
2016-02-20
Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic analysis of speech could be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI), which describes the severity of the patient's condition, or to classify individuals according to their OSA severity. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several previously proposed OSA classification approaches. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research are relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, that are also correlated with other patient characteristics (age, height, sex,…) acting as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will be not only a useful example of relevant issues when using machine learning for medical diagnosis, but will also help guide further research on the connection between speech and OSA.
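The overfitting pitfall described above has a simple concrete form: feature selection must be re-fitted inside each cross-validation fold, not on the full dataset. The sketch below, with placeholder X (acoustic/clinical features) and y (AHI values), shows the honest setup and, commented out, the leaky one.

```python
# Minimal sketch of the validation pitfall: selecting features on all of the
# data before cross-validation leaks test information and inflates scores.
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score

# Correct: selection and scaling are re-fitted on each training fold only.
honest_model = make_pipeline(StandardScaler(),
                             SelectKBest(f_regression, k=20),
                             SVR(kernel="rbf"))
# scores = cross_val_score(honest_model, X, y, cv=5, scoring="neg_mean_absolute_error")
# print("honest MAE:", -scores.mean())

# Leaky (do NOT do this): feature selection performed on all of X beforehand.
# X_leaky = SelectKBest(f_regression, k=20).fit_transform(X, y)
# leaky = cross_val_score(make_pipeline(StandardScaler(), SVR()), X_leaky, y, cv=5)
```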
Lexical frequency and acoustic reduction in spoken Dutch
NASA Astrophysics Data System (ADS)
Pluymaekers, Mark; Ernestus, Mirjam; Baayen, R. Harald
2005-10-01
This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
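The sketch below shows the shape of the analysis reported above: regressing affix duration on log word frequency with a covariate such as speech rate. The data frame columns are illustrative assumptions, not the study's materials.

```python
# Minimal sketch: does higher word frequency predict shorter affix durations?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to have one row per affix token, with columns such as
# duration_ms, word_frequency, and speech_rate (placeholders).
# df = pd.read_csv("affix_tokens.csv")

def frequency_effect(df):
    df = df.assign(log_freq=np.log(df["word_frequency"]))
    model = smf.ols("duration_ms ~ log_freq + speech_rate", data=df).fit()
    return model.params["log_freq"], model.pvalues["log_freq"]

# slope, p = frequency_effect(df)
# A negative slope indicates that affixes in higher-frequency words are shorter.
```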
Implementation of Three Text to Speech Systems for Kurdish Language
NASA Astrophysics Data System (ADS)
Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir
Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for general-purpose systems. In this paper, we implemented three synthesis systems for the Kurdish language based on the syllable, allophone, and diphone, and compared their quality using subjective testing.
One approach to design of speech emotion database
NASA Astrophysics Data System (ADS)
Uhrin, Dominik; Chmelikova, Zdenka; Tovarek, Jaromir; Partila, Pavol; Voznak, Miroslav
2016-05-01
This article describes a system for evaluating the credibility of recordings with emotional character. The sound recordings form a Czech-language database for training and testing speech emotion recognition systems, which are designed to detect human emotions in the voice. Information about a speaker's emotional state is useful for the security forces and emergency call services. A person in action (a soldier, police officer, or firefighter) is often exposed to stress, and information about the emotional state carried in the voice can help a dispatcher adapt control commands for an intervention procedure. Call agents of an emergency call service must recognize the mental state of the caller to adjust the mood of the conversation; in this case, evaluation of the psychological state is the key factor for a successful intervention. A quality database of sound recordings is essential for creating such systems. Quality databases exist, such as the Berlin Database of Emotional Speech or HUMAINE, but these were created by actors in an audio studio, which means the recordings contain simulated rather than real emotions. Our research aims at creating a database of Czech emotional recordings of real human speech. Collecting sound samples for the database is only one of the tasks; another, no less important, is to evaluate the significance of the recordings from the perspective of emotional states. The design of a methodology for evaluating the credibility of emotional recordings is described in this article, and the results describe the advantages and applicability of the developed method.
High-frequency energy in singing and speech
NASA Astrophysics Data System (ADS)
Monson, Brian Bruce
While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
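As a small companion to the characterization described above, the sketch below computes the fraction of spectral power above 5 kHz in a recording; the cutoff and file name are placeholders, and a high sampling rate is assumed so that energy above 5 kHz is actually captured.

```python
# Minimal sketch: quantifying high-frequency energy as the fraction of
# spectral power above a cutoff frequency.
import numpy as np
import soundfile as sf
from scipy.signal import welch

def high_frequency_energy_ratio(path, cutoff_hz=5000.0):
    x, fs = sf.read(path)
    if x.ndim > 1:                       # mix down multichannel recordings
        x = x.mean(axis=1)
    f, pxx = welch(x, fs=fs, nperseg=4096)
    high = pxx[f >= cutoff_hz].sum()     # power above the cutoff
    return high / pxx.sum()              # fraction of total power

# print(high_frequency_energy_ratio("talker01_speech.wav"))
```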
Predicting Language Outcome and Recovery After Stroke (PLORAS)
Price, CJ; Seghier, ML; Leff, AP
2013-01-01
The ability to comprehend and produce speech after stroke depends on whether the areas of the brain that support language have been damaged. Here we review two different ways to predict language outcome after stroke. The first depends on understanding the neural circuits that support language. This model-based approach is a challenging endeavor because language is a complex cognitive function that involves the interaction of many different brain areas. The second approach does not require an understanding of why a lesion impairs language; instead, predictions are made on the basis of how previous patients with the same lesion recovered. This requires a database storing the speech and language abilities of a large population of patients who have, between them, incurred a comprehensive range of focal brain damage. In addition, it requires a system that converts an MRI scan from a new patient into a 3D description of the lesion and then compares this lesion to all others in the database. The outputs of this system are the longitudinal language outcomes of corresponding patients in the database. This will provide a new patient, their carers, and the clinical team managing them with the range of likely recovery patterns over a variety of language measures. PMID:20212513
Kyparissiadis, Antonios; van Heuven, Walter J B; Pitchford, Nicola J; Ledgeway, Timothy
2017-01-01
Databases containing lexical properties on any given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek. However, these lack important part-of-speech information. Furthermore, the need for alternative procedures for calculating syllabic measurements and stress information, as well as combination of several metrics to investigate linguistic properties of the Greek language are highlighted. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word and accurate syllabification and orthographic information predictive of stress, as well as several measurements of word similarity and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths to be carried out. Results showed that the statistical preponderance of stress position on the pre-final syllable that is reported for Greek language is dependent upon grammatical category. Additionally, analyses showed that a proportion higher than 90% of the tokens in the database would be stressed correctly solely by relying on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification as well as phonetic transcription are available at http://www.psychology.nottingham.ac.uk/greeklex/.
ERIC Educational Resources Information Center
Justice, Laura M.; Breit-Smith, Allison; Rogers, Margaret
2010-01-01
Purpose: This clinical forum was organized to provide a means for informing the research and clinical communities of one mechanism through which research capacity might be enhanced within the field of speech-language pathology. Specifically, forum authors describe the process of conducting secondary analyses of extant databases to answer questions…
Small intragenic deletion in FOXP2 associated with childhood apraxia of speech and dysarthria.
Turner, Samantha J; Hildebrand, Michael S; Block, Susan; Damiano, John; Fahey, Michael; Reilly, Sheena; Bahlo, Melanie; Scheffer, Ingrid E; Morgan, Angela T
2013-09-01
Relatively little is known about the neurobiological basis of speech disorders although genetic determinants are increasingly recognized. The first gene for primary speech disorder was FOXP2, identified in a large, informative family with verbal and oral dyspraxia. Subsequently, many de novo and familial cases with a severe speech disorder associated with FOXP2 mutations have been reported. These mutations include sequencing alterations, translocations, uniparental disomy, and genomic copy number variants. We studied eight probands with speech disorder and their families. Family members were phenotyped using a comprehensive assessment of speech, oral motor function, language, literacy skills, and cognition. Coding regions of FOXP2 were screened to identify novel variants. Segregation of the variant was determined in the probands' families. Variants were identified in two probands. One child with severe motor speech disorder had a small de novo intragenic FOXP2 deletion. His phenotype included features of childhood apraxia of speech and dysarthria, oral motor dyspraxia, receptive and expressive language disorder, and literacy difficulties. The other variant was found in a family in two of three family members with stuttering, and also in the mother with oral motor impairment. This variant was considered a benign polymorphism as it was predicted to be non-pathogenic with in silico tools and found in database controls. This is the first report of a small intragenic deletion of FOXP2 that is likely to be the cause of severe motor speech disorder associated with language and literacy problems. Copyright © 2013 Wiley Periodicals, Inc.
Motor laterality as an indicator of speech laterality.
Flowers, Kenneth A; Hudson, John M
2013-03-01
The determination of speech laterality, especially where it is anomalous, is both a theoretical issue and a practical problem for brain surgery. Handedness is commonly thought to be related to speech representation, but exactly how is not clearly understood. This investigation analyzed handedness by preference rating and performance on a reliable task of motor laterality in 34 patients undergoing a Wada test, to see whether these measures could provide an indicator of speech laterality. Hand usage preference ratings divided patients into left, right, and mixed preference. Between-hand differences in movement time on a pegboard task determined motor laterality. Results were correlated (χ²) with speech representation as determined by a standard Wada test. It was found that patients whose between-hand difference in speed on the motor task was small or inconsistent were the ones whose Wada test speech representation was likely to be ambiguous or anomalous, whereas all those with a consistently large between-hand difference showed clear unilateral speech representation in the hemisphere controlling the better hand (χ² = 10.45, df = 1, p < .01, η² = 0.55). This relationship prevailed across hand preference and level of skill in the hands themselves. We propose that motor and speech laterality are related where they both involve central control of motor output sequencing, and that a measure of that aspect of the former will indicate the likely representation of the latter. A between-hand measure of motor laterality based on such a measure may indicate the possibility of anomalous speech representation. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Human phoneme recognition depending on speech-intrinsic variability.
Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger
2010-11-01
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers
Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng
2014-01-01
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution to build a speech acoustic model of impaired speech is by employing adaptation techniques. However, issues that have not been addressed in existing studies in the area of adaptation for speech impairment are as follows: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates the above-mentioned two issues on dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques like the maximum likelihood linear regression (MLLR) and the constrained-MLLR(C-MLLR). The recognition accuracy of each impaired speech acoustic model is measured in terms of word error rate (WER), with further assessments, including phoneme insertion, substitution and deletion rates. Unimpaired speech when combined with limited high-quality speech-impaired data improves performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech based on the statistical analysis of the WER. It was found that phoneme substitution was the biggest contributing factor in WER in dysarthric speech for all levels of severity. The results show that the speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
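Since the evaluation above reports word error rate together with phoneme insertion, substitution, and deletion rates, a small sketch of how these counts come out of a standard edit-distance alignment may be useful; the example strings are invented for illustration.

```python
# Minimal sketch: word error rate (WER) plus substitution/deletion/insertion
# counts from a Levenshtein alignment of reference and hypothesis.
def error_counts(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)                      # all deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                c = dp[i - 1][j - 1]
                sub = (c[0] + 1, c[1] + 1, c[2], c[3])
                d = dp[i - 1][j]
                dele = (d[0] + 1, d[1], d[2] + 1, d[3])
                n = dp[i][j - 1]
                ins = (n[0] + 1, n[1], n[2], n[3] + 1)
                dp[i][j] = min(sub, dele, ins)
    cost, subs, dels, ins = dp[len(ref)][len(hyp)]
    return {"wer": cost / max(len(ref), 1), "sub": subs, "del": dels, "ins": ins}

# Example (invented):
# print(error_counts("put the book on the table", "put book on a the table"))
# -> one deletion ("the") and one insertion ("a"); WER = 2/6
```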
Analysis of glottal source parameters in Parkinsonian speech.
Hanratty, Jane; Deegan, Catherine; Walsh, Mary; Kirkpatrick, Barry
2016-08-01
Diagnosis and monitoring of Parkinson's disease has a number of challenges as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech based measures have demonstrated promising results. This study aims to investigate the characteristics of the glottal source signal in Parkinsonian speech. An experiment is conducted in which a selection of glottal parameters are tested for their ability to discriminate between healthy and Parkinsonian speech. Results for each glottal parameter are presented for a database of 50 healthy speakers and a database of 16 speakers with Parkinsonian speech symptoms. Receiver operating characteristic (ROC) curves were employed to analyse the results and the area under the ROC curve (AUC) values were used to quantify the performance of each glottal parameter. The results indicate that glottal parameters can be used to discriminate between healthy and Parkinsonian speech, although results varied for each parameter tested. For the task of separating healthy and Parkinsonian speech, 2 out of the 7 glottal parameters tested produced AUC values of over 0.9.
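The sketch below reproduces the evaluation logic described above for a single scalar parameter: score it by the area under its ROC curve for separating healthy from Parkinsonian speakers. The arrays are illustrative placeholders, not the study's measurements.

```python
# Minimal sketch: AUC of one glottal (or other acoustic) parameter for
# discriminating healthy vs. Parkinsonian speech.
import numpy as np
from sklearn.metrics import roc_auc_score

def parameter_auc(values_healthy, values_parkinsonian):
    """AUC of a scalar parameter; values near 1.0 indicate good discrimination."""
    scores = np.concatenate([values_healthy, values_parkinsonian])
    labels = np.concatenate([np.zeros(len(values_healthy)),
                             np.ones(len(values_parkinsonian))])
    auc = roc_auc_score(labels, scores)
    return max(auc, 1.0 - auc)            # direction-agnostic discriminability

# Example with synthetic values for one hypothetical parameter:
# healthy = np.random.normal(0.8, 0.10, 50)
# parkinsonian = np.random.normal(0.6, 0.15, 16)
# print(parameter_auc(healthy, parkinsonian))
```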
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
1988-10-12
Massively-parallel architectures for the automatic recognition of visual speech signals are described, with the goal of extracting characteristics of speech from the visual speech signals. Neural networks have been trained on a database of vowels: raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the…
Vocal Tract Representation in the Recognition of Cerebral Palsied Speech
ERIC Educational Resources Information Center
Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal
2012-01-01
Purpose: In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine. Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech.…
Parental Numeric Language Input to Mandarin Chinese and English Speaking Preschool Children
ERIC Educational Resources Information Center
Chang, Alicia; Sandhofer, Catherine M.; Adelchanow, Lauren; Rottman, Benjamin
2011-01-01
The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that…
Reference-free automatic quality assessment of tracheoesophageal speech.
Huang, Andy; Falk, Tiago H; Chan, Wai-Yip; Parsa, Vijay; Doyle, Philip
2009-01-01
Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.
Two Different Communication Genres and Implications for Vocabulary Development and Learning to Read
ERIC Educational Resources Information Center
Massaro, Dominic W.
2015-01-01
This study examined potential differences in vocabulary found in picture books and adult's speech to children and to other adults. Using a small sample of various sources of speech and print, Hayes observed that print had a more extensive vocabulary than speech. The current analyses of two different spoken language databases and an assembled…
Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai
2017-01-01
Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
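A minimal sketch of pairing an unsupervised feature learner with an SVM, in the spirit of the DBN+SVM approach above, is shown below. A single scikit-learn BernoulliRBM layer stands in for the full Deep Belief Network, and X/y are assumed utterance-level acoustic features (MFCC, pitch, formants, and so on) and emotion labels.

```python
# Minimal sketch: RBM feature learning followed by an SVM classifier.
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rbm_svm = make_pipeline(
    MinMaxScaler(),                                   # RBM expects values in [0, 1]
    BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=30, random_state=0),
    SVC(kernel="rbf", C=10.0),
)

# scores = cross_val_score(rbm_svm, X, y, cv=5)
# print("emotion recognition accuracy:", scores.mean())
```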
ERIC Educational Resources Information Center
Rodriguez, Luis J.; Torres, M. Ines
2006-01-01
Previous works in English have revealed that disfluencies follow regular patterns and that incorporating them into the language model of a speech recognizer leads to lower perplexities and sometimes to a better performance. Although work on disfluency modeling has been applied outside the English community (e.g., in Japanese), as far as we know…
NASA Astrophysics Data System (ADS)
Dat, Tran Huy; Takeda, Kazuya; Itakura, Fumitada
We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. The use of a more general prior distribution with online adaptive estimation of its parameters is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multichannel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise, and open windows.
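For orientation, a much-simplified single-channel stand-in for spectral-magnitude enhancement is sketched below; it applies a Wiener-style gain rather than the paper's adaptive MAP estimator with a generalized gamma prior, estimates noise from an assumed speech-free lead-in, and uses purely illustrative parameters.

```python
# Simplified spectral-gain enhancement sketch (not the paper's MAP estimator).
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs, noise_seconds=0.3, floor=0.05):
    f, t, spec = stft(noisy, fs=fs, nperseg=512)
    noise_frames = int(noise_seconds * fs / 256)              # hop = nperseg // 2
    noise_psd = np.mean(np.abs(spec[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(spec) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = np.maximum(snr / (snr + 1.0), floor)               # Wiener-style gain
    _, clean = istft(gain * spec, fs=fs, nperseg=512)
    return clean

# enhanced = enhance(noisy_signal, fs=16000)
```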
Automatic evaluation of hypernasality based on a cleft palate speech database.
He, Ling; Zhang, Jing; Liu, Qi; Yin, Heng; Lech, Margaret; Huang, Yunzhi
2015-05-01
Hypernasality is one of the most typical characteristics of cleft palate (CP) speech, and the outcome of hypernasality grading decides the necessity of follow-up surgery. Currently, the evaluation of CP speech is carried out by experienced speech therapists; however, the result depends strongly on their clinical experience and subjective judgment. This work proposes an automatic evaluation system for hypernasality grading in CP speech. The database tested in this work was collected by the Hospital of Stomatology, Sichuan University, which has the largest number of CP patients in China. Based on the production process of hypernasality, source sound pulse and vocal tract filter features are presented. These features include pitch, the first and second energy-amplified frequency bands, cepstrum-based features, MFCCs, and short-time energy in sub-band features. These features, combined with a KNN classifier, are applied to automatically classify four grades of hypernasality: normal, mild, moderate, and severe. The experimental results show that the proposed system achieves good performance, with classification rates for the four hypernasality grades reaching up to 80.4%. The sensitivity of the proposed features to gender is also discussed.
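The sketch below shows the shape of such a four-grade classifier. The feature choice here (MFCC statistics plus a rough pitch track) is a simplification of the paper's source/filter feature set, and the file list and grade labels are placeholders.

```python
# Minimal sketch: utterance-level features fed to a k-nearest-neighbour
# classifier over four hypernasality grades.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

GRADES = ["normal", "mild", "moderate", "severe"]

def utterance_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)         # rough pitch track
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)]])

# X = np.vstack([utterance_features(f) for f in wav_files]); y = grade_indices
# knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# print(cross_val_score(knn, X, y, cv=5).mean())
```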
2009-04-01
Available military speech databases are surveyed, including the FELIN Database (overview, technical specifications, and limitations). The material covers emotion, confusion due to conflicting information, psychological tension, pain, and other typical conditions encountered in the modern battlefield… the number of possible language combinations scales with N³. It is clear that in a field of research that has only recently started and with so…
Voice Outcomes of Adults Diagnosed with Pediatric Vocal Fold Nodules and Impact of Speech Therapy.
Song, Brian H; Merchant, Maqdooda; Schloegel, Luke
2017-11-01
Objective To evaluate the voice outcomes of adults diagnosed with vocal fold nodules (VFNs) as children and to assess the impact of speech therapy on long-term voice outcomes. Study Design Prospective cohort study. Setting Large health care system. Subjects and Methods Subjects diagnosed with VFNs as children between the years 1996 and 2008 were identified within a medical record database of a large health care system. Included subjects were 3 to 12 years old at the time of diagnosis, had a documented laryngeal examination within 90 days of diagnosis, and were ≥18 years as of December 31, 2014. Qualified subjects were contacted by telephone and administered the Vocal Handicap Index-10 (VHI-10) and a 15-item questionnaire inquiring for confounding factors. Results A total of 155 subjects were included, with a mean age of 21.4 years (range, 18-29). The male:female ratio was 2.3:1. Mean VHI-10 score for the entire cohort was 5.4. Mean VHI-10 scores did not differ between those who received speech therapy (6.1) and those who did not (4.5; P = .08). Both groups were similar with respect to confounding risk factors that can contribute to dysphonia, although the no-therapy group had a disproportionately higher number of subjects who consumed >10 alcoholic drinks per week ( P = .01). Conclusion The majority of adults with VFNs as children will achieve a close-to-normal voice quality when they reach adulthood. In our cohort, speech therapy did not appear to have an impact on the long-term voice outcomes.
Objective measurement of motor speech characteristics in the healthy pediatric population.
Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P
2011-12-01
To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure, were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (Fo) during speech did not change significantly with age for either males or females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Speech perception and production in severe environments
NASA Astrophysics Data System (ADS)
Pisoni, David B.
1990-09-01
The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.
A new feature constituting approach to detection of vocal fold pathology
NASA Astrophysics Data System (ADS)
Hariharan, M.; Polat, Kemal; Yaacob, Sazali
2014-08-01
In the last two decades, non-invasive methods based on acoustic analysis of the voice signal have proved to be an excellent and reliable tool for diagnosing vocal fold pathologies. This paper proposes a new feature vector based on the wavelet packet transform and singular value decomposition for the detection of vocal fold pathology. k-means clustering based feature weighting is proposed to increase the distinguishing performance of the proposed features. In this work, two databases, the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database and the MAPACI speech pathology database, are used. Four different supervised classifiers, namely k-nearest neighbour (k-NN), least-squares support vector machine, probabilistic neural network, and general regression neural network, are employed for testing the proposed features. The experimental results show that the proposed features give a very promising classification accuracy of 100% for both the MEEI database and the MAPACI speech pathology database.
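As a rough illustration of the feature construction described above, the following is a minimal Python sketch of wavelet-packet decomposition with singular values used as features, assuming PyWavelets, NumPy, and scikit-learn; the exact feature layout, the k-means weighting scheme, and all parameter values in the paper are not reproduced here and the names below are illustrative.

```python
# Sketch: wavelet-packet + SVD features for voice-pathology detection.
# The matrix arrangement of coefficients and the number of singular values
# kept are assumptions, not the paper's exact recipe.
import numpy as np
import pywt

def wpt_svd_features(signal, wavelet="db4", level=3):
    """Decompose a voice signal with a wavelet packet transform and summarize
    each terminal node by the leading singular values of its coefficient matrix."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        coeffs = np.asarray(node.data)
        if len(coeffs) < 8:          # guard for very short signals
            continue
        n = len(coeffs) - len(coeffs) % 8
        mat = coeffs[:n].reshape(8, -1)          # fold 1-D coefficients into a matrix
        sv = np.linalg.svd(mat, compute_uv=False)
        feats.extend(sv[:3])                      # keep a few singular values per node
    return np.array(feats)

# Hypothetical usage with labelled training signals (0 = normal, 1 = pathological):
# from sklearn.neighbors import KNeighborsClassifier
# X = np.vstack([wpt_svd_features(s) for s in signals])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
```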
Subjective comparison and evaluation of speech enhancement algorithms
Hu, Yi; Loizou, Philipos C.
2007-01-01
Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based, and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests. PMID:18046463
Connected word recognition using a cascaded neuro-computational model
NASA Astrophysics Data System (ADS)
Hoya, Tetsuya; van Leeuwen, Cees
2016-10-01
We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.
ERIC Educational Resources Information Center
Raskind, Marshall
1993-01-01
This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…
Higher order statistical analysis of /x/ in male speech.
Orr, M C; Lithgow, B
2005-03-01
This paper presents a study of kurtosis analysis for the sound /x/ in male speech; /x/ is the sound of the 'o' at the end of words such as 'ago'. The sound analysed for this paper came from the Australian National Database of Spoken Language, specifically male speaker 17. The /x/ tokens were isolated and extracted from the database by the author in a quiet booth using standard multimedia software. A 5 millisecond window was used for the analysis, as it was shown previously by the author to be the most appropriate size for speech phoneme analysis. The significance of the research presented here is shown in the results, where the majority of coefficients had platykurtic values (kurtosis between 0 and 3), as opposed to the previously held belief that the values are leptokurtic (kurtosis > 3).
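For readers wanting to reproduce this kind of windowed kurtosis analysis, here is a minimal Python sketch assuming NumPy/SciPy and a mono WAV file; the file name is hypothetical and thresholds simply follow the platykurtic/leptokurtic definitions quoted above.

```python
# Sketch: Pearson kurtosis of an extracted speech segment in 5 ms windows.
import numpy as np
from scipy.io import wavfile
from scipy.stats import kurtosis

rate, x = wavfile.read("speaker17_x.wav")    # hypothetical extracted /x/ token (mono)
x = x.astype(float)
win = int(0.005 * rate)                      # 5 ms analysis window

values = [kurtosis(x[i:i + win], fisher=False)   # fisher=False: normal distribution = 3
          for i in range(0, len(x) - win, win)]

platy = sum(0 < k < 3 for k in values)
lepto = sum(k > 3 for k in values)
print(f"platykurtic windows: {platy}, leptokurtic windows: {lepto}")
```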
The varieties of speech to young children.
Huttenlocher, Janellen; Vasilyeva, Marina; Waterfall, Heidi R; Vevea, Jack L; Hedges, Larry V
2007-09-01
This article examines caregiver speech to young children. The authors obtained several measures of the speech used to children during early language development (14-30 months). For all measures, they found substantial variation across individuals and subgroups. Speech patterns vary with caregiver education, and the differences are maintained over time. While there are distinct levels of complexity for different caregivers, there is a common pattern of increase across age within the range that characterizes each educational group. Thus, caregiver speech exhibits both long-standing patterns of linguistic behavior and adjustment for the interlocutor. This information about the variability of speech by individual caregivers provides a framework for systematic study of the role of input in language acquisition. PsycINFO Database Record (c) 2007 APA, all rights reserved
An Analysis of The Parameters Used In Speech ABR Assessment Protocols.
Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca
2018-04-01
The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.
NASA Astrophysics Data System (ADS)
Nomura, Yukihiro; Lu, Jianming; Sekiya, Hiroo; Yahagi, Takashi
This paper presents a speech enhancement method based on classification between speech-dominant and noise-dominant bands. In our system, a new scheme for classifying bands as speech-dominant or noise-dominant is proposed. The proposed classification uses the standard deviation of the spectrum of the observed signal in each band. We introduce two oversubtraction factors, one for speech-dominant and one for noise-dominant bands, and spectral subtraction is carried out after the classification. The proposed method is tested on several noise types from the Noisex-92 database. Based on segmental SNR, the Itakura-Saito distance measure, inspection of spectrograms, and listening tests, the proposed system is shown to be effective at reducing background noise. Moreover, the enhanced speech produced by our system contains less musical noise and distortion than that of conventional systems.
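The following is a minimal NumPy sketch of the band-wise idea, a simplified reading rather than the authors' exact algorithm: each band is classified by the standard deviation of its spectrum over time, and spectral subtraction is applied with a different oversubtraction factor per class. The threshold, factor values, and variable names are illustrative assumptions.

```python
# Sketch: classification-driven spectral subtraction with two oversubtraction factors.
import numpy as np

def enhance(frames_mag, noise_mag, alpha_speech=2.0, alpha_noise=4.0, beta=0.01):
    """frames_mag: (n_frames, n_bins) magnitude spectra of the noisy signal.
    noise_mag: (n_bins,) noise magnitude estimate (e.g., from a silent segment)."""
    band_std = frames_mag.std(axis=0)                 # spectral variability per band
    speech_dominant = band_std > np.median(band_std)  # illustrative threshold
    alpha = np.where(speech_dominant, alpha_speech, alpha_noise)
    clean = frames_mag - alpha * noise_mag            # oversubtraction per band class
    floor = beta * frames_mag                         # spectral floor to limit musical noise
    return np.maximum(clean, floor)
```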
Voice Technologies in Libraries: A Look into the Future.
ERIC Educational Resources Information Center
Lange, Holley R., Ed.; And Others
1991-01-01
Discussion of synthesized speech and voice recognition focuses on a forum that addressed the potential for speech technologies in libraries. Topics discussed by three contributors include possible library applications in technical processing, book receipt, circulation control, and database access; use by disabled and illiterate users; and problems…
NASA Astrophysics Data System (ADS)
Costache, G. N.; Gavat, I.
2004-09-01
Along with the aggressive growth in the amount of digital data available (text, audio samples, digital photos and digital movies, joined together in the multimedia domain), the need for classification, recognition and retrieval of this kind of data has become very important. This paper presents a system structure for handling multimedia data from a recognition perspective. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, in order to obtain a feature-based description forming the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-cepstral (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage, based on Kohonen Self-Organizing Maps (SOM), and a final stage based on a powerful classification algorithm called Support Vector Machines (SVM). The system, in specific variants, is applied with good results to two tasks: the first is bimodal speech recognition, which fuses features obtained from the speech signal with features obtained from the speaker's image; the second is music retrieval from a large music database.
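As a sketch of such a hierarchical MFCC-SOM-SVM pipeline, here is a minimal Python illustration assuming librosa for MFCC extraction, the MiniSom package as a stand-in SOM implementation, and scikit-learn for the SVM; the map size, feature summary, and fusion of the SOM output into the SVM input are assumptions, not the original system's design.

```python
# Sketch: utterance-level MFCC features, SOM clustering, SVM decision stage.
import numpy as np
import librosa

def mfcc_vector(path, n_mfcc=13):
    """Summarize an audio file by the mean of its MFCC frames."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical training data: file paths and class labels.
# from minisom import MiniSom
# from sklearn.svm import SVC
# X = np.vstack([mfcc_vector(p) for p in paths])
# som = MiniSom(6, 6, X.shape[1], sigma=1.0, learning_rate=0.5)
# som.train_random(X, 1000)
# # Append the winning SOM node coordinates as extra (cluster) features.
# X_aug = np.hstack([X, np.array([som.winner(x) for x in X])])
# clf = SVC(kernel="rbf").fit(X_aug, labels)
```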
Perceptual Speech and Paralinguistic Skills of Adolescents with Williams Syndrome
ERIC Educational Resources Information Center
Hargrove, Patricia M.; Pittelko, Stephen; Fillingane, Evan; Rustman, Emily; Lund, Bonnie
2013-01-01
The purpose of this research was to compare selected speech and paralinguistic skills of speakers with Williams syndrome (WS) and typically developing peers and to demonstrate the feasibility of providing preexisting databases to students to facilitate graduate research. In a series of three studies, conversational samples of 12 adolescents with…
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.
Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang
2017-01-01
Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by lower recognition accuracies in real applications due to their lack of rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is then provided to a DBN to yield higher-level features, which serve as the input of a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
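To make the ensemble idea concrete, here is a minimal scikit-learn sketch of random subspaces with majority voting; a small MLP is used as a stand-in for the DBN of the paper, and the subspace fraction, architecture, and label coding are assumptions.

```python
# Sketch: random-subspace ensemble with majority voting (MLP stands in for a DBN).
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_rdbn_like(X, y, n_models=20, subspace_frac=0.5, seed=0):
    """X: (n_samples, n_features) low-level features; y: integer emotion labels."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], int(subspace_frac * X.shape[1]), replace=False)
        clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
        clf.fit(X[:, idx], y)
        models.append((idx, clf))
    return models

def predict_majority(models, X):
    votes = np.array([clf.predict(X[:, idx]) for idx, clf in models])
    # Majority vote across the ensemble for each sample (integer-coded labels).
    return np.array([np.bincount(col).argmax() for col in votes.T])
```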
The broadcast of shared attention and its impact on political persuasion.
Shteynberg, Garriy; Bramlett, James M; Fles, Elizabeth H; Cameron, Jaclyn
2016-11-01
In democracies where multitudes yield political influence, so does broadcast media that reaches those multitudes. However, broadcast media may not be powerful simply because it reaches a certain audience, but because each of the recipients is aware of that fact. That is, watching broadcast media can evoke a state of shared attention, or the perception of simultaneous coattention with others. Whereas past research has investigated the effects of shared attention with a few socially close others (i.e., friends, acquaintances, minimal ingroup members), we examine the impact of shared attention with a multitude of unfamiliar others in the context of televised broadcasting. In this paper, we explore whether shared attention increases the psychological impact of televised political speeches, and whether fewer numbers of coattending others diminishes this effect. Five studies investigate whether the perception of simultaneous coattention, or shared attention, on a mass broadcasted political speech leads to more extreme judgments. The results indicate that the perception of synchronous coattention (as compared with coattending asynchronously and attending alone) renders persuasive speeches more persuasive, and unpersuasive speeches more unpersuasive. We also find that recall memory for the content of the speech mediates the effect of shared attention on political persuasion. The results are consistent with the notion that shared attention on mass broadcasted information results in deeper processing of the content, rendering judgments more extreme. In all, our findings imply that shared attention is a cognitive capacity that supports large-scale social coordination, where multitudes of people can cognitively prioritize simultaneously coattended information. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Speech therapy after thyroidectomy
Wu, Che-Wei
2017-01-01
Common complaints of patients who have undergone thyroidectomy include dysphonia (voice dysfunction) and dysphagia (difficulty swallowing). One cause of these surgical outcomes is recurrent laryngeal nerve paralysis. Many studies have discussed the effectiveness of speech therapy (e.g., voice therapy and dysphagia therapy) for improving dysphonia and dysphagia, but not specifically in patients who have received thyroidectomy. Therefore, the aim of this paper was to discuss issues regarding speech therapy, such as voice therapy and dysphagia therapy, for patients after thyroidectomy. Another aim was to review the literature on speech therapy for patients with recurrent laryngeal nerve paralysis after thyroidectomy. Databases used for the literature review in this study included PubMed, MEDLINE, Academic Search Premier, ERIC, CINAHL Plus, and EBSCO. The articles retrieved by database searches were classified and screened for relevance by using EndNote. Of the 936 articles retrieved, 18 discussed “voice assessment and thyroidectomy”, 3 discussed “voice therapy and thyroidectomy”, and 11 discussed “surgical interventions for voice restoration after thyroidectomy”. Only 3 studies discussed topics related to “swallowing function assessment/treatment and thyroidectomy”. Although many studies have investigated voice changes and assessment methods in thyroidectomy patients, few recent studies have investigated speech therapy after thyroidectomy. Additionally, some studies have addressed dysphagia after thyroidectomy, but few have discussed assessment and treatment of dysphagia after thyroidectomy. PMID:29142841
Moulin, Annie; Bernard, André; Tordella, Laurent; Vergne, Judith; Gisbert, Annie; Martin, Christian; Richard, Céline
2017-05-01
Speech perception scores are widely used to assess patients' functional hearing, yet most linguistic material used in these audiometric tests dates to before the availability of large computerized linguistic databases. In an ENT clinic population of 120 patients with a median hearing loss of 43 dB HL, we quantified the variability and the sensitivity of speech perception scores to hearing loss, measured using disyllabic word lists, as a function of both the number of ten-word lists and the type of scoring used (word, syllable or phoneme). The mean word recognition scores varied significantly across lists, from 54 to 68%. The median of the variability of the word recognition score ranged from 30% for one ten-word list down to 20% for three ten-word lists. Syllabic and phonemic scores showed much less variability, with standard deviations decreasing by 1.15 with the use of syllabic scores and by 1.45 with phonemic scores. The sensitivity of each list to hearing loss and distortions varied significantly. There was an increase in the minimum effect size that could be seen for syllabic scores compared to word scores, with no significant further improvement with phonemic scores. The use of at least two ten-word lists, scored in syllables rather than in whole words, contributed to a large decrease in variability and an increase in sensitivity to hearing loss. However, these results emphasize the need for updated linguistic material in clinical speech score assessments.
Emotion to emotion speech conversion in phoneme level
NASA Astrophysics Data System (ADS)
Bulut, Murtaza; Yildirim, Serdar; Busso, Carlos; Lee, Chul Min; Kazemzadeh, Ebrahim; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
The ability to synthesize emotional speech can make human-machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of phoneme-level prosodic and spectral modification for emotion-to-emotion speech conversion. The prosody modification is performed with the TD-PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC-based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained from the analysis of an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native speakers of American English indicate that modification of prosody only or spectrum only is not sufficient to elicit the targeted emotions. Simultaneous modification of both prosody and spectrum results in higher acceptance rates for the target emotions, suggesting that not only modeling speech prosody but also modeling the spectral patterns that reflect the underlying speech articulation is equally important for synthesizing emotional speech with good quality. We are investigating suprasegmental-level modifications for further improvement in speech quality and expressiveness.
Analyzing a multimodal biometric system using real and virtual users
NASA Astrophysics Data System (ADS)
Scheidat, Tobias; Vielhauer, Claus
2007-02-01
Three main topics of recent research on multimodal biometric systems are addressed in this article: the lack of sufficiently large multimodal test data sets, the influence of cultural aspects, and data protection issues for multimodal biometric data. In this contribution, different possibilities are presented for extending multimodal databases by generating so-called virtual users, which are created by combining single-modality biometric data from different users. Comparative tests on databases containing real and virtual users, based on a multimodal system using handwriting and speech, are presented to study to what degree the use of virtual multimodal databases allows conclusions about recognition accuracy in comparison to real multimodal data. All tests have been carried out on databases created from donations from three different nationality groups. This makes it possible to review the experimental results both in general and in the context of cultural origin. The results show that in most cases the use of virtual persons leads to lower accuracy than the use of real users in terms of the measure applied, the Equal Error Rate. Finally, this article addresses the general question of how the concept of virtual users may influence the data protection requirements for multimodal evaluation databases in the future.
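The construction of virtual users can be sketched as a simple cross-pairing of modalities from different real users; the Python snippet below is illustrative only, and the data structures and naming convention are assumptions rather than the authors' implementation.

```python
# Sketch: forming virtual users by pairing one user's handwriting data with
# another user's speech data.
import itertools

def make_virtual_users(handwriting, speech):
    """handwriting, speech: dicts mapping user_id -> biometric samples.
    Returns virtual users formed from cross-user modality combinations."""
    virtual = {}
    for hw_id, sp_id in itertools.permutations(handwriting.keys(), 2):
        if sp_id in speech:
            virtual[f"virt_{hw_id}_{sp_id}"] = {
                "handwriting": handwriting[hw_id],
                "speech": speech[sp_id],
            }
    return virtual
```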
Russo, Frank A.
2018-01-01
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
Toutios, Asterios; Narayanan, Shrikanth S
2016-01-01
Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.
Raghavan, Ramesh; Camarata, Stephen; White, Karl; Barbaresi, William; Parish, Susan; Krahn, Gloria
2018-05-17
The aim of the study was to provide an overview of population science as applied to speech and language disorders, illustrate data sources, and advance a research agenda on the epidemiology of these conditions. Computer-aided database searches were performed to identify key national surveys and other sources of data necessary to establish the incidence, prevalence, and course and outcome of speech and language disorders. This article also summarizes a research agenda that could enhance our understanding of the epidemiology of these disorders. Although the data yielded estimates of prevalence and incidence for speech and language disorders, existing sources of data are inadequate to establish reliable rates of incidence, prevalence, and outcomes for speech and language disorders at the population level. Greater support for inclusion of speech and language disorder-relevant questions is necessary in national health surveys to build the population science in the field.
Vieira, Vanessa Pedrosa; De Biase, Noemi; Peccin, Maria Stella; Atallah, Alvaro Nagib
2009-06-01
To evaluate the methodological adequacy of voice and laryngeal study designs published in speech-language pathology and otorhinolaryngology journals indexed for the ISI Web of Knowledge (ISI Web) and the MEDLINE database. A cross-sectional study conducted at the Universidade Federal de São Paulo (Federal University of São Paulo). Two Brazilian speech-language pathology and otorhinolaryngology journals (Pró-Fono and Revista Brasileira de Otorrinolaringologia) and two international speech-language pathology and otorhinolaryngology journals (Journal of Voice, Laryngoscope), all dated between 2000 and 2004, were hand-searched by specialists. Subsequently, voice and larynx publications were separated, and a speech-language pathologist and otorhinolaryngologist classified 374 articles from the four journals according to objective and study design. The predominant objective contained in the articles was that of primary diagnostic evaluation (27%), and the most frequent study design was case series (33.7%). A mere 7.8% of the studies were designed adequately with respect to the stated objectives. There was no statistical difference in the methodological quality of studies indexed for the ISI Web and the MEDLINE database. The studies published in both national journals, indexed for the MEDLINE database, and international journals, indexed for the ISI Web, demonstrate weak methodology, with research poorly designed to meet the proposed objectives. There is much scientific work to be done in order to decrease uncertainty in the field analysed.
A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record
Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa
2002-01-01
Speech recognition as an input tool for the electronic medical record (EMR) enables efficient data entry at the point of care. However, recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of the recognized speech text against a medical vocabulary database. Using specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical terms in narrative text. Our preliminary evaluation showed 94% accuracy in correcting mis-recognized medical vocabulary.
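One way to picture syllable-by-syllable comparison is as an edit distance over syllable sequences, with the closest vocabulary entry substituted for the mis-recognized term. The Python sketch below assumes syllabification happens upstream; the helper names and the example terms are hypothetical, not the authors' system.

```python
# Sketch: correcting a recognized term by nearest-neighbour search over
# syllable sequences with an edit distance.
def syllable_distance(a, b):
    """Edit distance over syllable lists (insert/delete/substitute cost 1)."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def correct_term(recognized_syllables, vocabulary):
    """vocabulary: dict mapping term -> list of syllables."""
    return min(vocabulary,
               key=lambda t: syllable_distance(recognized_syllables, vocabulary[t]))

# Hypothetical example:
# correct_term(["gas", "tri", "tus"],
#              {"gastritis": ["gas", "tri", "tis"],
#               "gastroscopy": ["gas", "tros", "co", "py"]})  # -> "gastritis"
```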
Freedom of racist speech: Ego and expressive threats.
White, Mark H; Crandall, Christian S
2017-09-01
Do claims of "free speech" provide cover for prejudice? We investigate whether this defense of racist or hate speech serves as a justification for prejudice. In a series of 8 studies (N = 1,624), we found that explicit racial prejudice is a reliable predictor of the "free speech defense" of racist expression. Participants endorsed free speech values for singing racist songs or posting racist comments on social media; people high in prejudice endorsed free speech more than people low in prejudice (meta-analytic r = .43). This endorsement was not principled: high levels of prejudice did not predict endorsement of free speech values when identical speech was directed at coworkers or the police. Participants low in explicit racial prejudice actively avoided endorsing free speech values in racialized conditions compared to nonracial conditions, but participants high in racial prejudice increased their endorsement of free speech values in racialized conditions. Three experiments failed to find evidence that defense of racist speech by the highly prejudiced was based in self-relevant or self-protective motives. Two experiments found evidence that the free speech argument protected participants' own freedom to express their attitudes; the defense of others' racist speech seems motivated more by threats to autonomy than threats to self-regard. These studies serve as an elaboration of the Justification-Suppression Model (Crandall & Eshleman, 2003) of prejudice expression. The justification of racist speech by endorsing fundamental political values can serve to buffer racial and hate speech from normative disapproval. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Inner Speech's Relationship with Overt Speech in Poststroke Aphasia
ERIC Educational Resources Information Center
Stark, Brielle C.; Geva, Sharon; Warburton, Elizabeth A.
2017-01-01
Purpose: Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech…
Kong, Anthony Pak-Hin; Law, Sam-Po; Kwan, Connie Ching-Yin; Lai, Christy; Lam, Vivian
2014-01-01
Gestures are commonly used together with spoken language in human communication. One major limitation of gesture investigations in the existing literature lies in the fact that the coding of forms and functions of gestures has not been clearly differentiated. This paper first described a recently developed Database of Speech and GEsture (DoSaGE) based on independent annotation of gesture forms and functions among 119 neurologically unimpaired right-handed native speakers of Cantonese (divided into three age and two education levels), and presented findings of an investigation examining how gesture use was related to age and linguistic performance. Consideration of these two factors, for which normative data are currently very limited or lacking in the literature, is relevant and necessary when one evaluates gesture employment among individuals with and without language impairment. Three speech tasks, including monologue of a personally important event, sequential description, and story-telling, were used for elicitation. The EUDICO Linguistic ANnotator (ELAN) software was used to independently annotate each participant’s linguistic information of the transcript, forms of gestures used, and the function for each gesture. About one-third of the subjects did not use any co-verbal gestures. While the majority of gestures were non-content-carrying, which functioned mainly for reinforcing speech intonation or controlling speech flow, the content-carrying ones were used to enhance speech content. Furthermore, individuals who are younger or linguistically more proficient tended to use fewer gestures, suggesting that normal speakers gesture differently as a function of age and linguistic performance. PMID:25667563
Caballero-Morales, Santiago-Omar
2013-01-01
An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of phoneme-level acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). The emotional state of a spoken sentence is then estimated by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, an accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech. PMID:23935410
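The decision rule, counting emotion-specific vowels in the ASR output, can be sketched in a few lines of Python; the label naming convention (e.g. "a_anger") is an assumption for illustration, not the labels used in the paper.

```python
# Sketch: estimate the emotion of a sentence by counting emotion-tagged vowels
# in the decoded phone sequence.
from collections import Counter

EMOTIONS = ("anger", "happiness", "neutral", "sadness")

def estimate_emotion(phone_sequence):
    """phone_sequence: list of decoded phone labels, where emotion-specific
    vowels carry a suffix such as 'a_anger' or 'e_sadness' (assumed convention)."""
    counts = Counter(label.split("_", 1)[1]
                     for label in phone_sequence if "_" in label)
    if not counts:
        return "neutral"
    return max(EMOTIONS, key=lambda e: counts.get(e, 0))

# estimate_emotion(["k", "a_anger", "s", "a_anger", "e_neutral"])  # -> "anger"
```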
Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin
2016-01-01
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Cardoso, Rita; Mercier, Céline; Santos, Helena; Atkinson-Clement, Cyril; Carvalho, Joana; Welby, Pauline; Oliveira, Pedro; D'Imperio, Mariapaola; Frota, Sónia; Letanneux, Alban; Vigario, Marina; Cruz, Marisa; Martins, Isabel Pavão; Viallet, François
2016-01-01
Introduction Individuals with Parkinson's disease (PD) have to deal with several aspects of voice and speech decline, and thus alteration of communication ability, during the course of the disease. Among these communication impairments, three major challenges are: (1) dysarthria, consisting of orofacial motor dysfunction and dysprosody, which is linked to the neurodegenerative processes; (2) effects of the pharmacological treatment, which vary according to the disease stage; and (3) particular speech modifications that may be language-specific, that is, dependent on the language spoken by the patients. The main objective of the FraLusoPark project is to provide a thorough evaluation of changes in PD speech as a result of pharmacological treatment and disease duration in two different languages (French vs European Portuguese). Methods and analysis Individuals with PD are enrolled in the study in France (N=60) and Portugal (N=60). Their global motor disability and orofacial motor functions are assessed with specific clinical rating scales, without (OFF) and with (ON) pharmacological treatment. Two groups of 60 healthy age-matched volunteers provide the reference for between-group comparisons. Along with the clinical examinations, several speech tasks are recorded to obtain acoustic and perceptual measures. Patient-reported outcome measures are used to assess the psychosocial impact of dysarthria on quality of life. Ethics and dissemination The study has been approved by the local responsible committees on human experimentation and is conducted in accordance with ethical standards. A valuable large-scale database of speech recordings and metadata from patients with PD in France and Portugal will be constructed. Results will be disseminated in several articles in peer-reviewed journals and in conference presentations. Recommendations on how to assess speech and voice disorders in individuals with PD to monitor the progression and management of symptoms will be provided. Trial registration number NCT02753192, Pre-results. PMID:27856480
How our own speech rate influences our perception of others.
Bosker, Hans Rutger
2017-08-01
In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Transitioning from analog to digital audio recording in childhood speech sound disorders.
Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L
2005-06-01
Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.
Celeste, Letícia Corrêa; Zanoni, Graziela; Queiroga, Bianca; Alves, Luciana Mendonça
2017-03-09
To map the profile of Brazilian speech therapists who report working in educational speech therapy, with regard to training, practice, and professional experience. Retrospective study based on secondary analysis of a database from the Federal Council of Hearing and Speech Sciences, drawing on questionnaires in which respondents reported working in the educational setting. A total of 312 questionnaires were completed, 93.3% of them by women aged 30-39 years. Most speech therapists continued their studies after graduation, opting mostly for specialization. Almost 50% of respondents had worked in the specialty for less than six years, most often in the public service (especially municipal) and in the private sector. The profile of speech therapists working in the educational area in Brazil is that of a predominantly female professional who continues studying after graduation, seeking specialization mainly in Audiology and Orofacial Motricity. Most have up to 10 years of experience, divided mainly between public (municipal) and private schools. The work of speech therapists in the educational area is concentrated in elementary and primary school, with varied workloads.
Telephone-quality pathological speech classification using empirical mode decomposition.
Kaleem, M F; Ghoraani, B; Guergachi, A; Krishnan, S
2011-01-01
This paper presents a computationally simple and effective methodology based on empirical mode decomposition (EMD) for classification of telephone-quality normal and pathological speech signals. EMD is used to decompose continuous normal and pathological speech signals into intrinsic mode functions, which are analyzed to extract physically meaningful and unique temporal and spectral features. Using continuous speech samples from a database of 51 normal and 161 pathological speakers, which has been modified to simulate telephone-quality speech under different levels of noise, a linear classifier is used with the resulting feature vector to obtain a high classification accuracy, thereby demonstrating the effectiveness of the methodology. The classification accuracy reported in this paper (89.7% at a signal-to-noise ratio of 30 dB) is a significant improvement over previously reported results for the same task, and demonstrates the utility of our methodology for cost-effective remote voice pathology assessment over telephone channels.
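A minimal sketch of EMD-based feature extraction is shown below, assuming the PyEMD package ("EMD-signal") is available; the per-IMF features here (energy and a zero-crossing-rate proxy) are illustrative and do not reproduce the six features used in the paper.

```python
# Sketch: decompose a speech segment into intrinsic mode functions and derive
# simple per-IMF descriptors for use with a linear classifier.
import numpy as np
from PyEMD import EMD

def emd_features(signal, max_imfs=6):
    imfs = EMD().emd(signal, max_imf=max_imfs)
    feats = []
    for imf in imfs:
        energy = np.sum(imf ** 2)
        # Rough dominant-frequency proxy: zero-crossing rate of the IMF.
        zcr = np.mean(np.abs(np.diff(np.sign(imf)))) / 2
        feats.extend([energy, zcr])
    return np.array(feats)
```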
V2S: Voice to Sign Language Translation System for Malaysian Deaf People
NASA Astrophysics Data System (ADS)
Mean Foong, Oi; Low, Tang Jung; La, Wai Wan
The process of learning and understanding sign language may be cumbersome to some; therefore, this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken, regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on a generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech with the stored templates and finally displays the sign language in video format. Empirical results show that the system has a recognition rate of 80.3%.
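Template-based recognition of this kind is commonly implemented with dynamic time warping over spectral feature sequences; the Python sketch below illustrates that general idea under the assumption of MFCC feature matrices, and is not the V2S system's actual implementation.

```python
# Sketch: match an input MFCC sequence against stored word templates with DTW
# and return the best-matching word (which would index the sign video to play).
import numpy as np

def dtw_distance(A, B):
    """A, B: (n_frames, n_features) feature matrices."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(input_features, templates):
    """templates: dict mapping word -> template feature matrix."""
    return min(templates, key=lambda w: dtw_distance(input_features, templates[w]))
```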
Open-Source Multi-Language Audio Database for Spoken Language Processing Applications
2012-12-01
300 passages were collected in each of three languages: English, Mandarin, and Russian. Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the ... manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. Another phase of the work was to explore ... YouTube.
Tsanas, Athanasios; Zañartu, Matías; Little, Max A.; Fox, Cynthia; Ramig, Lorraine O.; Clifford, Gari D.
2014-01-01
There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required. PMID:24815269
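As a highly simplified stand-in for the adaptive weighting described above, the sketch below fuses per-frame F0 estimates from several algorithms with a scalar Kalman filter in which each algorithm contributes a measurement with its own noise variance; the process model, variances, and variable names are assumptions, not the authors' KF formulation.

```python
# Sketch: fuse multiple per-frame F0 estimates with a scalar Kalman filter.
import numpy as np

def fuse_f0(estimates, meas_var, process_var=4.0):
    """estimates: (n_frames, n_algorithms) F0 estimates in Hz.
    meas_var: (n_algorithms,) measurement-noise variance per algorithm."""
    x, P = estimates[0].mean(), 100.0        # initial state and variance
    fused = []
    for frame in estimates:
        P += process_var                     # predict (random-walk F0 model)
        for z, r in zip(frame, meas_var):    # sequential measurement updates
            K = P / (P + r)                  # Kalman gain
            x += K * (z - x)
            P *= (1 - K)
        fused.append(x)
    return np.array(fused)
```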
Library Instruction in Communication Disorders: Which Databases Should Be Prioritized?
ERIC Educational Resources Information Center
Grabowsky, Adelia
2015-01-01
The field of communication disorders encompasses the health science disciplines of both speech-language pathology and audiology. Pertinent literature for communication disorders can be found in a number of databases. Librarians providing information literacy instruction may not have the time to cover more than a few resources. This study develops…
ERIC Educational Resources Information Center
Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.
2017-01-01
Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
Evaluating deep learning architectures for Speech Emotion Recognition.
Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence
2017-08-01
Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.
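To illustrate what a frame-based formulation looks like in code, here is a minimal PyTorch sketch in which a small feed-forward network maps per-frame features to emotion posteriors and utterance-level decisions are made by averaging frame posteriors; this is a stand-in under assumed dimensions, not the architectures evaluated in the paper.

```python
# Sketch: frame-level SER with a small feed-forward network.
import torch
import torch.nn as nn

class FrameSER(nn.Module):
    def __init__(self, n_features=40, n_emotions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_emotions),
        )

    def forward(self, frames):               # frames: (n_frames, n_features)
        return self.net(frames)              # per-frame logits

def predict_utterance(model, frames):
    """Average frame posteriors, then pick the most likely emotion."""
    with torch.no_grad():
        probs = torch.softmax(model(frames), dim=-1)
    return int(probs.mean(dim=0).argmax())
```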
Cepstral domain modification of audio signals for data embedding: preliminary results
NASA Astrophysics Data System (ADS)
Gopalan, Kaliappan
2004-06-01
A method of embedding data in an audio signal using cepstral domain modification is described. Building on successful embedding in the spectral points of perceptually masked regions in each frame of speech, the technique was first extended to embedding in the log spectral domain. This extension resulted in approximately 62 bits/s of embedding with a bit error rate (BER) of less than 2 percent for a clean cover speech signal (from the TIMIT database), and about 2.5 percent for a noisy one (from an air traffic controller database), when all frames, including silence and transitions between voiced and unvoiced segments, were used. The bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values over two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices, corresponding to the vocal tract region, was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s, using one bit per frame regardless of frame energy or type of speech, initial results showed a BER of less than 1.5 percent for a payload of 208 embedded bits using the clean cover speech. A BER of less than 1.3 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, the BER increased to over 10 percent. With quantization causing no significant problems, the technique warrants further studies with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with low BER.
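The embedding direction can be sketched with NumPy as follows: compute the real cepstrum of a frame, shift a low-index (vocal-tract) range up or down according to the data bit, and resynthesize using the original phase. The index range and shift size below are illustrative, not the values used in the study, and the reconstruction is approximate.

```python
# Sketch: embed one bit in a speech frame by shifting the mean of a range of
# low-order real-cepstrum coefficients.
import numpy as np

def embed_bit(frame, bit, lo=2, hi=20, delta=0.05):
    eps = 1e-12
    spec = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spec) + eps)
    ceps = np.fft.irfft(log_mag, n=len(frame))        # real cepstrum
    ceps[lo:hi] += delta if bit else -delta           # shift the chosen range
    new_log_mag = np.fft.rfft(ceps)[: len(spec)].real # back to log magnitude
    new_spec = np.exp(new_log_mag) * np.exp(1j * np.angle(spec))  # keep phase
    return np.fft.irfft(new_spec, n=len(frame))
```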
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis
ERIC Educational Resources Information Center
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
2009-01-01
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
The effect of simultaneous text on the recall of noise-degraded speech.
Grossman, Irina; Rajan, Ramesh
2017-05-01
Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and they were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. A consideration of the effects of the language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however the effects of text on speech processing were analogous. Experiment 2 considered if the benefit provided by matching text was maintained when the congruency of the text and speech becomes more ambiguous because of the addition of partially mismatching text-speech sentence pairs that differed only on their final keyword and because of the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text-for-speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
(abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences
NASA Technical Reports Server (NTRS)
Scott, Kenneth C.
1994-01-01
We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is videotaped speaking an arbitrary text that contains expressions of the full list of desired database phonemes. The subject is videotaped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as the one used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.
Parental numeric language input to Mandarin Chinese and English speaking preschool children.
Chang, Alicia; Sandhofer, Catherine M; Adelchanow, Lauren; Rottman, Benjamin
2011-03-01
The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that Mandarin-speaking parents talked about number more frequently than English-speaking parents. Further, the ways in which parents talked about number terms in the two languages was more supportive of a cardinal interpretation in Mandarin than in English. We discuss these results in terms of their implications for numerical understanding and later mathematical performance.
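Counting number-specific speech in transcripts of this kind reduces to tallying number-word tokens in caregiver utterances; the Python sketch below is illustrative, with hypothetical word lists and a simplified tokenized-utterance format (real CHILDES .cha files would normally be read with a dedicated parser).

```python
# Sketch: count number-word tokens in caregiver utterances.
NUMBER_WORDS_EN = {"one", "two", "three", "four", "five", "six", "seven",
                   "eight", "nine", "ten"}
NUMBER_WORDS_ZH = {"一", "二", "两", "三", "四", "五", "六", "七", "八", "九", "十"}

def count_numeric_tokens(utterances, number_words):
    """utterances: list of caregiver utterances, each a list of tokens.
    Returns the numeric-token count and its proportion of all tokens."""
    total = sum(len(u) for u in utterances)
    numeric = sum(tok in number_words for u in utterances for tok in u)
    return numeric, numeric / max(total, 1)

# count_numeric_tokens([["you", "have", "two", "blocks"]], NUMBER_WORDS_EN)  # (1, 0.25)
```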
Speaker emotion recognition: from classical classifiers to deep neural networks
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2018-04-01
Speaker emotion recognition is considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine, or education can be improved when the affective state of the speech is considered. In this paper, a twofold approach for speech emotion classification is proposed: first, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are evaluated. Experimental results indicate that deep architectures can improve classification performance on two affective databases, the Berlin Database of Emotional Speech and the Surrey Audio-Visual Expressed Emotion (SAVEE) database.
Perception of Sung Speech in Bimodal Cochlear Implant Users.
Crew, Joseph D; Galvin, John J; Fu, Qian-Jie
2016-11-11
Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. © The Author(s) 2016.
Chaves, Cristiane Ribeiro; Campbell, Melanie; Côrtes Gama, Ana Cristina
2017-03-01
This study aimed to determine the influence of native language on the auditory-perceptual assessment of voice, as completed by Brazilian and Anglo-Canadian listeners using Brazilian vocal samples and the grade, roughness, breathiness, asthenia, strain (GRBAS) scale. This is an analytical, observational, comparative, and transversal study conducted at the Speech Language Pathology Department of the Federal University of Minas Gerais in Brazil, and at the Communication Sciences and Disorders Department of the University of Alberta in Canada. The GRBAS scale, connected speech, and a sustained vowel were used in this study. The vocal samples were drawn randomly from a database of recorded speech of Brazilian adults, some with healthy voices and some with voice disorders. The database is housed at the Federal University of Minas Gerais. Forty-six samples of connected speech (recitation of days of the week), produced by 35 women and 11 men, and 46 samples of the sustained vowel /a/, produced by 37 women and 9 men, were used in this study. The listeners were divided into two groups of three speech therapists, according to nationality: Brazilian or Anglo-Canadian. The groups were matched according to the years of professional experience of participants. The weighted kappa was used to calculate the intra- and inter-rater agreements, with 95% confidence intervals, respectively. An analysis of the intra-rater agreement showed that Brazilians and Canadians had similar results in auditory-perceptual evaluation of sustained vowel and connected speech. The results of the inter-rater agreement of connected speech and sustained vowel indicated that Brazilians and Canadians had, respectively, moderate agreement on the overall severity (0.57 and 0.50), breathiness (0.45 and 0.45), and asthenia (0.50 and 0.46); poor correlation on roughness (0.19 and 0.007); and weak correlation on strain to connected speech (0.22), and moderate correlation to sustained vowel (0.50). In general, auditory-perceptual evaluation is not influenced by the native language on most dimensions of the perceptual parameters of the GRBAS scale. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Speech recognition: Acoustic phonetic and lexical knowledge representation
NASA Astrophysics Data System (ADS)
Zue, V. W.
1983-02-01
The purpose of this program is to develop a speech database facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say, 10,000 words, and determine to what extent the phonotactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark word boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.
Sobol-Shikler, Tal; Robinson, Peter
2010-07-01
We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
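As an illustration of how decisions from pairwise classifiers over nine affective-state groups can be consolidated into a single ranked list, the sketch below counts pairwise "wins" per group. This simple voting rule is an assumption for illustration, not necessarily the authors' exact consolidation scheme, and `pairwise_decide` is a hypothetical callable standing in for the trained pairwise machines.

```python
from itertools import combinations

def rank_affective_states(states, pairwise_decide):
    """states: list of affective-state group labels (nine in the study above).
    pairwise_decide(a, b): returns the label judged more present in the
    utterance by the (a, b) pairwise classifier.
    Consolidates all pairwise decisions into a ranked list by counting wins
    (an assumed voting rule, used here only to make the idea concrete)."""
    wins = {s: 0 for s in states}
    for a, b in combinations(states, 2):
        wins[pairwise_decide(a, b)] += 1
    return sorted(states, key=lambda s: wins[s], reverse=True)
```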
Extensions to the Speech Disorders Classification System (SDCS)
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
2010-01-01
This report describes three extensions to a classification system for pediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three subtypes of motor speech disorders. Part II describes the Madison Speech Assessment Protocol (MSAP), an approximately two-hour battery of 25 measures that includes 15 speech tests and tasks. Part III describes the Competence, Precision, and Stability Analytics (CPSA) framework, a current set of approximately 90 perceptual- and acoustic-based indices of speech, prosody, and voice used to quantify and classify subtypes of Speech Sound Disorders (SSD). A companion paper, Shriberg, Fourakis, et al. (2010) provides reliability estimates for the perceptual and acoustic data reduction methods used in the SDCS. The agreement estimates in the companion paper support the reliability of SDCS methods and illustrate the complementary roles of perceptual and acoustic methods in diagnostic analyses of SSD of unknown origin. Examples of research using the extensions to the SDCS described in the present report include diagnostic findings for a sample of youth with motor speech disorders associated with galactosemia (Shriberg, Potter, & Strand, 2010) and a test of the hypothesis of apraxia of speech in a group of children with autism spectrum disorders (Shriberg, Paul, Black, & van Santen, 2010). All SDCS methods and reference databases running in the PEPPER (Programs to Examine Phonetic and Phonologic Evaluation Records; [Shriberg, Allen, McSweeny, & Wilson, 2001]) environment will be disseminated without cost when complete. PMID:20831378
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
2015-01-01
In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.
Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y; Saltzman, Elliot; Goldstein, Louis
2010-09-13
Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed "speech-inversion." This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
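A minimal sketch of the feed-forward ANN variant of speech inversion, using scikit-learn's MLPRegressor to learn a frame-wise mapping from acoustic features to tract variables. The feature and tract-variable dimensions, hidden-layer sizes, and random placeholder arrays are illustrative assumptions; they stand in for data such as the Haskins production-model output mentioned above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: acoustic features per frame (e.g., MFCCs with context), shape (n, d_in)
# Y: tract variables per frame (e.g., lip aperture, tongue-tip constriction
#    degree), shape (n, d_out). Both arrays are random placeholders here.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))
Y = rng.normal(size=(2000, 8))

inverter = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0),
)
inverter.fit(X, Y)                 # learn the acoustic-to-articulatory map
Y_hat = inverter.predict(X[:10])   # estimated tract variables for new frames
```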
Phonologically-based biomarkers for major depressive disorder
NASA Astrophysics Data System (ADS)
Trevino, Andrea Carolina; Quatieri, Thomas Francis; Malyska, Nicolas
2011-12-01
Of increasing importance in the civilian and military population is the recognition of major depressive disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we introduce vocal biomarkers that are derived automatically from phonologically-based measures of speech rate. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a 6-week duration. We find that dissecting average measures of speech rate into phone-specific characteristics and, in particular, combined phone-duration measures uncovers stronger relationships between speech rate and depression severity than global measures previously reported for a speech-rate biomarker. Results of this study are supported by correlation of our measures with depression severity and classification of depression state with these vocal measures. Our approach provides a general framework for analyzing individual symptom categories through phonological units, and supports the premise that speaking rate can be an indicator of psychomotor retardation severity.
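A minimal sketch of how a global speech-rate measure can be dissected into phone-specific duration measures, assuming a forced-alignment phone tier as input; the `(phone, start, end)` tuple format is an assumption for illustration, not the authors' exact pipeline.

```python
from collections import defaultdict

def speech_rate_measures(alignment):
    """alignment: list of (phone_label, start_sec, end_sec) tuples from a
    forced aligner. Returns a global rate (phones/sec) and the mean duration
    per phone, i.e., the kind of phone-specific measure dissected from the
    global speech rate."""
    durations = defaultdict(list)
    total_phones, total_time = 0, 0.0
    for phone, start, end in alignment:
        dur = end - start
        durations[phone].append(dur)
        total_phones += 1
        total_time += dur
    global_rate = total_phones / total_time if total_time > 0 else 0.0
    mean_phone_dur = {p: sum(d) / len(d) for p, d in durations.items()}
    return global_rate, mean_phone_dur

# Toy usage with a hypothetical three-phone alignment.
alignment = [("dh", 0.00, 0.05), ("ah", 0.05, 0.12), ("k", 0.12, 0.20)]
print(speech_rate_measures(alignment))
```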
Monson, Brian B; Lotto, Andrew J; Story, Brad H
2012-09-01
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
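A hedged sketch of third-octave band analysis of high-frequency energy from a long-term average spectrum, using a Welch PSD estimate. Band edges follow the standard base-2 convention, and the frequency range, FFT settings, and the synthetic test signal are illustrative choices rather than the study's exact parameters.

```python
import numpy as np
from scipy.signal import welch

def third_octave_levels(x, fs, f_low=5000.0, f_high=20000.0):
    """Long-term average spectrum of signal x (sampling rate fs), summed into
    third-octave bands between f_low and f_high, returned in dB re an
    arbitrary reference. Band edges are fc * 2**(+/-1/6)."""
    f, psd = welch(x, fs=fs, nperseg=4096)
    n = np.arange(-10, 20)                       # centres referenced to 1 kHz
    centres = 1000.0 * 2.0 ** (n / 3.0)
    centres = centres[(centres >= f_low) & (centres <= f_high)]
    levels = {}
    for fc in centres:
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)
        band = (f >= lo) & (f < hi)
        power = np.trapz(psd[band], f[band]) if band.any() else 0.0
        levels[round(fc)] = 10 * np.log10(power + 1e-20)
    return levels

# Toy usage with two seconds of white noise in place of a recording.
x = np.random.randn(2 * 44100)
print(third_octave_levels(x, fs=44100))
```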
Ultrasound applicability in Speech Language Pathology and Audiology.
Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia
2014-01-01
To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, which evidence possibilities of the applicability of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallow. Studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" in the past 5 years were not found. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities of the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.
Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech
NASA Technical Reports Server (NTRS)
Hieronymus, James; Aist, Greg; Dowding, John
2006-01-01
An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them, and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal, which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop, and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system, which is being tested on the International Space Station.
Tonn, Christopher R; Grundfast, Kenneth M
2014-03-01
Otolaryngologists are asked to evaluate children whom a parent, physician, or someone else believes to be slow in developing speech. Therefore, an otolaryngologist should be familiar with milestones for normal speech development, the causes of delay in speech development, and the best ways to help assure that children develop the ability to speak in a normal way. To provide information for otolaryngologists that is helpful in the evaluation and management of children perceived to be delayed in developing speech. Data were obtained via literature searches, online databases, textbooks, and the most recent national guidelines on topics including speech delay and language delay and the underlying disorders that can cause delay in developing speech. Emphasis was placed on epidemiology, pathophysiology, most common presentation, and treatment strategies. Most of the sources referenced were published within the past 5 years. Our article is a summary of major causes of speech delay based on reliable sources as listed herein. Speech delay can be the manifestation of a spectrum of disorders affecting the language comprehension and/or speech production pathways, ranging from disorders involving global developmental limitations to motor dysfunction to hearing loss. Determining the cause of a child's delay in speech production is a time-sensitive issue because a child loses valuable opportunities in intellectual development if his or her communication defect is not addressed and ameliorated with treatment. Knowing several key items about each disorder can help otolaryngologists direct families to the correct health care provider to maximize the child's learning potential and intellectual growth curve.
Strand, Julia F
2014-03-01
A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex .
HomeBank: An Online Repository of Daylong Child-Centered Audio Recordings
VanDam, Mark; Warlaumont, Anne S.; Bergelson, Elika; Cristia, Alejandrina; Soderstrom, Melanie; De Palma, Paul; MacWhinney, Brian
2017-01-01
HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers’ access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children’s environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children. PMID:27111272
Pinto, Serge; Cardoso, Rita; Sadat, Jasmin; Guimarães, Isabel; Mercier, Céline; Santos, Helena; Atkinson-Clement, Cyril; Carvalho, Joana; Welby, Pauline; Oliveira, Pedro; D'Imperio, Mariapaola; Frota, Sónia; Letanneux, Alban; Vigario, Marina; Cruz, Marisa; Martins, Isabel Pavão; Viallet, François; Ferreira, Joaquim J
2016-11-17
Individuals with Parkinson's disease (PD) have to deal with several aspects of voice and speech decline and thus alteration of communication ability during the course of the disease. Among these communication impairments, 3 major challenges include: (1) dysarthria, consisting of orofacial motor dysfunction and dysprosody, which is linked to the neurodegenerative processes; (2) effects of the pharmacological treatment, which vary according to the disease stage; and (3) particular speech modifications that may be language-specific, that is, dependent on the language spoken by the patients. The main objective of the FraLusoPark project is to provide a thorough evaluation of changes in PD speech as a result of pharmacological treatment and disease duration in 2 different languages (French vs European Portuguese). Individuals with PD are enrolled in the study in France (N=60) and Portugal (N=60). Their global motor disability and orofacial motor functions are assessed with specific clinical rating scales, without (OFF) and with (ON) pharmacological treatment. Two groups of 60 healthy age-matched volunteers provide the reference for between-group comparisons. Along with the clinical examinations, several speech tasks are recorded to obtain acoustic and perceptual measures. Patient-reported outcome measures are used to assess the psychosocial impact of dysarthria on quality of life. The study has been approved by the local responsible committees on human experimentation and is conducted in accordance with the ethical standards. A valuable large-scale database of speech recordings and metadata from patients with PD in France and Portugal will be constructed. Results will be disseminated in several articles in peer-reviewed journals and in conference presentations. Recommendations on how to assess speech and voice disorders in individuals with PD to monitor the progression and management of symptoms will be provided. NCT02753192, Pre-results. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Kronenberger, William G.; Castellanos, Irina; Pisoni, David B.
2017-01-01
Purpose We sought to determine whether speech perception and language skills measured early after cochlear implantation in children who are deaf, and early postimplant growth in speech perception and language skills, predict long-term speech perception, language, and neurocognitive outcomes. Method Thirty-six long-term users of cochlear implants, implanted at an average age of 3.4 years, completed measures of speech perception, language, and executive functioning an average of 14.4 years postimplantation. Speech perception and language skills measured in the 1st and 2nd years postimplantation and open-set word recognition measured in the 3rd and 4th years postimplantation were obtained from a research database in order to assess predictive relations with long-term outcomes. Results Speech perception and language skills at 6 and 18 months postimplantation were correlated with long-term outcomes for language, verbal working memory, and parent-reported executive functioning. Open-set word recognition was correlated with early speech perception and language skills and long-term speech perception and language outcomes. Hierarchical regressions showed that early speech perception and language skills at 6 months postimplantation and growth in these skills from 6 to 18 months both accounted for substantial variance in long-term outcomes for language and verbal working memory that was not explained by conventional demographic and hearing factors. Conclusion Speech perception and language skills measured very early postimplantation, and early postimplant growth in speech perception and language, may be clinically relevant markers of long-term language and neurocognitive outcomes in users of cochlear implants. Supplemental materials https://doi.org/10.23641/asha.5216200 PMID:28724130
Automatic speech recognition technology development at ITT Defense Communications Division
NASA Technical Reports Server (NTRS)
White, George M.
1977-01-01
An assessment of the applications of automatic speech recognition to defense communication systems is presented. Future research efforts include investigations into the following areas: (1) dynamic programming; (2) recognition of speech degraded by noise; (3) speaker independent recognition; (4) large vocabulary recognition; (5) word spotting and continuous speech recognition; and (6) isolated word recognition.
ERIC Educational Resources Information Center
Oliveira, Carla; Lousada, Marisa; Jesus, Luis M. T.
2015-01-01
Children with speech sound disorders (SSD) represent a large number of speech and language therapists' caseloads. The intervention with children who have SSD can involve different therapy approaches, and these may be articulatory or phonologically based. Some international studies reveal a widespread application of articulatory based approaches in…
2015-05-28
[Extraction-damaged report fragment; title ends "...Network for Real-Time Speech-Emotion Recognition" (in-house effort, program element 62788F). The recoverable abstract text notes that speech-based emotion recognition is simpler and requires fewer computational resources than other inputs such as facial expressions, and that the Berlin database of emotional speech was used; the remainder of the record is report-form fields and reference-list debris.]
Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.
Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn
2011-09-01
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database-a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
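A minimal sketch of histogram equalization applied per feature dimension: each component is mapped through its empirical CDF onto a standard-normal reference, which normalizes the moments of the feature distribution as described above. The choice of a Gaussian reference and per-utterance application are common conventions assumed here, and the random feature matrix is a placeholder.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(feature_column):
    """Map one feature component onto a standard-normal reference
    distribution: rank -> empirical CDF -> Gaussian inverse CDF."""
    x = np.asarray(feature_column, dtype=float)
    ranks = np.argsort(np.argsort(x))            # 0 .. N-1
    cdf = (ranks + 0.5) / len(x)                 # keep strictly inside (0, 1)
    return norm.ppf(cdf)

# Usage: equalize each MFCC dimension of an utterance independently.
# feats shape (n_frames, n_dims); placeholder random data here.
feats = np.random.randn(300, 13) * 3.0 + 1.0
equalized = np.column_stack([histogram_equalize(feats[:, d])
                             for d in range(feats.shape[1])])
```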
NASA Astrophysics Data System (ADS)
Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.
2009-12-01
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
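A hedged sketch of the GMM pattern-recognition step: one Gaussian mixture is trained per class on spectral feature frames, and a test utterance is assigned to the class with the higher accumulated log-likelihood. The placeholder random frames, mixture size, and equal class priors are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
healthy_frames = rng.normal(0.0, 1.0, size=(5000, 13))  # placeholder features
apnoea_frames = rng.normal(0.5, 1.2, size=(5000, 13))   # placeholder features

gmm_healthy = GaussianMixture(n_components=16, covariance_type="diag",
                              random_state=0).fit(healthy_frames)
gmm_apnoea = GaussianMixture(n_components=16, covariance_type="diag",
                             random_state=0).fit(apnoea_frames)

def classify_utterance(frames):
    """Sum per-frame log-likelihoods under each class GMM and pick the
    larger one (equal class priors assumed)."""
    ll_h = gmm_healthy.score_samples(frames).sum()
    ll_a = gmm_apnoea.score_samples(frames).sum()
    return "apnoea" if ll_a > ll_h else "healthy"

print(classify_utterance(rng.normal(0.5, 1.2, size=(200, 13))))
```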
Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
NASA Astrophysics Data System (ADS)
Wang, Longbiao; Minami, Kazue; Yamamoto, Kazumasa; Nakagawa, Seiichi
In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier transform of time-domain speech frames is used, and phase information has been ignored. The phase information is expected to be highly complementary to MFCCs because it includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, a phase information extraction method that normalizes the change variation in the phase depending on the clipping position of the input speech was proposed, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method skipping frames with low energy/signal-to-noise (SN) ratio, and noisy speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added, were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. The individual result of the phase information was even better than that of MFCCs in many cases with clean speech training models. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
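Two of the steps described above, skipping unreliable low-energy frames and combining MFCC-based and phase-based classifier scores, can be sketched as follows. The keep fraction and fusion weight are placeholders to be tuned on development data, not the paper's settings, and the random frames and energies are illustrative inputs.

```python
import numpy as np

def select_reliable_frames(frames, energy, keep_fraction=0.7):
    """Keep only the highest-energy frames of an utterance, discarding the
    low-energy (low-SN) frames that degrade noisy-speech identification.
    keep_fraction is an illustrative value, not the paper's setting."""
    threshold = np.quantile(energy, 1.0 - keep_fraction)
    return frames[energy >= threshold]

def combined_score(score_mfcc, score_phase, w=0.6):
    """Linear combination of per-speaker scores from the MFCC-based and
    phase-based systems; the weight w is a placeholder to be tuned."""
    return w * score_mfcc + (1.0 - w) * score_phase

# Toy usage with placeholder data.
frames = np.random.randn(200, 13)
energy = np.random.rand(200)
reliable = select_reliable_frames(frames, energy)
print(reliable.shape, combined_score(-41.2, -38.7))
```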
Speech disorders in neurofibromatosis type 1: a sample survey.
Cosyns, Marjan; Vandeweghe, Lies; Mortier, Geert; Janssens, Sandra; Van Borsel, John
2010-01-01
Neurofibromatosis type 1 (NF1) is an autosomal-dominant neurocutaneous disorder with an estimated prevalence of two to three cases per 10,000 population. While the physical characteristics have been well documented, speech disorders have not been fully characterized in NF1 patients. This study serves as a pilot to identify key issues in the speech of NF1 patients. In particular, the aim is to explore further the occurrence and nature of problems associated with speech as perceived by the patients themselves. A questionnaire was sent to 149 patients with NF1 registered at the Department of Genetics, Ghent University Hospital. The questionnaire inquired about articulation, hearing, breathing, voice, resonance and fluency. Sixty individuals ranging in age from 4.5 to 61.3 years returned completed questionnaires and these served as the database for the study. The results of this sample survey were compared with data of the normal population. About two-thirds of participants experienced at least one speech or speech-related problem of any type. Compared with the normal population, the NF1 group indicated more articulation difficulties, hearing impairment, abnormalities in loudness, and stuttering. The results indicate that speech difficulties are an area of interest in the NF1 population. Further research to elucidate these findings is needed.
SyllabO+: A new tool to study sublexical phenomena in spoken Quebec French.
Bédard, Pascale; Audet, Anne-Marie; Drouin, Patrick; Roy, Johanna-Pascale; Rivard, Julie; Tremblay, Pascale
2017-10-01
Sublexical phonotactic regularities in language have a major impact on language development, as well as on speech processing and production throughout the entire lifespan. To understand the impact of phonotactic regularities on speech and language functions at the behavioral and neural levels, it is essential to have access to oral language corpora to study these complex phenomena in different languages. Yet, probably because of their complexity, oral language corpora remain less common than written language corpora. This article presents the first corpus and database of spoken Quebec French syllables and phones: SyllabO+. This corpus contains phonetic transcriptions of over 300,000 syllables (over 690,000 phones) extracted from recordings of 184 healthy adult native Quebec French speakers, ranging in age from 20 to 97 years. To ensure the representativeness of the corpus, these recordings were made in both formal and familiar communication contexts. Phonotactic distributional statistics (e.g., syllable and co-occurrence frequencies, percentages, percentile ranks, transition probabilities, and pointwise mutual information) were computed from the corpus. An open-access online application to search the database was developed, and is available at www.speechneurolab.ca/syllabo . In this article, we present a brief overview of the corpus, as well as the syllable and phone databases, and we discuss their practical applications in various fields of research, including cognitive neuroscience, psycholinguistics, neurolinguistics, experimental psychology, phonetics, and phonology. Nonacademic practical applications are also discussed, including uses in speech-language pathology.
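A minimal sketch of the kind of phonotactic distributional statistics listed above (syllable probabilities, forward transition probabilities, and pointwise mutual information), computed from a syllable sequence. The input format is an assumption, and no smoothing is applied; this is not the SyllabO+ processing pipeline itself.

```python
import math
from collections import Counter

def syllable_statistics(syllables):
    """syllables: list of syllable strings in corpus order.
    Returns unigram probabilities, forward transition probabilities
    P(b | a), and pointwise mutual information log2(P(a,b)/(P(a)P(b)))."""
    uni = Counter(syllables)
    bi = Counter(zip(syllables, syllables[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    p_uni = {s: c / n_uni for s, c in uni.items()}
    trans = {(a, b): c / uni[a] for (a, b), c in bi.items()}
    pmi = {(a, b): math.log2((c / n_bi) / (p_uni[a] * p_uni[b]))
           for (a, b), c in bi.items()}
    return p_uni, trans, pmi

# Toy usage on a short syllable sequence.
print(syllable_statistics("pa ta ka pa ta".split())[1])
```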
Developing a corpus of spoken language variability
NASA Astrophysics Data System (ADS)
Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford
2003-10-01
We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and ``motherese'' (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call
Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.
2015-01-01
The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) Great apes, our closest relatives, should likewise produce 5Hz-rhythm signals, (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels given that speech rhythm is the direct product of stringing together these two basic elements, and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined “clicks” and “faux-speech.” Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm. Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents within the primate lineage, and highlight potential articulatory homologies between great ape calls and human consonants and vowels. PMID:25569211
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.
Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius
2018-06-01
Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Effects of context and word class on lexical retrieval in Chinese speakers with anomic aphasia.
Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Loretta Wing-Shan; Lai, Christy
2015-01-01
Differences in processing nouns and verbs have been investigated intensely in psycholinguistics and neuropsychology in past decades. However, the majority of studies examining retrieval of these word classes have involved tasks of single word stimuli or responses. While the results have provided rich information for addressing issues about grammatical class distinctions, it is unclear whether they have adequate ecological validity for understanding lexical retrieval in connected speech which characterizes daily verbal communication. Previous investigations comparing retrieval of nouns and verbs in single word production and connected speech have reported either discrepant performance between the two contexts with presence of word class dissociation in picture naming but absence in connected speech, or null effects of word class. In addition, word finding difficulties have been found to be less severe in connected speech than picture naming. However, these studies have failed to match target stimuli of the two word classes and between tasks on psycholinguistic variables known to affect performance in response latency and/or accuracy. The present study compared lexical retrieval of nouns and verbs in picture naming and connected speech from picture description, procedural description, and story-telling among 19 Chinese speakers with anomic aphasia and their age, gender, and education matched healthy controls, to understand the influence of grammatical class on word production across speech contexts when target items were balanced for confounding variables between word classes and tasks. Elicitation of responses followed the protocol of the AphasiaBank consortium (http://talkbank.org/AphasiaBank/). Target words for confrontation naming were based on well-established naming tests, while those for narrative were drawn from a large database of normal speakers. Selected nouns and verbs in the two contexts were matched for age-of-acquisition (AoA) and familiarity. Influence of imageability was removed through statistical control. When AoA and familiarity were balanced, nouns were retrieved better than verbs, and performance was higher in picture naming than connected speech. When imageability was further controlled for, only the effect of task remained significant. The absence of word class effects when confounding variables are controlled for is similar to many previous reports; however, the pattern of better word retrieval in naming is rare but compatible with the account that processing demands are higher in narrative than naming. The overall findings have strongly suggested the importance of including connected speech tasks in any language assessment and evaluation of language rehabilitation of individuals with aphasia.
de Kleijn, Jasper L; van Kalmthout, Ludwike W M; van der Vossen, Martijn J B; Vonck, Bernard M D; Topsakal, Vedat; Bruijnzeel, Hanneke
2018-05-24
Although current guidelines recommend cochlear implantation only for children with profound hearing impairment (HI) (>90 decibel [dB] hearing level [HL]), studies show that children with severe hearing impairment (>70-90 dB HL) could also benefit from cochlear implantation. To perform a systematic review to identify audiologic thresholds (in dB HL) that could serve as an audiologic candidacy criterion for pediatric cochlear implantation using 4 domains of speech and language development as independent outcome measures (speech production, speech perception, receptive language, and auditory performance). PubMed and Embase databases were searched up to June 28, 2017, to identify studies comparing speech and language development between children who were profoundly deaf using cochlear implants and children with severe hearing loss using hearing aids, because no studies are available directly comparing children with severe HI in both groups. If cochlear implant users with profound HI score better on speech and language tests than those with severe HI who use hearing aids, this outcome could support adjusting cochlear implantation candidacy criteria to lower audiologic thresholds. Literature search, screening, and article selection were performed using a predefined strategy. Article screening was executed independently by 4 authors in 2 pairs; consensus on article inclusion was reached by discussion between these 4 authors. This study is reported according to the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) statement. Title and abstract screening of 2822 articles resulted in selection of 130 articles for full-text review. Twenty-one studies were selected for critical appraisal, resulting in selection of 10 articles for data extraction. Two studies formulated audiologic thresholds (in dB HLs) at which children could qualify for cochlear implantation: (1) at 4-frequency pure-tone average (PTA) thresholds of 80 dB HL or greater based on speech perception and auditory performance subtests and (2) at PTA thresholds of 88 and 96 dB HL based on a speech perception subtest. In 8 of the 18 outcome measures, children with profound HI using cochlear implants performed similarly to children with severe HI using hearing aids. Better performance of cochlear implant users was shown with a picture-naming test and a speech perception in noise test. Owing to large heterogeneity in study population and selected tests, it was not possible to conduct a meta-analysis. Studies indicate that lower audiologic thresholds (≥80 dB HL) than are advised in current national and manufacturer guidelines would be appropriate as audiologic candidacy criteria for pediatric cochlear implantation.
Improving language models for radiology speech recognition.
Paulett, John M; Langlotz, Curtis P
2009-02-01
Speech recognition systems have become increasingly popular as a means to produce radiology reports, for reasons both of efficiency and of cost. However, the suboptimal recognition accuracy of these systems can affect the productivity of the radiologists creating the text reports. We analyzed a database of over two million de-identified radiology reports to determine the strongest determinants of word frequency. Our results showed that body site and imaging modality had a similar influence on the frequency of words and of three-word phrases as did the identity of the speaker. These findings suggest that the accuracy of speech recognition systems could be significantly enhanced by further tailoring their language models to body site and imaging modality, which are readily available at the time of report creation.
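A hedged sketch of tailoring a language model to body site and imaging modality: bigram counts are kept separately per (site, modality) context so that a recognizer could load a matched model at report-creation time. The class name, add-one smoothing, and bigram order are illustrative assumptions, not the cited system's design.

```python
from collections import Counter, defaultdict

class ContextualBigramLM:
    """Bigram counts kept separately per (body_site, modality) pair, so the
    recognizer can use a model matched to the exam being dictated.
    Add-one smoothing is an illustrative choice."""
    def __init__(self):
        self.bigrams = defaultdict(Counter)
        self.unigrams = defaultdict(Counter)

    def add_report(self, body_site, modality, tokens):
        key = (body_site, modality)
        self.unigrams[key].update(tokens)
        self.bigrams[key].update(zip(tokens, tokens[1:]))

    def prob(self, body_site, modality, prev, word):
        key = (body_site, modality)
        v = max(1, len(self.unigrams[key]))          # vocabulary size
        return ((self.bigrams[key][(prev, word)] + 1) /
                (self.unigrams[key][prev] + v))

# Toy usage with a single hypothetical report.
lm = ContextualBigramLM()
lm.add_report("chest", "CT", "no evidence of pulmonary embolism".split())
print(lm.prob("chest", "CT", "pulmonary", "embolism"))
```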
ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goh, Garrett B.; Siegel, Charles M.; Vishnu, Abhinav
With access to large datasets, deep neural networks through representation learning have been able to identify patterns from raw data, achieving human-level accuracy in image and speech recognition tasks. However, in chemistry, availability of large standardized and labelled datasets is scarce, and with a multitude of chemical properties of interest, chemical data is inherently small and fragmented. In this work, we explore transfer learning techniques in conjunction with the existing Chemception CNN model, to create a transferable and generalizable deep neural network for small-molecule property prediction. Our latest model, ChemNet, learns in a semi-supervised manner from inexpensive labels computed from the ChEMBL database. When fine-tuned to the Tox21, HIV and FreeSolv datasets, which are 3 separate chemical tasks that ChemNet was not originally trained on, we demonstrate that ChemNet exceeds the performance of existing Chemception models and contemporary MLP models that train on molecular fingerprints, and it matches the performance of the ConvGraph algorithm, the current state-of-the-art. Furthermore, as ChemNet has been pre-trained on a large diverse chemical database, it can be used as a universal "plug-and-play" deep neural network, which accelerates the deployment of deep neural networks for the prediction of novel small-molecule chemical properties.
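A generic transfer-learning sketch in the spirit of the approach described above: reuse a pretrained feature extractor, freeze most of its layers, and train a small new head on the target task. The function name, layer counts, head size, and loss are placeholders; this is not ChemNet's actual architecture or training recipe.

```python
import tensorflow as tf

def fine_tune(pretrained, n_outputs, freeze_up_to=-2):
    """Transfer-learning sketch: freeze all but the last few layers of a
    pretrained tf.keras model and attach a new task head (e.g., for
    Tox21-style multi-label targets). 'pretrained' is any built Keras model
    with a single input; names and sizes here are illustrative only."""
    for layer in pretrained.layers[:freeze_up_to]:
        layer.trainable = False                      # keep pretrained weights
    x = tf.keras.layers.Dense(128, activation="relu")(pretrained.output)
    out = tf.keras.layers.Dense(n_outputs, activation="sigmoid")(x)
    model = tf.keras.Model(pretrained.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```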
Telephony-based voice pathology assessment using automated speech analysis.
Moran, Rosalyn J; Reilly, Richard B; de Chazal, Philip; Lacy, Peter D
2006-03-01
A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented. The system uses a linear classifier, processing measurements of pitch perturbation, amplitude perturbation and harmonic-to-noise ratio derived from digitized speech recordings. Voice recordings from the Disordered Voice Database Model 4337 system were used to develop and validate the system. Results show that while a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with accuracy of 89.1%, telephone-quality speech can be classified as normal or pathologic with an accuracy of 74.2%, using the same scheme. Amplitude perturbation features prove most robust for telephone-quality speech. The pathologic recordings were then subcategorized into four groups, comprising normal, neuromuscular pathologic, physical pathologic and mixed (neuromuscular with physical) pathologic. A separate classifier was developed for classifying the normal group from each pathologic subcategory. Results show that neuromuscular disorders could be detected remotely with an accuracy of 87%, physical abnormalities with an accuracy of 78% and mixed pathology voice with an accuracy of 61%. This study highlights the real possibility for remote detection and diagnosis of voice pathology.
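A minimal sketch of a linear classifier over the three perturbation-based measurements named above (pitch perturbation, amplitude perturbation, and harmonics-to-noise ratio). Logistic regression is used here as a linear-classifier stand-in, and the synthetic feature values are placeholders for measurements extracted from the recordings; the feature extraction itself is assumed to happen upstream.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: [pitch perturbation, amplitude perturbation, harmonics-to-noise
# ratio]; labels: 0 = normal, 1 = pathologic. Placeholder synthetic data.
rng = np.random.default_rng(2)
X_normal = rng.normal([0.5, 3.0, 20.0], [0.2, 1.0, 3.0], size=(100, 3))
X_path = rng.normal([1.5, 6.0, 12.0], [0.5, 2.0, 4.0], size=(100, 3))
X = np.vstack([X_normal, X_path])
y = np.array([0] * 100 + [1] * 100)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print(clf.predict([[1.2, 5.5, 13.0]]))   # classify a new recording
```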
Hemispheric asymmetry in the hierarchical perception of music and speech.
Rosenthal, Matthew A
2016-11-01
The perception of music and speech involves a higher level, cognitive mechanism that allows listeners to form expectations for future music and speech events. This article comprehensively reviews studies on hemispheric differences in the formation of melodic and harmonic expectations in music and selectively reviews studies on hemispheric differences in the formation of syntactic and semantic expectations in speech. On the basis of this review, it is concluded that the higher level mechanism flexibly lateralizes music processing to either hemisphere depending on the expectation generated by a given musical context. When a context generates in the listener an expectation whose elements are sequentially ordered over time, higher level processing is dominant in the left hemisphere. When a context generates in the listener an expectation whose elements are not sequentially ordered over time, higher level processing is dominant in the right hemisphere. This article concludes with a spreading activation model that describes expectations for music and speech in terms of shared temporal and nontemporal representations. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Perez, Hector R.; Stoeckle, James H.
2016-01-01
Objective: To provide an update on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. Quality of evidence: The MEDLINE and Cochrane databases were searched for past and recent studies on the epidemiology, genetics, pathophysiology, diagnosis, and treatment of developmental stuttering. Most recommendations are based on small studies, limited-quality evidence, or consensus. Main message: Stuttering is a speech disorder, common in persons of all ages, that affects normal fluency and time patterning of speech. Stuttering has been associated with differences in brain anatomy, functioning, and dopamine regulation thought to be due to genetic causes. Attention to making a correct diagnosis or referral in children is important because there is growing consensus that early intervention with speech therapy for children who stutter is critical. For adults, stuttering can be associated with substantial psychosocial morbidity including social anxiety and low quality of life. Pharmacologic treatment has received attention in recent years, but clinical evidence is limited. The mainstay of treatment for children and adults remains speech therapy. Conclusion: A growing body of research has attempted to uncover the pathophysiology of stuttering. Referral for speech therapy remains the best option for children and adults. PMID:27303004
Phonetic Modification of Vowel Space in Storybook Speech to Infants up to 2 Years of Age
ERIC Educational Resources Information Center
Burnham, Evamarie B.; Wieland, Elizabeth A.; Kondaurova, Maria V.; McAuley, J. Devin; Bergeson, Tonya R.; Dilley, Laura C.
2015-01-01
Purpose: A large body of literature has indicated vowel space area expansion in infant-directed (ID) speech compared with adult-directed (AD) speech, which may promote language acquisition. The current study tested whether this expansion occurs in storybook speech read to infants at various points during their first 2 years of life. Method: In 2…
Free Speech and the Rights of Congress: Robert M. LaFollette and the Argument from Principle.
ERIC Educational Resources Information Center
Schliessmann, Michael R.
Senator Robert LaFollette's speech to the United States Senate on "Free Speech and the Right of Congress to Declare the Objects of War," given October 6, 1917, epitomized his opposition to the war and the Wilson administration's largely successful moves to suppress public criticism of the war. In the speech he asserted his position on…
Combining Multiple Knowledge Sources for Speech Recognition
1988-09-15
Speaker adaptation uses 10 rapid adaptation sentences and 15 spell-mode phrases drawn from a speaker-dependent database of resource-management sentences selected at random. The BYBLOS system combines the smoothed phoneme models with detailed context models and was tested on a standard database.
NASA Astrophysics Data System (ADS)
Al-Kaltakchi, Musab T. S.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.
2017-12-01
In this study, a speaker identification system is considered consisting of a feature extraction stage which utilizes both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCC). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712 type handset) upon identification performance. In particular, three NSN types with varying signal to noise ratios (SNRs) were tested corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely, mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; and 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion is found to yield overall best performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings.
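The late fusion rules compared in this record (mean, maximum, and linear weighted sum of per-system scores) reduce to a few lines of array arithmetic. The sketch below assumes two hypothetical score matrices, one per front-end (e.g., MFCC- and PNCC-based GMM-UBM back-ends); the weight value is a placeholder.

```python
# Sketch of score-level (late) fusion for speaker identification.
import numpy as np

def fuse_scores(scores_a, scores_b, rule="mean", w=0.6):
    """scores_*: arrays of shape (n_trials, n_speakers) from two systems."""
    if rule == "mean":
        return (scores_a + scores_b) / 2.0
    if rule == "max":
        return np.maximum(scores_a, scores_b)
    if rule == "weighted":
        return w * scores_a + (1.0 - w) * scores_b
    raise ValueError(rule)

rng = np.random.default_rng(1)
s_mfcc = rng.normal(size=(5, 120))   # 5 trials, 120 enrolled speakers
s_pncc = rng.normal(size=(5, 120))
fused = fuse_scores(s_mfcc, s_pncc, rule="mean")
identified = fused.argmax(axis=1)    # speaker identification decision per trial
print(identified)
```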
Mispronunciation Detection for Language Learning and Speech Recognition Adaptation
ERIC Educational Resources Information Center
Ge, Zhenhao
2013-01-01
The areas of "mispronunciation detection" (or "accent detection" more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work.…
Speech Appliances in the Treatment of Phonological Disorders.
ERIC Educational Resources Information Center
Ruscello, Dennis M.
1995-01-01
This article addresses the rationale for and issues related to the use of speech appliances, especially a removable speech appliance that positions the tongue to produce the correct /r/ phoneme. Research results suggest that this appliance was successful with a large group of clients. (Author/DB)
ERIC Educational Resources Information Center
Johnston, Dale
2006-01-01
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Nixon's Checkers: A Rhetoric-Communication Criticism.
ERIC Educational Resources Information Center
Kneupper, Charles W.; Mabry, Edward A.
Richard Nixon's "Checkers" speech, a response to charges brought against the "Nixon fund," was primarily an effort to explain the behavior of Eisenhower's 1952 presidential-campaign staff. The effectiveness of this speech was largely due to Nixon's self-disclosure within the context of the speech's narrative mode. In…
Sorrell v. IMS Health: issues and opportunities for informaticians
Petersen, Carolyn; DeMuro, Paul; Goodman, Kenneth W; Kaplan, Bonnie
2013-01-01
In 2011, the US Supreme Court decided Sorrell v. IMS Health, Inc., a case that addressed the mining of large aggregated databases and the sale of prescriber data for marketing prescription drugs. The court struck down a Vermont law that required data mining companies to obtain permission from individual providers before selling prescription records that included identifiable physician prescription information to pharmaceutical companies for drug marketing. The decision was based on constitutional free speech protections rather than data sharing considerations. Sorrell illustrates challenges at the intersection of biomedical informatics, public health, constitutional liberties, and ethics. As states, courts, regulatory agencies, and federal bodies respond to Sorrell, informaticians’ expertise can contribute to more informed, ethical, and appropriate policies. PMID:23104048
Liu, Pan; Pell, Marc D
2012-12-01
To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) were produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as a valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes. To access the database, please contact pan.liu@mail.mcgill.ca.
Representative American Speeches 1996-1997. The Reference Shelf Volume 69 Number 6.
ERIC Educational Resources Information Center
Logue, Calvin McLeod, Ed.; DeHart, Jean, Ed.
This collection of representative speeches delivered by public officials and other prominent persons contains addresses to both large and small organizations, given both on ceremonial occasions and on less formal occasions. The collection contains school commencement addresses, addresses to government bodies, speeches to international…
Speech Perception as a Cognitive Process: The Interactive Activation Model.
ERIC Educational Resources Information Center
Elman, Jeffrey L.; McClelland, James L.
Research efforts to model speech perception in terms of a processing system in which knowledge and processing are distributed over large numbers of highly interactive--but computationally primative--elements are described in this report. After discussing the properties of speech that demand a parallel interactive processing system, the report…
Imitation of Para-Phonological Detail Following Left Hemisphere Lesions
ERIC Educational Resources Information Center
Kappes, Juliane; Baumgaertner, Annette; Peschke, Claudia; Goldenberg, Georg; Ziegler, Wolfram
2010-01-01
Imitation in speech refers to the unintentional transfer of phonologically irrelevant acoustic-phonetic information of auditory input into speech motor output. Evidence for such imitation effects has been explained within the framework of episodic theories. However, it is largely unclear which neural structures mediate speech imitation and how…
Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee
2017-07-01
Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep-learning in healthcare have just gained attention recently. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve a better performance comparing to other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The result shows that DNN and gradient boosting decision tree (GBDT) can result in similarly high prediction accuracies that are better compared to logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results by using lesser amounts of patient data when comparing to GBDT method.
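The model comparison described here (DNN vs. GBDT vs. LR vs. SVM for a binary outcome) can be sketched with scikit-learn; an MLP stands in for the deep network. The synthetic data, class imbalance, and hyperparameters below are placeholders, not the study's claims features or settings.

```python
# Hedged sketch: comparing four classifier families on a binary prediction task.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=5000, n_features=40, n_informative=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
    "GBDT": GradientBoostingClassifier(),
    "DNN (MLP)": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```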
Robust Speech Enhancement Using Two-Stage Filtered Minima Controlled Recursive Averaging
NASA Astrophysics Data System (ADS)
Ghourchian, Negar; Selouani, Sid-Ahmed; O'Shaughnessy, Douglas
In this paper we propose an algorithm for estimating noise in highly non-stationary noisy environments, which is a challenging problem in speech enhancement. This method is based on minima-controlled recursive averaging (MCRA) whereby an accurate, robust and efficient noise power spectrum estimation is demonstrated. We propose a two-stage technique to prevent the appearance of musical noise after enhancement. This algorithm filters the noisy speech to achieve a robust signal with minimum distortion in the first stage. Subsequently, it estimates the residual noise using MCRA and removes it with spectral subtraction. The proposed Filtered MCRA (FMCRA) performance is evaluated using objective tests on the Aurora database under various noisy environments. These measures indicate the higher output SNR and lower output residual noise and distortion.
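The general recipe behind such methods can be illustrated in a few lines: track spectral minima to estimate the noise power spectrum, then attenuate the noisy spectrum. The sketch below is a simplified minima-tracking estimator with spectral subtraction, not the FMCRA algorithm itself; the smoothing constants, presence test, and gain floor are assumptions.

```python
# Simplified illustration of recursive minima-based noise estimation followed
# by spectral subtraction (not the FMCRA algorithm described above).
import numpy as np
from scipy.signal import stft, istft

def enhance(x, fs, alpha=0.92, beta=0.96, floor=0.05):
    f, t, X = stft(x, fs=fs, nperseg=512)
    P = np.abs(X) ** 2
    noise = P[:, 0].copy()              # initialize noise from the first frame
    p_min = P[:, 0].copy()
    S_hat = np.empty_like(X)
    for k in range(P.shape[1]):
        p_min = np.minimum(beta * p_min + (1 - beta) * P[:, k], P[:, k])
        speech_present = P[:, k] > 4.0 * p_min      # crude presence test
        noise = np.where(speech_present, noise,
                         alpha * noise + (1 - alpha) * P[:, k])
        gain = np.maximum(1.0 - noise / (P[:, k] + 1e-12), floor)
        S_hat[:, k] = gain * X[:, k]
    _, x_hat = istft(S_hat, fs=fs, nperseg=512)
    return x_hat

fs = 16000
clean = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)
noisy = clean + 0.3 * np.random.randn(fs)
print(enhance(noisy, fs).shape)
```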
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
2016-02-01
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds on their ability to detect mismatches between concurrently presented auditory and visual vowels and related their performance to their productive abilities and later vocabulary size. Results show that infants' ability to detect mismatches between auditory and visually presented vowels differs depending on the vowels involved. Furthermore, infants' sensitivity to mismatches is modulated by their current articulatory knowledge and correlates with their vocabulary size at 12 months of age. This suggests that-aside from infants' ability to match nonnative audiovisual cues (Pons et al., 2009)-their ability to match native auditory and visual cues continues to develop during the first year of life. Our findings point to a potential role of salient vowel cues and productive abilities in the development of audiovisual speech perception, and further indicate a relation between infants' early sensitivity to audiovisual speech cues and their later language development. PsycINFO Database Record (c) 2016 APA, all rights reserved.
Study of wavelet packet energy entropy for emotion classification in speech and glottal signals
NASA Astrophysics Data System (ADS)
He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua
2013-07-01
The automatic speech emotion recognition has important applications in human-machine communication. Majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
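A wavelet-packet energy-entropy feature of the kind named here can be sketched with the PyWavelets package: decompose a frame into frequency-ordered packet bands and compute the entropy of the band-energy distribution. The wavelet, depth, and this particular entropy variant are illustrative assumptions, not the authors' exact parameterization.

```python
# Sketch of a wavelet-packet energy-entropy feature for one speech frame.
import numpy as np
import pywt

def wp_energy_entropy(frame, wavelet="db4", level=4):
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")        # frequency-ordered bands
    energies = np.array([np.sum(np.square(n.data)) for n in nodes])
    p = energies / (energies.sum() + 1e-12)          # band energy distribution
    return -np.sum(p * np.log2(p + 1e-12))           # Shannon entropy (bits)

frame = np.random.randn(1024)                        # stand-in speech frame
print(wp_energy_entropy(frame))
```

Frame-level entropies like this would then be pooled per utterance and modeled with a GMM classifier, as the record describes.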
Use of speech-to-text technology for documentation by healthcare providers.
Ajami, Sima
2016-01-01
Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is based on a literature search of libraries, books, conference proceedings, the Science Direct, PubMed, ProQuest, Springer, and SID (Scientific Information Database) databases, and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Morris, Meg E; Perry, Alison; Bilney, Belinda; Curran, Andrea; Dodd, Karen; Wittwer, Joanne E; Dalton, Gregory W
2006-09-01
This article describes a systematic review and critical evaluation of the international literature on the effects of physical therapy, speech pathology, and occupational therapy for people with motor neuron disease (PwMND). The results were interpreted using the framework of the International Classification of Functioning, Disability and Health. This enabled us to summarize therapy outcomes at the level of body structure and function, activity limitations, participation restrictions, and quality of life. Databases searched included MEDLINE, PUBMED, CINAHL, PSYCInfo, Data base of Abstracts of Reviews of Effectiveness (DARE), The Physiotherapy Evidence data base (PEDro), Evidence Based Medicine Reviews (EMBASE), the Cochrane database of systematic reviews, and the Cochrane Controlled Trials Register. Evidence was graded according to the Harbour and Miller classification. Most of the evidence was found to be at the level of "clinical opinion" rather than of controlled clinical trials. Several nonrandomized small group and "observational studies" provided low-level evidence to support physical therapy for improving muscle strength and pulmonary function. There was also some evidence to support the effectiveness of speech pathology interventions for dysarthria. The search identified a small number of studies on occupational therapy for PwMND, which were small, noncontrolled pre-post-designs or clinical reports.
An integrated tool for the diagnosis of voice disorders.
Godino-Llorente, Juan I; Sáenz-Lechón, Nicolás; Osma-Ruiz, Víctor; Aguilera-Navarro, Santiago; Gómez-Vilda, Pedro
2006-04-01
A PC-based integrated aid tool has been developed for the analysis and screening of pathological voices. With it the user can simultaneously record speech, electroglottographic (EGG), and videoendoscopic signals, and synchronously edit them to select the most significant segments. These multimedia data are stored on a relational database, together with a patient's personal information, anamnesis, diagnosis, visits, explorations and any other comment the specialist may wish to include. The speech and EGG waveforms are analysed by means of temporal representations and the quantitative measurements of parameters such as spectrograms, frequency and amplitude perturbation measurements, harmonic energy, noise, etc. are calculated using digital signal processing techniques, giving an idea of the degree of hoarseness and quality of the voice register. Within this framework, the system uses a standard protocol to evaluate and build complete databases of voice disorders. The target users of this system are speech and language therapists and ear nose and throat (ENT) clinicians. The application can be easily configured to cover the needs of both groups of professionals. The software has a user-friendly Windows style interface. The PC should be equipped with standard sound and video capture cards. Signals are captured using common transducers: a microphone, an electroglottograph and a fiberscope or telelaryngoscope. The clinical usefulness of the system is addressed in a comprehensive evaluation section.
Hong Kong: Ten Years After the Handover
2007-06-29
Freedom of Speech and Assembly: Controversy surrounded the proposed anti-sedition legislation in 2003. The freedoms of speech and assembly appear to have been largely respected since the handover, although there have been perceived threats to the freedom of speech and assembly and an erosion of press freedom.
Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels
ERIC Educational Resources Information Center
Ferguson, Sarah Hargus; Kewley-Port, Diane
2007-01-01
Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…
ERIC Educational Resources Information Center
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
Modulation, Adaptation, and Control of Orofacial Pathways in Healthy Adults
ERIC Educational Resources Information Center
Estep, Meredith E.
2009-01-01
Although the healthy adult possesses a large repertoire of coordinative strategies for oromotor behaviors, a range of nonverbal, speech-like movements can be observed during speech. The extent of overlap among sensorimotor speech and nonspeech neural correlates and the role of neuromodulatory inputs generated during oromotor behaviors are unknown.…
Discourse Analysis and Language Learning [Summary of a Symposium].
ERIC Educational Resources Information Center
Hatch, Evelyn
1981-01-01
A symposium on discourse analysis and language learning is summarized. Discourse analysis can be divided into six fields of research: syntax, the amount of syntactic organization required for different types of discourse, large speech events, intra-sentential cohesion in text, speech acts, and unequal power discourse. Research on speech events and…
Biofeedback in dysphonia - progress and challenges.
Amorim, Geová Oliveira de; Balata, Patrícia Maria Mendes; Vieira, Laís Guimarães; Moura, Thaís; Silva, Hilton Justino da
There is evidence that the complex machinery involved in speech acts together with the auditory system, and that its adjustments can be altered. The aim was to present the evidence on biofeedback applications for the treatment of vocal disorders, with emphasis on muscle tension dysphonia. A systematic review was conducted in the SciELO, Lilacs, PubMed and Web of Science databases using combinations of descriptors, with the following inclusion criteria: articles published in journals with an editorial committee, reporting cases or experimental or quasi-experimental research on the use of real-time biofeedback as an additional source of treatment monitoring for muscle tension dysphonia or for vocal training. Thirty-three articles were identified in the databases, and seven were included in the qualitative synthesis. Early studies of electromyographic biofeedback applied to speech therapy were promising and pointed to a new method that yielded good results in muscle tension dysphonia. Nonetheless, the discussion of the results lacked physiological evidence that could serve as their basis. The search for such explanations has become a challenge for speech therapists and has led to two research lines: one dedicated to improving electromyographic biofeedback methodology for voice disorders, to reduce confounding variables, and the other dedicated to investigating the neural processes involved in changing the muscle engram of normal and dysphonic patients. There is evidence that electromyographic biofeedback promotes changes in the neural networks responsible for speech and can modify vocal behavior toward good-quality phonation. Copyright © 2017 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Tsai, Ching-Shu; Chen, Vincent Chin-Hung; Yang, Yao-Hsu; Hung, Tai-Hsin; Lu, Mong-Liang; Huang, Kuo-You; Gossop, Michael
2017-01-01
Manifestations of Mycoplasma pneumoniae infection can range from self-limiting upper respiratory symptoms to various neurological complications, including speech and language impairment. But an association between Mycoplasma pneumoniae infection and speech and language impairment has not been sufficiently explored. In this study, we aim to investigate the association between Mycoplasma pneumoniae infection and subsequent speech and language impairment in a nationwide population-based sample using Taiwan's National Health Insurance Research Database. We identified 5,406 children with Mycoplasma pneumoniae infection (International Classification of Disease, Revision 9, Clinical Modification code 4830) and compared to 21,624 age-, sex-, urban- and income-matched controls on subsequent speech and language impairment. The mean follow-up interval for all subjects was 6.44 years (standard deviation = 2.42 years); the mean latency period between the initial Mycoplasma pneumoniae infection and presence of speech and language impairment was 1.96 years (standard deviation = 1.64 years). The results showed that Mycoplasma pneumoniae infection was significantly associated with greater incidence of speech and language impairment [hazard ratio (HR) = 1.49, 95% CI: 1.23-1.80]. In addition, significantly increased hazard ratio of subsequent speech and language impairment in the groups younger than 6 years old and no significant difference in the groups over the age of 6 years were found (HR = 1.43, 95% CI:1.09-1.88 for age 0-3 years group; HR = 1.67, 95% CI: 1.25-2.23 for age 4-5 years group; HR = 1.14, 95% CI: 0.54-2.39 for age 6-7 years group; and HR = 0.83, 95% CI:0.23-2.92 for age 8-18 years group). In conclusion, Mycoplasma pneumoniae infection is temporally associated with incident speech and language impairment.
The impact of extrinsic demographic factors on Cantonese speech acquisition.
To, Carol K S; Cheung, Pamela S P; McLeod, Sharynne
2013-05-01
This study modeled the associations between extrinsic demographic factors and children's speech acquisition in Hong Kong Cantonese. The speech of 937 Cantonese-speaking children aged 2;4 to 6;7 in Hong Kong was assessed using a standardized speech test. Demographic information regarding household income, paternal education, maternal education, presence of siblings and having a domestic helper as the main caregiver was collected via parent questionnaires. After controlling for age and sex, higher maternal education and higher household income were significantly associated with better speech skills; however, these variables explained a negligible amount of variance. Paternal education, number of siblings and having a foreign domestic helper did not associate with a child's speech acquisition. Extrinsic factors only exerted minimal influence on children's speech acquisition. A large amount of unexplained variance in speech ability still warrants further research.
Error biases in inner and overt speech: evidence from tongue twisters.
Corley, Martin; Brocklehurst, Paul H; Moat, H Susannah
2011-01-01
To compare the properties of inner and overt speech, Oppenheim and Dell (2008) counted participants' self-reported speech errors when reciting tongue twisters either overtly or silently and found a bias toward substituting phonemes that resulted in words in both conditions, but a bias toward substituting similar phonemes only when speech was overt. Here, we report 3 experiments revisiting their conclusion that inner speech remains underspecified at the subphonemic level, which they simulated within an activation-feedback framework. In 2 experiments, participants recited tongue twisters that could result in the errorful substitutions of similar or dissimilar phonemes to form real words or nonwords. Both experiments included an auditory masking condition, to gauge the possible impact of loss of auditory feedback on the accuracy of self-reporting of speech errors. In Experiment 1, the stimuli were composed entirely from real words, whereas, in Experiment 2, half the tokens used were nonwords. Although masking did not have any effects, participants were more likely to report substitutions of similar phonemes in both experiments, in inner as well as overt speech. This pattern of results was confirmed in a 3rd experiment using the real-word materials from Oppenheim and Dell (in press). In addition to these findings, a lexical bias effect found in Experiments 1 and 3 disappeared in Experiment 2. Our findings support a view in which plans for inner speech are indeed specified at the feature level, even when there is no intention to articulate words overtly, and in which editing of the plan for errors is implicated. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
Characterizing resonant component in speech: A different view of tracking fundamental frequency
NASA Astrophysics Data System (ADS)
Dong, Bin
2017-05-01
Inspired by the nonlinearity, nonstationarity, and modulations in speech, the Hilbert-Huang transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is applied not directly to the target vowel but to its intrinsic mode functions, one by one. Thanks to the equivalence between the fundamental frequency in speech and the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide much information for the estimation of the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here to replace the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra and the modulation intensity distributions of the intrinsic mode functions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency in the presence of nonlinear components. Two vowels, [a] and [i], taken from the FAU Aibo Emotion Corpus speech database, are used to validate these findings.
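The core of such an f0-tracing step can be sketched with the analytic signal: the instantaneous frequency of an oscillatory fundamental component is the derivative of its unwrapped Hilbert phase. In the sketch below a synthetic band-limited component stands in for the intrinsic mode function; the empirical mode decomposition step is assumed to have been done elsewhere, and the smoothing window is an assumption.

```python
# Sketch: tracing f0 from a fundamental-like component via the analytic signal.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
f0_true = 120 + 30 * t                          # toy rising pitch contour (Hz)
component = np.sin(2 * np.pi * np.cumsum(f0_true) / fs)  # stand-in for an IMF

analytic = hilbert(component)
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency (Hz)

win = 201                                       # smooth out differentiation noise
f0_track = np.convolve(inst_freq, np.ones(win) / win, mode="same")
print(f0_track[1000], f0_track[-1000])          # near the start / end of the sweep
```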
Contextual modulation of reading rate for direct versus indirect speech quotations.
Yao, Bo; Scheepers, Christoph
2011-12-01
In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, the processing consequences of this distinction are largely unclear. In two experiments, participants were asked to either orally (Experiment 1) or silently (Experiment 2, eye-tracking) read written stories that contained either a direct speech or an indirect speech quotation. The context preceding those quotations described a situation that implied either a fast-speaking or a slow-speaking quoted protagonist. It was found that this context manipulation affected reading rates (in both oral and silent reading) for direct speech quotations, but not for indirect speech quotations. This suggests that readers are more likely to engage in perceptual simulations of the reported speech act when reading direct speech as opposed to meaning-equivalent indirect speech quotations, as part of a more vivid representation of the former. Copyright © 2011 Elsevier B.V. All rights reserved.
Long short-term memory for speaker generalization in supervised speech separation
Chen, Jitong; Wang, DeLiang
2017-01-01
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
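The mask-estimation formulation described here maps noisy acoustic features to a time-frequency mask applied to the noisy magnitude spectrogram. The PyTorch sketch below shows that mapping with an LSTM; the layer sizes, the sigmoid mask, the signal-approximation loss, and the random tensors are placeholders, not the paper's architecture or training setup.

```python
# Minimal sketch of LSTM-based time-frequency mask estimation for separation.
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    def __init__(self, n_freq=257, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, feats):                 # feats: (batch, frames, n_freq)
        h, _ = self.lstm(feats)
        return torch.sigmoid(self.out(h))     # mask values in [0, 1]

model = MaskEstimator()
noisy_mag = torch.rand(4, 100, 257)           # hypothetical |STFT| batch
clean_mag = torch.rand(4, 100, 257)

mask = model(noisy_mag)
loss = nn.functional.mse_loss(mask * noisy_mag, clean_mag)  # signal-approx. loss
loss.backward()
print(loss.item())
```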
Design of an efficient music-speech discriminator.
Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel
2010-01-01
This paper addresses the design of a simple and efficient music-speech discriminator for large audio data sets in which advanced music playing techniques are taught and voice and music are intrinsically interleaved. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer) or even the overlapped voice of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundacion Albeniz, containing a large variety of classical music pieces played with different instruments and including comments and speeches by famous performers.
Robust audio-visual speech recognition under noisy audio-video conditions.
Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji
2014-02-01
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Audiology and speech-language pathology practice in Saudi Arabia.
Alanazi, Ahmad A
2017-01-01
Audiology and speech-language pathology (SLP) are relatively new professions in Saudi Arabia. The idea of establishing new audiology and SLP programs in some education facilities has become popular across Saudi Arabia; yet, only four undergraduate and graduate programs are currently available. This study aimed to explore the fields of audiology and SLP in Saudi Arabia, obtain demographic information about audiologists and speech-language pathologists (SLPs), understand their current practices, and identify their perspectives on what both professions need to improve. A cross-sectional mixed methods study design was used to address the aim of this study. Two online surveys were prepared and distributed to reach a large number of audiologists and SLPs. Both surveys consisted of close- and open-ended questions and primarily focused on three categories: demography, audiology or SLP practices, and audiologists' or SLPs' perspectives on their professions in Saudi Arabia. A total of 23 audiologists and 37 SLPs completed the surveys (age range = 21-50 years). The majority of respondents were from Riyadh with different academic qualifications and working experiences. Varied practices were observed among audiologists and SLPs, who mainly worked in hospitals. Several suggestions regarding the development of audiology and SLP education and practice in Saudi Arabia are discussed. This study provides useful information about audiology and SLP education and practices in Saudi Arabia. Collaborative work between stakeholders to achieve high-quality educational and practical standards is critical. A national database, clinical guidelines, and policies should be developed, implemented, and supervised. Further research is needed to improve education and practice of both professions in Saudi Arabia.
Wang, Kun-Ching
2015-01-14
The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, multi-resolution texture properties of the emotional speech spectrogram should provide a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution texture analysis. To provide high accuracy of emotion discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-frequency cepstral coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across databases in different languages. Experimental results show that the proposed MRTII-based features, inspired by human visual perception of the spectrogram image, provide significant classification gains for real-life emotion recognition in speech.
Early Sign Language Exposure and Cochlear Implantation Benefits.
Geers, Ann E; Mitchell, Christine M; Warner-Czyz, Andrea; Wang, Nae-Yuh; Eisenberg, Laurie S
2017-07-01
Most children with hearing loss who receive cochlear implants (CI) learn spoken language, and parents must choose early on whether to use sign language to accompany speech at home. We address whether parents' use of sign language before and after CI positively influences auditory-only speech recognition, speech intelligibility, spoken language, and reading outcomes. Three groups of children with CIs from a nationwide database who differed in the duration of early sign language exposure provided in their homes were compared in their progress through elementary grades. The groups did not differ in demographic, auditory, or linguistic characteristics before implantation. Children without early sign language exposure achieved better speech recognition skills over the first 3 years postimplant and exhibited a statistically significant advantage in spoken language and reading near the end of elementary grades over children exposed to sign language. Over 70% of children without sign language exposure achieved age-appropriate spoken language compared with only 39% of those exposed for 3 or more years. Early speech perception predicted speech intelligibility in middle elementary grades. Children without sign language exposure produced speech that was more intelligible (mean = 70%) than those exposed to sign language (mean = 51%). This study provides the most compelling support yet available in CI literature for the benefits of spoken language input for promoting verbal development in children implanted by 3 years of age. Contrary to earlier published assertions, there was no advantage to parents' use of sign language either before or after CI. Copyright © 2017 by the American Academy of Pediatrics.
Marsh, John E; Yang, Jingqi; Qualter, Pamela; Richardson, Cassandra; Perham, Nick; Vachon, François; Hughes, Robert W
2018-06-01
Task-irrelevant speech impairs short-term serial recall appreciably. On the interference-by-process account, the processing of physical (i.e., precategorical) changes in speech yields order cues that conflict with the serial-ordering process deployed to perform the serial recall task. In this view, the postcategorical properties (e.g., phonology, meaning) of speech play no role. The present study reassessed the implications of recent demonstrations of auditory postcategorical distraction in serial recall that have been taken as support for an alternative, attentional-diversion, account of the irrelevant speech effect. Focusing on the disruptive effect of emotionally valent compared with neutral words on serial recall, we show that the distracter-valence effect is eliminated under conditions-high task-encoding load-thought to shield against attentional diversion whereas the general effect of speech (neutral words compared with quiet) remains unaffected (Experiment 1). Furthermore, the distracter-valence effect generalizes to a task that does not require the processing of serial order-the missing-item task-whereas the effect of speech per se is attenuated in this task (Experiment 2). We conclude that postcategorical auditory distraction phenomena in serial short-term memory (STM) are incidental: they are observable in such a setting but, unlike the acoustically driven irrelevant speech effect, are not integral to it. As such, the findings support a duplex-mechanism account over a unitary view of auditory distraction. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Automatic intelligibility classification of sentence-level pathological speech
Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.
2014-01-01
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes). PMID:25414544
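The idea of refining a test sample's posterior using the posteriors of acoustically similar test samples can be sketched as a k-nearest-neighbour smoothing in feature space. The neighbourhood size, mixing weight, and random inputs below are assumptions; this is an illustration of the idea, not the paper's exact smoothing scheme.

```python
# Sketch of post-classification posterior smoothing across test samples.
import numpy as np

def smooth_posteriors(posteriors, features, k=5, lam=0.5):
    """posteriors: (n, n_classes); features: (n, d) acoustic embeddings."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # exclude the sample itself
    neighbours = np.argsort(d2, axis=1)[:, :k]
    neighbour_mean = posteriors[neighbours].mean(axis=1)
    smoothed = (1 - lam) * posteriors + lam * neighbour_mean
    return smoothed / smoothed.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
post = rng.dirichlet([1, 1], size=20)             # binary intelligibility posteriors
feat = rng.normal(size=(20, 8))                   # hypothetical acoustic features
print(smooth_posteriors(post, feat)[:3])
```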
A narrow band pattern-matching model of vowel perception
NASA Astrophysics Data System (ADS)
Hillenbrand, James M.; Houde, Robert A.
2003-02-01
The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a ``flooring'' procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /h
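The template-matching step described in this record reduces to level equalization, flooring against a centre-weighted running-average threshold, and a city-block distance to each vowel template. The sketch below follows that outline; the window length, centre weighting, and toy templates are assumptions, not the model's fitted parameters.

```python
# Sketch of narrow band spectral flooring and city-block template matching.
import numpy as np

def floor_spectrum(spec, win=11, center_weight=3.0):
    kernel = np.ones(win)
    kernel[win // 2] = center_weight              # centre-weighted running average
    kernel /= kernel.sum()
    threshold = np.convolve(spec, kernel, mode="same")
    return np.where(spec >= threshold, spec, 0.0) # zero out sub-threshold values

def classify_vowel(spec, templates):
    spec = spec - spec.mean()                     # crude spectrum-level equalization
    spec = floor_spectrum(spec)
    names = list(templates)
    dists = [np.sum(np.abs(spec - templates[v])) for v in names]  # city-block
    return names[int(np.argmin(dists))]

rng = np.random.default_rng(3)
templates = {v: rng.random(128) for v in ("iy", "ae", "aa", "uw")}  # toy templates
print(classify_vowel(templates["ae"] + 0.05 * rng.random(128), templates))
```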
What Makes a Caseload (Un) Manageable? School-Based Speech-Language Pathologists Speak
ERIC Educational Resources Information Center
Katz, Lauren A.; Maag, Abby; Fallon, Karen A.; Blenkarn, Katie; Smith, Megan K.
2010-01-01
Purpose: Large caseload sizes and a shortage of speech-language pathologists (SLPs) are ongoing concerns in the field of speech and language. This study was conducted to identify current mean caseload size for school-based SLPs, a threshold at which caseload size begins to be perceived as unmanageable, and variables contributing to school-based…
ERIC Educational Resources Information Center
Bailey, Dallin J.; Dromey, Christopher
2015-01-01
Purpose: The purpose of this study was to examine divided attention over a large age range by looking at the effects of 3 nonspeech tasks on concurrent speech motor performance. The nonspeech tasks were designed to facilitate measurement of bidirectional interference, allowing examination of their sensitivity to speech activity. A cross-sectional…
Determining the energetic and informational components of speech-on-speech masking
Kidd, Gerald; Mason, Christine R.; Swaminathan, Jayaganesh; Roverud, Elin; Clayton, Kameron K.; Best, Virginia
2016-01-01
Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated “glimpses” were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed generally were correlated across the three masking release conditions. PMID:27475139
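Ideal time-frequency segregation of the kind used here keeps only the time-frequency units where the target dominates the masker and resynthesizes from those "glimpses". The sketch below uses an STFT front-end and a 0 dB local criterion; both are assumptions standing in for the study's gammatone-style processing.

```python
# Sketch of ideal time-frequency segregation (ITFS) with an STFT front-end.
import numpy as np
from scipy.signal import stft, istft

def itfs(target, masker, fs, criterion_db=0.0, nperseg=512):
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    local_snr = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(M) + 1e-12)
    keep = local_snr > criterion_db                # target-dominated "glimpses"
    mixture = T + M
    _, x = istft(np.where(keep, mixture, 0.0), fs=fs, nperseg=nperseg)
    return x

fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)               # toy target
masker = 0.8 * np.random.randn(fs)                 # toy masker
print(itfs(target, masker, fs).shape)
```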
SAM: speech-aware applications in medicine to support structured data entry.
Wormek, A. K.; Ingenerf, J.; Orthner, H. F.
1997-01-01
In the last two years, improvement in speech recognition technology has directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved in adapting a speech recognizer to a specific medical application. The goals of our work are first, to develop a speech-aware core component which is able to establish connections to speech recognition engines of different vendors. This is realized in SAM. Second, with applications based on SAM we want to support the physician in his/her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application in the field of Diabetes care is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730
Improving Acoustic Models by Watching Television
NASA Technical Reports Server (NTRS)
Witbrock, Michael J.; Hauptmann, Alexander G.
1998-01-01
Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well transcribed data is expensive to produce, there is a constant stream of challenging speech data and poor transcription broadcast as closed-captioned television. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.
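The selection idea can be sketched as follows: keep caption segments only where a first-pass recognizer's hypothesis agrees closely with the closed-caption text, so that the captions can be trusted as transcripts for acoustic-model training. The word-level matcher and agreement threshold below are assumptions, not the authors' alignment procedure.

```python
# Sketch: selecting reliably captioned segments by hypothesis/caption agreement.
from difflib import SequenceMatcher

def word_agreement(hyp, ref):
    hyp_words, ref_words = hyp.lower().split(), ref.lower().split()
    m = SequenceMatcher(None, hyp_words, ref_words)
    matched = sum(block.size for block in m.get_matching_blocks())
    return matched / max(len(ref_words), 1)

def select_training_segments(pairs, threshold=0.8):
    """pairs: list of (asr_hypothesis, caption_text) per aligned segment."""
    return [ref for hyp, ref in pairs if word_agreement(hyp, ref) >= threshold]

segments = [  # hypothetical aligned segments
    ("the senate passed the bill today", "the senate passed the bill today"),
    ("weather update follows sports", "and now a word from our sponsors"),
]
print(select_training_segments(segments))
```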
An Ecosystem of Intelligent ICT Tools for Speech-Language Therapy Based on a Formal Knowledge Model.
Robles-Bykbaev, Vladimir; López-Nores, Martín; Pazos-Arias, José; Quisi-Peralta, Diego; García-Duque, Jorge
2015-01-01
Language and communication constitute the developmental mainstays of several intellectual and cognitive skills in humans. However, millions of people around the world suffer from disabilities and disorders related to language and communication, while most countries lack corresponding health-care and rehabilitation services. On these grounds, we are working to develop an ecosystem of intelligent ICT tools to support speech and language pathologists, doctors, students, patients and their relatives. This ecosystem has several layers and components, integrating Electronic Health Records management, standardized vocabularies, a knowledge database, an ontology of concepts from the speech-language domain, and an expert system. We discuss the advantages of such an approach through experiments carried out in several institutions assisting children with a wide spectrum of disabilities.
Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology
2015-01-01
Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789
Campus Free Speech Presents Both Legal and PR Challenges for Colleges
ERIC Educational Resources Information Center
Nguyen, AiVi; Dragga, Anthony
2016-01-01
Free speech is fast becoming a hot-button issue at colleges across the country, with campus protests often mirroring those of the public-at-large on issues such as racism or tackling institution-specific matters such as college governance. On the surface, the issue of campus free speech may seem like a purely legal concern, yet in reality,…
ERIC Educational Resources Information Center
Ferati, Mexhid Adem
2012-01-01
To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.
Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola
2017-11-01
The acoustic properties of speech sounds vary widely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
2017-09-18
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .01) and with mean length of utterance produced during a written picture description (r = .96, p < .01). Correlations between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Iuzzini-Seigel, Jenya; Hogan, Tiffany P; Green, Jordan R
2017-05-24
The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and speech delay. Participants included 48 children ranging in age from 4;7 to 17;8 (years;months) with CAS (n = 10), CAS + language impairment (n = 10), speech delay (n = 10), language impairment (n = 9), or typical development (n = 9). Speech inconsistency was assessed at phonemic and token-to-token levels using a variety of stimuli. Children with CAS and CAS + language impairment performed equivalently on all inconsistency assessments. Children with language impairment evidenced high levels of speech inconsistency on the phrase "buy Bobby a puppy." Token-to-token inconsistency of monosyllabic words and the phrase "buy Bobby a puppy" was sensitive and specific in differentiating children with CAS and speech delay, whereas inconsistency calculated on other stimuli (e.g., multisyllabic words) was less efficacious in differentiating between these disorders. Speech inconsistency is a core feature of CAS and is efficacious in differentiating between children with CAS and speech delay; however, sensitivity and specificity are stimuli dependent.
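One common way to quantify token-to-token inconsistency is the proportion of repeated productions that differ from the speaker's most frequent transcription of a word, sketched below under the assumption that each repetition has been phonetically transcribed. This is a generic illustration, not the specific scoring protocol used in the study.

```python
from collections import Counter


def token_to_token_inconsistency(transcriptions):
    """Proportion of productions that differ from the modal transcription.

    `transcriptions` is a list of phonetic transcriptions (strings) of
    repeated productions of the same word, e.g. ["pʌpi", "bʌpi", "pʌpi"].
    """
    counts = Counter(transcriptions)
    modal, modal_count = counts.most_common(1)[0]
    return 1.0 - modal_count / len(transcriptions)


def mean_inconsistency(word_to_tokens):
    """Average inconsistency across a stimulus set (dict of word -> repetitions)."""
    scores = [token_to_token_inconsistency(t) for t in word_to_tokens.values()]
    return sum(scores) / len(scores)


# Example: three repetitions each of two words.
print(mean_inconsistency({"puppy": ["pʌpi", "bʌpi", "pʌpi"],
                          "buy": ["baɪ", "baɪ", "baɪ"]}))
```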
Cortical Integration of Audio-Visual Information
Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.
2013-01-01
We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Martinelli, Eugenio; Mencattini, Arianna; Daprati, Elena; Di Natale, Corrado
2016-01-01
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Although numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet fast and reliable applications for emotion recognition are the obvious next step for present 'intelligent personal assistants', and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performance on a public database of spontaneous speech is reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants' emotional state, selective/differential data collection based on emotional content, etc.).
The process of spoken word recognition in the face of signal degradation.
Farris-Trimble, Ashley; McMurray, Bob; Cigrand, Nicole; Tomblin, J Bruce
2014-02-01
Though much is known about how words are recognized, little research has focused on how a degraded signal affects the fine-grained temporal aspects of real-time word recognition. The perception of degraded speech was examined in two populations with the goal of describing the time course of word recognition and lexical competition. Thirty-three postlingually deafened cochlear implant (CI) users and 57 normal hearing (NH) adults (16 in a CI-simulation condition) participated in a visual world paradigm eye-tracking task in which their fixations to a set of phonologically related items were monitored as they heard one item being named. Each degraded-speech group was compared with a set of age-matched NH participants listening to unfiltered speech. CI users and the simulation group showed a delay in activation relative to the NH listeners, and there is weak evidence that the CI users showed differences in the degree of peak and late competitor activation. In general, though, the degraded-speech groups behaved statistically similarly with respect to activation levels. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Speech-Enabled Interfaces for Travel Information Systems with Large Grammars
NASA Astrophysics Data System (ADS)
Zhao, Baoli; Allen, Tony; Bargiela, Andrzej
This paper introduces three grammar-segmentation methods capable of handling the large grammar issues associated with producing a real-time speech-enabled VXML bus travel application for London. Large grammars tend to produce relatively slow recognition interfaces and this work shows how this limitation can be successfully addressed. Comparative experimental results show that the novel last-word recognition based grammar segmentation method described here achieves an optimal balance between recognition rate, speed of processing and naturalness of interaction.
Wang, Kun-Ching
2015-01-01
The classification of emotional speech is widely studied in speech-related research on human-computer interaction (HCI). The purpose of this paper is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of multi-resolution emotional speech spectrograms should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, provides significant classification performance for real-life emotional recognition in speech. PMID:25594590
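A rough sketch of the general idea of treating the spectrogram as a texture image and extracting statistics at several resolutions. The particular statistics used here (mean, standard deviation, and local contrast of downsampled log-spectrograms) are placeholders for illustration and are not the MRTII features defined in the paper.

```python
import numpy as np
from scipy.signal import spectrogram


def multires_texture_features(x, fs, levels=3):
    """Crude multi-resolution texture statistics of a log-spectrogram."""
    _, _, sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
    img = np.log(sxx + 1e-10)
    feats = []
    for level in range(levels):
        step = 2 ** level
        coarse = img[::step, ::step]              # downsample = coarser resolution
        diff_t = np.abs(np.diff(coarse, axis=1))  # local contrast along time
        diff_f = np.abs(np.diff(coarse, axis=0))  # local contrast along frequency
        feats.extend([coarse.mean(), coarse.std(),
                      diff_t.mean(), diff_f.mean()])
    return np.array(feats)


# Example with one second of synthetic audio at 16 kHz standing in for speech.
fs = 16000
x = np.random.randn(fs)
print(multires_texture_features(x, fs).shape)  # (levels * 4,)
```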
Integrated Speech and Language Technology for Intelligence, Surveillance, and Reconnaissance (ISR)
2017-07-01
applying submodularity techniques to address computing challenges posed by large datasets in speech and language processing. MT and speech tools were...aforementioned research-oriented activities, the IT system administration team provided necessary support to laboratory computing and network operations...operations of SCREAM Lab computer systems and networks. Other miscellaneous activities in relation to Task Order 29 are presented in an additional fourth
Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech
ERIC Educational Resources Information Center
Saikachi, Yoko; Stevens, Kenneth N.; Hillman, Robert E.
2009-01-01
Purpose: Current electrolarynx (EL) devices produce a mechanical speech quality that has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, in the present study the authors aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on…
ERIC Educational Resources Information Center
McConnellogue, Sheila
2011-01-01
There is a large population of children with speech, language and communication needs who have additional special educational needs (SEN). Whilst professional collaboration between education and health professionals is recommended to ensure an integrated delivery of statutory services for this population of children, formal frameworks should be…
Acoustic analysis of speech under stress.
Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish
2015-01-01
When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified and non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight popular FM broadcasts in which the host of the show vexes subjects who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of short vowel segments shows that for relaxed speech the two spectra are similar, whereas for stressed speech they differ in the high frequency range due to increased pitch modulation.
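The core measurements reported here, mean F0 and the formant frequencies extracted with Praat, can be reproduced with the parselmouth Python interface to Praat roughly as sketched below. The file name is a placeholder and the analysis settings are Praat defaults rather than those used in the study.

```python
import numpy as np
import parselmouth

snd = parselmouth.Sound("clip.wav")          # placeholder file name

# Mean fundamental frequency over voiced frames (unvoiced frames are 0).
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
mean_f0 = f0[f0 > 0].mean()

# Mean F1-F4 sampled every 10 ms with the Burg formant tracker.
formants = snd.to_formant_burg()
times = np.arange(0.0, snd.duration, 0.01)
means = []
for n in (1, 2, 3, 4):
    vals = [formants.get_value_at_time(n, t) for t in times]
    vals = [v for v in vals if not np.isnan(v)]
    means.append(float(np.mean(vals)) if vals else float("nan"))

print(mean_f0, means)
```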
Sennes, Luiz Ubirajara
2016-01-01
The need to maintain oral function in patients undergoing glossectomy motivates interventions such as prosthetic rehabilitation. However, the current literature still falls short in presenting the results of prosthetic rehabilitation with respect to speech or swallowing. The objective of this research is to evaluate the effectiveness of prosthetic rehabilitation for voice, speech, and swallowing in patients undergoing glossectomy, by performing a systematic literature review and a meta-analysis of individual cases. Studies were identified in relevant electronic databases, with all available dates included. The criteria used were: a sample of any size; resection due to malignant tumors, restricted to the tongue and/or floor of the mouth; type of prosthetic rehabilitation; and description of oral function outcomes with the prosthesis. For the meta-analysis of individual data, associations between the variables of interest and the type of prosthesis were evaluated. Thirty-three of 471 articles met the selection criteria. Results on speech and/or voice and on swallowing were reported in 27 and 28 articles, respectively. Speech intelligibility improved in 96 patients and swallowing improved in 73 patients with a prosthesis. Based on the available evidence, this article shows that prosthetic rehabilitation can improve oral functions and can be used as a strategy alongside surgical reconstruction in selected cases. PMID:28042295
Machado-Nascimento, Nárli; Melo E Kümmer, Arthur; Lemos, Stela Maris Aguiar
2016-01-01
To systematically review the scientific production on the relationship between Attention Deficit Hyperactivity Disorder (ADHD) and Speech-language Pathology and to methodologically analyze the observational studies on the theme. Systematic review of the literature conducted at the databases Medical Literature Analysis and Retrieval System online (MEDLINE, USA), Literature of Latin America and the Caribbean Health Sciences (LILACS, Brazil) and Spanish Bibliographic Index of Health Sciences (IBECS, Spain) using the descriptors: "Language", "Language Development", "Attention Deficit Hyperactivity Disorder", "ADHD" and "Auditory Perception". Articles published between 2008 and 2013. Inclusion criteria: full articles published in national and international journals from 2008 to 2013. Exclusion criteria: articles not focused on the speech-language pathology alterations present in the attention deficit hyperactivity disorder. The articles were read in full and the data were extracted for characterization of methodology and content. The 23 articles found were separated according to two themes: Speech-language Pathology and Attention Deficit Hyperactivity Disorder. The study of the scientific production revealed that the alterations most commonly discussed were reading disorders and that there are few reports on the relationship between auditory processing and these disorders, as well as on the role of the speech-language pathologist in the evaluation and treatment of children with Attention Deficit Hyperactivity Disorder.
Why are background telephone conversations distracting?
Marsh, John E; Ljung, Robert; Jahncke, Helena; MacCutcheon, Douglas; Pausch, Florian; Ball, Linden J; Vachon, François
2018-06-01
Telephone conversation is ubiquitous within the office setting. Overhearing a telephone conversation-whereby only one of the two speakers is heard-is subjectively more annoying and objectively more distracting than overhearing a full conversation. The present study sought to determine whether this "halfalogue" effect is attributable to unexpected offsets and onsets within the background speech (acoustic unexpectedness) or to the tendency to predict the unheard part of the conversation (semantic [un]predictability), and whether these effects can be shielded against through top-down cognitive control. In Experiment 1, participants performed an office-related task in quiet or in the presence of halfalogue and dialogue background speech. Irrelevant speech was either meaningful or meaningless speech. The halfalogue effect was only present for the meaningful speech condition. Experiment 2 addressed whether higher task-engagement could shield against the halfalogue effect by manipulating the font of the to-be-read material. Although the halfalogue effect was found with an easy-to-read font (fluent text), the use of a difficult-to-read font (disfluent text) eliminated the effect. The halfalogue effect is thus attributable to the semantic (un)predictability, not the acoustic unexpectedness, of background telephone conversation and can be prevented by simple means such as increasing the level of engagement required by the focal task. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
New Developments in Understanding the Complexity of Human Speech Production.
Simonyan, Kristina; Ackermann, Hermann; Chang, Edward F; Greenlee, Jeremy D
2016-11-09
Speech is one of the most distinctive features of human communication. Our ability to articulate our thoughts by means of speech production depends critically on the integrity of the motor cortex. Although long thought to be a low-order brain region, exciting work in recent years is overturning this notion. Here, we highlight some of the major experimental advances in speech motor control research and discuss emerging findings about the complexity of speech motor cortical organization and its large-scale networks. This review summarizes the talks presented at a symposium at the Annual Meeting of the Society for Neuroscience; it does not represent a comprehensive review of contemporary literature in the broader field of speech motor control. Copyright © 2016 the authors 0270-6474/16/3611440-09$15.00/0.
Simonyan, Kristina; Fuertinger, Stefan
2015-04-01
Speech production is one of the most complex human behaviors. Although brain activation during speaking has been well investigated, our understanding of interactions between the brain regions and neural networks remains scarce. We combined seed-based interregional correlation analysis with graph theoretical analysis of functional MRI data during the resting state and sentence production in healthy subjects to investigate the interface and topology of functional networks originating from the key brain regions controlling speech, i.e., the laryngeal/orofacial motor cortex, inferior frontal and superior temporal gyri, supplementary motor area, cingulate cortex, putamen, and thalamus. During both resting and speaking, the interactions between these networks were bilaterally distributed and centered on the sensorimotor brain regions. However, speech production preferentially recruited the inferior parietal lobule (IPL) and cerebellum into the large-scale network, suggesting the importance of these regions in facilitation of the transition from the resting state to speaking. Furthermore, the cerebellum (lobule VI) was the most prominent region showing functional influences on speech-network integration and segregation. Although networks were bilaterally distributed, interregional connectivity during speaking was stronger in the left vs. right hemisphere, which may have underlined a more homogeneous overlap between the examined networks in the left hemisphere. Among these, the laryngeal motor cortex (LMC) established a core network that fully overlapped with all other speech-related networks, determining the extent of network interactions. Our data demonstrate complex interactions of large-scale brain networks controlling speech production and point to the critical role of the LMC, IPL, and cerebellum in the formation of speech production network. Copyright © 2015 the American Physiological Society.
NASA Technical Reports Server (NTRS)
2014-01-01
Topics include: Real-Time Minimization of Tracking Error for Aircraft Systems; Detecting an Extreme Minority Class in Hyperspectral Data Using Machine Learning; KSC Spaceport Weather Data Archive; Visualizing Acquisition, Processing, and Network Statistics Through Database Queries; Simulating Data Flow via Multiple Secure Connections; Systems and Services for Near-Real-Time Web Access to NPP Data; CCSDS Telemetry Decoder VHDL Core; Thermal Response of a High-Power Switch to Short Pulses; Solar Panel and System Design to Reduce Heating and Optimize Corridors for Lower-Risk Planetary Aerobraking; Low-Cost, Very Large Diamond-Turned Metal Mirror; Very-High-Load-Capacity Air Bearing Spindle for Large Diamond Turning Machines; Elevated-Temperature, Highly Emissive Coating for Energy Dissipation of Large Surfaces; Catalyst for Treatment and Control of Post-Combustion Emissions; Thermally Activated Crack Healing Mechanism for Metallic Materials; Subsurface Imaging of Nanocomposites; Self-Healing Glass Sealants for Solid Oxide Fuel Cells and Electrolyzer Cells; Micromachined Thermopile Arrays with Novel Thermo - electric Materials; Low-Cost, High-Performance MMOD Shielding; Head-Mounted Display Latency Measurement Rig; Workspace-Safe Operation of a Force- or Impedance-Controlled Robot; Cryogenic Mixing Pump with No Moving Parts; Seal Design Feature for Redundancy Verification; Dexterous Humanoid Robot; Tethered Vehicle Control and Tracking System; Lunar Organic Waste Reformer; Digital Laser Frequency Stabilization via Cavity Locking Employing Low-Frequency Direct Modulation; Deep UV Discharge Lamps in Capillary Quartz Tubes with Light Output Coupled to an Optical Fiber; Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems, Version II; Advanced Sensor Technology for Algal Biotechnology; High-Speed Spectral Mapper; "Ascent - Commemorating Shuttle" - A NASA Film and Multimedia Project DVD; High-Pressure, Reduced-Kinetics Mechanism for N-Hexadecane Oxidation; Method of Error Floor Mitigation in Low-Density Parity-Check Codes; X-Ray Flaw Size Parameter for POD Studies; Large Eddy Simulation Composition Equations for Two-Phase Fully Multicomponent Turbulent Flows; Scheduling Targeted and Mapping Observations with State, Resource, and Timing Constraints;
DETECTION AND IDENTIFICATION OF SPEECH SOUNDS USING CORTICAL ACTIVITY PATTERNS
Centanni, T.M.; Sloan, A.M.; Reed, A.C.; Engineer, C.T.; Rennaker, R.; Kilgard, M.P.
2014-01-01
We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/sec), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech processing disorders. PMID:24286757
The predictive roles of neural oscillations in speech motor adaptability.
Sengupta, Ranit; Nasir, Sazzad M
2016-06-01
The human speech system exhibits a remarkable flexibility by adapting to alterations in speaking environments. While it is believed that speech motor adaptation under altered sensory feedback involves rapid reorganization of speech motor networks, the mechanisms by which different brain regions communicate and coordinate their activity to mediate adaptation remain unknown, and explanations of outcome differences in adaption remain largely elusive. In this study, under the paradigm of altered auditory feedback with continuous EEG recordings, the differential roles of oscillatory neural processes in motor speech adaptability were investigated. The predictive capacities of different EEG frequency bands were assessed, and it was found that theta-, beta-, and gamma-band activities during speech planning and production contained significant and reliable information about motor speech adaptability. It was further observed that these bands do not work independently but interact with each other suggesting an underlying brain network operating across hierarchically organized frequency bands to support motor speech adaptation. These results provide novel insights into both learning and disorders of speech using time frequency analysis of neural oscillations. Copyright © 2016 the American Physiological Society.
Lexical effects on speech production and intelligibility in Parkinson's disease
NASA Astrophysics Data System (ADS)
Chiu, Yi-Fang
Individuals with Parkinson's disease (PD) often have speech deficits that lead to reduced speech intelligibility. Previous research provides a rich database regarding the articulatory deficits associated with PD including restricted vowel space (Skodda, Visser, & Schlegel, 2011) and flatter formant transitions (Tjaden & Wilding, 2004; Walsh & Smith, 2012). However, few studies consider the effect of higher level structural variables of word usage frequency and the number of similar sounding words (i.e. neighborhood density) on lower level articulation or on listeners' perception of dysarthric speech. The purpose of the study is to examine the interaction of lexical properties and speech articulation as measured acoustically in speakers with PD and healthy controls (HC) and the effect of lexical properties on the perception of their speech. Individuals diagnosed with PD and age-matched healthy controls read sentences with words that varied in word frequency and neighborhood density. Acoustic analysis was performed to compare second formant transitions in diphthongs, an indicator of the dynamics of tongue movement during speech production, across different lexical characteristics. Young listeners transcribed the spoken sentences and the transcription accuracy was compared across lexical conditions. The acoustic results indicate that both PD and HC speakers adjusted their articulation based on lexical properties but the PD group had significant reductions in second formant transitions compared to HC. Both groups of speakers increased second formant transitions for words with low frequency and low density, but the lexical effect is diphthong dependent. The change in second formant slope was limited in the PD group when the required formant movement for the diphthong is small. The data from listeners' perception of the speech by PD and HC show that listeners identified high frequency words with greater accuracy suggesting the use of lexical knowledge during the recognition process. The relationship between acoustic results and perceptual accuracy is limited in this study suggesting that listeners incorporate acoustic and non-acoustic information to maximize speech intelligibility.
Scaling and universality in the human voice.
Luque, Jordi; Luque, Bartolo; Lacasa, Lucas
2015-04-06
Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work, we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech, the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg-Richter-like law in speech. We further show that such 'earthquakes in speech' show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but it resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech processing, which point towards low dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact in future speech synthesis technologies. Results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
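The headline analysis, a Gutenberg-Richter-like power-law distribution of acoustic energy releases, can be sketched as follows: compute short-time frame energies, treat values above a threshold as energy-release events, and fit the power-law exponent with the standard maximum-likelihood estimator alpha = 1 + n / sum(ln(E_i / E_min)). The framing and threshold choices are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def frame_energies(x, frame_len=160):
    """Short-time energy of non-overlapping frames (10 ms at 16 kHz)."""
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    return (frames ** 2).sum(axis=1)


def powerlaw_mle_exponent(events, e_min):
    """Maximum-likelihood estimate of the power-law exponent (Hill/Clauset form)."""
    events = np.asarray(events, dtype=float)
    events = events[events >= e_min]
    return 1.0 + len(events) / np.log(events / e_min).sum()


# Example with synthetic audio; with real speech, `x` would be the waveform.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000 * 10)
e = frame_energies(x)
e_min = np.percentile(e, 90)          # illustrative event threshold
print(powerlaw_mle_exponent(e, e_min))
```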
Teaching About the Constitution.
ERIC Educational Resources Information Center
White, Charles S.
1988-01-01
Reviews "The U.S. Constitution Then and Now," a two-unit program using the integrated database and word processing capabilities of AppleWorks. For grades 7-12, the units simulate the constitutional convention and the principles of free speech and privacy. Concludes that with adequate time, the program can provide a potentially powerful…
Communication Apprehension. Focused Access to Selected Topics (FAST) Bibliography No. 15.
ERIC Educational Resources Information Center
Shermis, Michael
This annotated bibliography contains 31 references of articles and papers in the ERIC database that deal with communication apprehension (CA). The first section provides strategies for instructors and students to alleviate communication apprehension, speech anxiety, stage fright, and other problems people have with public speaking. The second…
Evidence-Based Clinical Voice Assessment: A Systematic Review
ERIC Educational Resources Information Center
Roy, Nelson; Barkmeier-Kraemer, Julie; Eadie, Tanya; Sivasankar, M. Preeti; Mehta, Daryush; Paul, Diane; Hillman, Robert
2013-01-01
Purpose: To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders. Method: The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language…
12p13.33 microdeletion including ELKS/ERC1, a new locus associated with childhood apraxia of speech
Thevenon, Julien; Callier, Patrick; Andrieux, Joris; Delobel, Bruno; David, Albert; Sukno, Sylvie; Minot, Delphine; Mosca Anne, Laure; Marle, Nathalie; Sanlaville, Damien; Bonnet, Marlène; Masurel-Paulet, Alice; Levy, Fabienne; Gaunt, Lorraine; Farrell, Sandra; Le Caignec, Cédric; Toutain, Annick; Carmignac, Virginie; Mugneret, Francine; Clayton-Smith, Jill; Thauvin-Robinet, Christel; Faivre, Laurence
2013-01-01
Speech sound disorders are heterogeneous conditions, and sporadic and familial cases have been described. However, monogenic inheritance explains only a small proportion of such disorders, in particular in cases with childhood apraxia of speech (CAS). The 12p13.33 locus is one of the least commonly deleted subtelomeric regions, with reported deletions of <5 Mb. Only four patients have been reported with such a deletion, diagnosed by fluorescence in situ hybridisation telomere analysis or array CGH. To further delineate this rare microdeletional syndrome, a French collaboration together with a search of the Decipher database allowed us to gather nine new patients with a 12p13.33 subtelomeric or interstitial rearrangement identified by array CGH. Speech delay was found in all patients and could be defined as CAS when patients had been evaluated by a speech therapist (5/9 patients). Intellectual deficiency was found in only 5/9 patients, and was often associated with psychiatric manifestations of varying severity. Two such deletions were inherited from an apparently healthy parent, but re-evaluation revealed abnormal speech production at least in childhood, suggesting variable expressivity. The ELKS/ERC1 gene, which encodes a synaptic factor, is found in the smallest region of overlap. These results reinforce the hypothesis that deletions of the 12p13.33 locus may be responsible for variable phenotypes including CAS associated with neurobehavioural troubles, and that the presence of CAS justifies a genetic work-up. PMID:22713806
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.
Schafer, Phillip B; Jin, Dezhe Z
2014-03-01
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
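The template-based decoder described here compares spike sequences using the length of the longest common subsequence (LCS). Below is a minimal sketch of such a similarity measure, assuming each sequence is represented as an ordered list of firing-neuron identities; the normalization and the classify helper are illustrative assumptions, not the paper's exact recognizer.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def spike_sequence_similarity(seq, template):
    """Normalized LCS similarity between a test sequence and a template."""
    if not seq or not template:
        return 0.0
    return lcs_length(seq, template) / max(len(seq), len(template))


def classify(seq, templates):
    """Assign the word label whose template sequence is most similar."""
    return max(templates, key=lambda w: spike_sequence_similarity(seq, templates[w]))


# Toy templates: neuron identities in firing order for two words.
templates = {"one": [3, 7, 7, 2, 9], "two": [5, 1, 4, 4, 8]}
print(classify([3, 7, 2, 9], templates))  # -> "one"
```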
Accurate visible speech synthesis based on concatenating variable length motion capture data.
Ma, Jiyong; Cole, Ron; Pellom, Bryan; Ward, Wayne; Wise, Barbara
2006-01-01
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
Adaptation to spectrally-rotated speech.
Green, Tim; Rosen, Stuart; Faulkner, Andrew; Paterson, Ruth
2013-08-01
Much recent interest surrounds listeners' abilities to adapt to various transformations that distort speech. An extreme example is spectral rotation, in which the spectrum of low-pass filtered speech is inverted around a center frequency (2 kHz here). Spectral shape and its dynamics are completely altered, rendering speech virtually unintelligible initially. However, intonation, rhythm, and contrasts in periodicity and aperiodicity are largely unaffected. Four normal hearing adults underwent 6 h of training with spectrally-rotated speech using Continuous Discourse Tracking. They and an untrained control group completed pre- and post-training speech perception tests, for which talkers differed from the training talker. Significantly improved recognition of spectrally-rotated sentences was observed for trained, but not untrained, participants. However, there were no significant improvements in the identification of medial vowels in /bVd/ syllables or intervocalic consonants. Additional tests were performed with speech materials manipulated so as to isolate the contribution of various speech features. These showed that preserving intonational contrasts did not contribute to the comprehension of spectrally-rotated speech after training, and suggested that improvements involved adaptation to altered spectral shape and dynamics, rather than just learning to focus on speech features relatively unaffected by the transformation.
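Spectral rotation about a 2 kHz center frequency can be implemented by low-pass filtering the speech at 4 kHz, ring-modulating it with a 4 kHz cosine (which mirrors the band about 2 kHz), and low-pass filtering again to remove the image above 4 kHz. The sketch below illustrates that general recipe and is not the exact processing chain used in the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def spectrally_rotate(x, fs, center=2000.0):
    """Invert the spectrum of low-pass filtered speech about `center` Hz."""
    band = 2 * center                      # signal is limited to [0, 2*center]
    b, a = butter(6, band / (fs / 2), btype="low")
    x_lp = filtfilt(b, a, x)

    t = np.arange(len(x)) / fs
    modulated = x_lp * np.cos(2 * np.pi * band * t)   # mirrors the band about `center`

    # Remove the image that lands above 2*center after modulation.
    return 2 * filtfilt(b, a, modulated)


fs = 16000
x = np.random.randn(fs)                    # stand-in for a speech waveform
y = spectrally_rotate(x, fs)
```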
Stylistic gait synthesis based on hidden Markov models
NASA Astrophysics Data System (ADS)
Tilmanne, Joëlle; Moinet, Alexis; Dutoit, Thierry
2012-12-01
In this work we present an expressive gait synthesis system based on hidden Markov models (HMMs), following and modifying a procedure originally developed for speaking style adaptation, in speech synthesis. A large database of neutral motion capture walk sequences was used to train an HMM of average walk. The model was then used for automatic adaptation to a particular style of walk using only a small amount of training data from the target style. The open source toolkit that we adapted for motion modeling also enabled us to take into account the dynamics of the data and to model accurately the duration of each HMM state. We also address the assessment issue and propose a procedure for qualitative user evaluation of the synthesized sequences. Our tests show that the style of these sequences can easily be recognized and look natural to the evaluators.
Liu, Xiaoluan; Xu, Yi
2015-01-01
This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production, respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and a speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study first provides support for previous perception studies on the relations between affective music and speech. Moreover, it is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production, in which motor mechanisms play a crucial role. PMID:26217252
Cortical activation patterns correlate with speech understanding after cochlear implantation
Olds, Cristen; Pollonini, Luca; Abaya, Homer; Larky, Jannine; Loy, Megan; Bortfeld, Heather; Beauchamp, Michael S.; Oghalai, John S.
2015-01-01
Objectives Cochlear implants are a standard therapy for deafness, yet the ability of implanted patients to understand speech varies widely. To better understand this variability in outcomes, we used functional near-infrared spectroscopy (fNIRS) to image activity within regions of the auditory cortex and compare the results to behavioral measures of speech perception. Design We studied 32 deaf adults hearing through cochlear implants and 35 normal-hearing controls. We used fNIRS to measure responses within the lateral temporal lobe and the superior temporal gyrus to speech stimuli of varying intelligibility. The speech stimuli included normal speech, channelized speech (vocoded into 20 frequency bands), and scrambled speech (the 20 frequency bands were shuffled in random order). We also used environmental sounds as a control stimulus. Behavioral measures consisted of the Speech Reception Threshold, CNC words, and AzBio Sentence tests measured in quiet. Results Both control and implanted participants with good speech perception exhibited greater cortical activations to natural speech than to unintelligible speech. In contrast, implanted participants with poor speech perception had large, indistinguishable cortical activations to all stimuli. The ratio of cortical activation to normal speech to that of scrambled speech directly correlated with the CNC Words and AzBio Sentences scores. This pattern of cortical activation was not correlated with auditory threshold, age, side of implantation, or time after implantation. Turning off the implant reduced cortical activations in all implanted participants. Conclusions Together, these data indicate that the responses we measured within the lateral temporal lobe and the superior temporal gyrus correlate with behavioral measures of speech perception, demonstrating a neural basis for the variability in speech understanding outcomes after cochlear implantation. PMID:26709749
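The key outcome measure here, the ratio of cortical activation to normal speech versus scrambled speech and its correlation with speech-perception scores, can be computed along the lines sketched below. The array values are synthetic placeholders and the use of a Pearson correlation is an assumption, included only to show the shape of the analysis.

```python
import numpy as np
from scipy.stats import pearsonr

# One row per implanted participant: mean fNIRS response amplitudes to
# normal and scrambled speech, plus a behavioral score.
# Synthetic example values, not data from the study.
normal_resp = np.array([1.8, 2.1, 0.9, 1.2, 2.5])
scrambled_resp = np.array([0.6, 0.7, 1.0, 1.1, 0.8])
azbio_score = np.array([82.0, 90.0, 35.0, 40.0, 95.0])

activation_ratio = normal_resp / scrambled_resp
r, p = pearsonr(activation_ratio, azbio_score)
print(f"r = {r:.2f}, p = {p:.3f}")
```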
Grossman, Murray; Powers, John; Ash, Sherry; McMillan, Corey; Burkholder, Lisa; Irwin, David; Trojanowski, John Q.
2012-01-01
Non-fluent/agrammatic primary progressive aphasia (naPPA) is a progressive neurodegenerative condition most prominently associated with slowed, effortful speech. A clinical imaging marker of naPPA is disease centered in the left inferior frontal lobe. We used multimodal imaging to assess large-scale neural networks underlying effortful expression in 15 patients with sporadic naPPA due to frontotemporal lobar degeneration (FTLD) spectrum pathology. Effortful speech in these patients is related in part to impaired grammatical processing, and to phonologic speech errors. Gray matter (GM) imaging shows frontal and anterior-superior temporal atrophy, most prominently in the left hemisphere. Diffusion tensor imaging reveals reduced fractional anisotropy in several white matter (WM) tracts mediating projections between left frontal and other GM regions. Regression analyses suggest disruption of three large-scale GM-WM neural networks in naPPA that support fluent, grammatical expression. These findings emphasize the role of large-scale neural networks in language, and demonstrate associated language deficits in naPPA. PMID:23218686
Griffiths, Sarah; Barnes, Rebecca; Britten, Nicky; Wilkinson, Ray
2011-01-01
Around 70% of people who develop Parkinson's disease (PD) experience speech and voice changes. Clinicians often find that when asked about their primary communication concerns, PD clients will talk about the difficulties they have 'getting into' conversations. This is an important area for clients and it has implications for quality of life and clinical management. To review the extant literature on PD and communication impairments in order to reveal key topic areas, the range of methodologies applied, and any gaps in knowledge relating to PD and social interaction and how these might be usefully addressed. A systematic search of a number of key databases and available grey literatures regarding PD and communication impairment was conducted (including motor speech changes, intelligibility, cognitive/language changes) to obtain a sense of key areas and methodologies applied. Research applying conversation analysis in the field of communication disability was also reviewed to illustrate the value of this methodology in uncovering common interactional difficulties, and in revealing the use of strategic collaborative competencies in naturally occurring conversation. In addition, available speech and language therapy assessment and intervention approaches to PD were examined with a view to their effectiveness in promoting individualized intervention planning and advice-giving for everyday interaction. A great deal has been written about the deficits underpinning communication changes in PD and the impact of communication disability on the self and others as measured in a clinical setting. Less is known about what happens for this client group in everyday conversations outside of the clinic. Current speech and language therapy assessments and interventions focus on the individual and are largely impairment based or focused on compensatory speaker-oriented techniques. A conversation analysis approach would complement basic research on what actually happens in everyday conversation for people with PD and their co-participants. The potential benefits of a conversation analysis approach to communication disability in PD include enabling a shift in clinical focus from individual impairment onto strategic collaborative competencies. This would have implications for client-centred intervention planning and the development of new and complementary clinical resources addressing participation. The impact would be new and improved support for those living with the condition as well as their families and carers. © 2011 Royal College of Speech & Language Therapists.
NASA Technical Reports Server (NTRS)
2002-01-01
A system that retrieves problem reports from a NASA database is described. The database is queried with natural language questions. Part-of-speech tags are first assigned to each word in the question using a rule-based tagger. A partial parse of the question is then produced with independent sets of deterministic finite state automata. Using the partial parse information, a lookup strategy searches the database for problem reports relevant to the question. A bigram stemmer and irregular verb conjugates have been incorporated into the system to improve accuracy. The system is evaluated on a set of fifty-five questions posed by NASA engineers. A discussion of future research is also presented.
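A rough sketch of the front end described here: assign part-of-speech tags to the question and use the content words to look up problem reports. It substitutes NLTK's off-the-shelf tokenizer and tagger for the rule-based tagger and finite-state partial parser in the paper, and the problem_reports table and lookup_reports function are hypothetical.

```python
import sqlite3
import nltk

# One-time downloads for the tokenizer and tagger models
# (resource names may differ slightly across NLTK versions).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

CONTENT_TAGS = ("NN", "VB", "JJ")   # nouns, verbs, adjectives


def content_words(question):
    """Return lower-cased content words from a natural-language question."""
    tagged = nltk.pos_tag(nltk.word_tokenize(question))
    return [w.lower() for w, tag in tagged if tag.startswith(CONTENT_TAGS)]


def lookup_reports(conn, question):
    """Naive keyword lookup: a report matches if any content word appears."""
    words = content_words(question)
    clause = " OR ".join("text LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]
    sql = f"SELECT id, text FROM problem_reports WHERE {clause}"  # hypothetical table
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE problem_reports (id INTEGER, text TEXT)")
conn.execute("INSERT INTO problem_reports VALUES (1, 'valve leak during engine test')")
print(lookup_reports(conn, "Which reports mention a leaking valve?"))
```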
Five Lectures on Artificial Intelligence
1974-09-01
large systems. The current projects on speech understanding (which I will describe later) are an exception to this, dealing explicitly with the problem...learns that "Fred lives in Sydney", we must find some new fact to resolve the tension; perhaps he lives in a zoo. It is...possible Speech Understanding Systems. Most of the problems described above might be characterized as relating to the chunking of knowledge. Such ideas are
Communication in a noisy environment: Perception of one's own voice and speech enhancement
NASA Astrophysics Data System (ADS)
Le Cocq, Cecile
Workers in noisy industrial environments are often confronted with communication problems. Many workers complain about not being able to communicate easily with their coworkers when they wear hearing protectors. In consequence, they tend to remove their protectors, which exposes them to the risk of hearing loss. In fact, this communication problem is a double one: first, the hearing protectors modify one's own voice perception; second, they interfere with understanding speech from others. This double problem is examined in this thesis. When wearing hearing protectors, the modification of one's own voice perception is partly due to the occlusion effect which is produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, physiological noises in low frequencies are better perceived; second, the perception of one's own voice is modified. In order to have a better understanding of this phenomenon, the literature results are analyzed systematically, and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, it was decided to excite the buccal cavity with an acoustic wave. The experiment was designed in such a way that the acoustic wave which excites the buccal cavity does not directly excite the external ear or the rest of the body. The measurement of the hearing threshold with the ear open and occluded was used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, as well as those reported in the literature, have led to a better understanding of the occlusion effect and an evaluation of the role of each internal path from the acoustic source to the inner ear. The intelligibility of speech from others is degraded both by the high sound levels of noisy industrial environments and by the attenuation of the speech signal by hearing protectors. A possible solution to this problem is to denoise the speech signal and transmit it under the hearing protector. Many denoising techniques are available and are often used for denoising speech in telecommunications. In the framework of this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques was conducted in order to evaluate their performance in noisy industrial environments. The tested speech signals were altered by industrial noises over a wide range of signal-to-noise ratios. The denoised speech signals were evaluated with four criteria. A large database was obtained and analyzed with a selection algorithm designed for this purpose. This first study identified the influence of the different parameters of the wavelet denoising method on its quality and identified the "classical" method which gave the best performance in terms of denoising quality. It also generated ideas for designing a new thresholding rule suitable for speech wavelet denoising in a noisy industrial environment. In a second study, this new thresholding rule is presented and evaluated. Its performance is better than that of the "classical" method found in the first study when the signal-to-noise ratio of the speech signal is between -10 dB and 15 dB.
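A minimal sketch of the kind of "classical" wavelet thresholding considered in the first study: soft thresholding of the detail coefficients with the universal threshold sigma*sqrt(2*ln N), implemented with the PyWavelets package. The wavelet choice and decomposition depth are illustrative defaults, not the settings evaluated in the thesis.

```python
import numpy as np
import pywt


def wavelet_denoise(x, wavelet="db8", level=5):
    """Soft-threshold wavelet denoising with the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Estimate the noise level from the finest detail coefficients (MAD / 0.6745).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(x)))
    # Keep the approximation coefficients, soft-threshold every detail band.
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(x)]


# Example: a noisy 1 kHz tone standing in for noisy speech.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 1000 * t)
noisy = clean + 0.5 * np.random.randn(fs)
print(np.std(wavelet_denoise(noisy) - clean) < np.std(noisy - clean))
```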
Alomar, Soha; King, Nicolas K K; Tam, Joseph; Bari, Ausaf A; Hamani, Clement; Lozano, Andres M
2017-01-01
The thalamus has been a surgical target for the treatment of various movement disorders. Commonly used therapeutic modalities include ablative and nonablative procedures. A major clinical side effect of thalamic surgery is the appearance of speech problems. This review summarizes the data on the development of speech problems after thalamic surgery. A systematic review and meta-analysis was performed using nine databases, including Medline, Web of Science, and Cochrane Library. We also checked for articles by searching citing and cited articles. We retrieved studies between 1960 and September 2014. Of a total of 2,320 patients, 19.8% (confidence interval: 14.8-25.9) had speech difficulty after thalamotomy. Speech difficulty occurred in 15% (confidence interval: 9.8-22.2) of those treated with a unilaterally and 40.6% (confidence interval: 29.5-52.8) of those treated bilaterally. Speech impairment was noticed 2- to 3-fold more commonly after left-sided procedures (40.7% vs. 15.2%). Of the 572 patients that underwent DBS, 19.4% (confidence interval: 13.1-27.8) experienced speech difficulty. Subgroup analysis revealed that this complication occurs in 10.2% (confidence interval: 7.4-13.9) of patients treated unilaterally and 34.6% (confidence interval: 21.6-50.4) treated bilaterally. After thalamotomy, the risk was higher in Parkinson's patients compared to patients with essential tremor: 19.8% versus 4.5% in the unilateral group and 42.5% versus 13.9% in the bilateral group. After DBS, this rate was higher in essential tremor patients. Both lesioning and stimulation thalamic surgery produce adverse effects on speech. Left-sided and bilateral procedures are approximately 3-fold more likely to cause speech difficulty. This effect was higher after thalamotomy compared to DBS. In the thalamotomy group, the risk was higher in Parkinson's patients, whereas in the DBS group it was higher in patients with essential tremor. Understanding the pathophysiology of speech disturbance after thalamic procedures is a priority. © 2017 International Parkinson and Movement Disorder Society. © 2016 International Parkinson and Movement Disorder Society.
Speech and orthodontic appliances: a systematic literature review.
Chen, Junyu; Wan, Jia; You, Lun
2018-01-23
Various types of orthodontic appliances can lead to speech difficulties. However, speech difficulties caused by orthodontic appliances have not been sufficiently investigated by an evidence-based method. The aim of this study is to outline the scientific evidence and mechanism of the speech difficulties caused by orthodontic appliances. Randomized-controlled clinical trials (RCT), controlled clinical trials, and cohort studies focusing on the effect of orthodontic appliances on speech were included. A systematic search was conducted by an electronic search in PubMed, EMBASE, and the Cochrane Library databases, complemented by a manual search. The types of orthodontic appliances, the affected sounds, and duration period of the speech disturbances were extracted. The ROBINS-I tool was applied to evaluate the quality of non-randomized studies, and the bias of RCT was assessed based on the Cochrane Handbook for Systematic Reviews of Interventions. No meta-analyses could be performed due to the heterogeneity in the study designs and treatment modalities. Among 448 screened articles, 13 studies were included (n = 297 patients). Different types of orthodontic appliances such as fixed appliances, orthodontic retainers and palatal expanders could influence the clarity of speech. The /i/, /a/, and /e/ vowels as well as /s/, /z/, /l/, /t/, /d/, /r/, and /ʃ/ consonants could be distorted by appliances. Although most speech impairments could return to normal within weeks, speech distortion of the /s/ sound might last for more than 3 months. The low evidence level grading and heterogeneity were the two main limitations in this systematic review. Lingual fixed appliances, palatal expanders, and Hawley retainers have an evident influence on speech production. The /i/, /s/, /t/, and /d/ sounds are the primarily affected ones. The results of this systematic review should be interpreted with caution and more high-quality RCTs with larger sample sizes and longer follow-up periods are needed. The protocol for this systematic review (CRD42017056573) was registered in the International Prospective Register of Systematic Reviews (PROSPERO). © The Author(s) 2017. Published by Oxford University Press on behalf of the European Orthodontic Society. All rights reserved. For permissions, please email: journals.permissions@oup.com
Perceived gender in clear and conversational speech
NASA Astrophysics Data System (ADS)
Booz, Jaime A.
Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004), were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated with perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated with making adjustments to improve speech clarity have a larger impact on perceived femininity than others. Using a clear speech strategy alone may not be sufficient for a male speaker to be perceived as female, but could be used as one of many tools to help speakers achieve more "feminine" speech, in conjunction with more specific strategies targeting the acoustic parameters outlined in this study.
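As a rough illustration of the acoustic side of this analysis, the sketch below estimates mean fo and fo standard deviation from a recording and correlates talker-level values with femininity ratings. The pitch tracker (librosa.pyin), the file name, the hypothetical rating values, and the use of a simple Pearson correlation in place of the study's mixed-effects models are assumptions for illustration only.

```python
# Hedged sketch: estimate mean fo and fo standard deviation for a talker's sentence
# and relate talker-level acoustics to mean perceived-femininity ratings.
import numpy as np
import librosa
from scipy.stats import pearsonr

def fo_stats(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0 = f0[voiced_flag]                     # keep voiced frames only
    return np.nanmean(f0), np.nanstd(f0)

# Hypothetical talker-level summaries (Hz) and femininity ratings (0-100 visual analog scale).
mean_fo = np.array([112.0, 205.0, 131.0, 188.0, 240.0])
ratings = np.array([22.0, 71.0, 35.0, 64.0, 83.0])
r, p = pearsonr(mean_fo, ratings)
print(f"r = {r:.2f}, p = {p:.3f}")
```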
Speech therapy for children with dysarthria acquired before three years of age.
Pennington, Lindsay; Parker, Naomi K; Kelly, Helen; Miller, Nick
2016-07-18
Children with motor impairments often have the motor speech disorder dysarthria, a condition which affects the tone, strength and co-ordination of any or all of the muscles used for speech. Resulting speech difficulties can range from mild, with slightly slurred articulation and breathy voice, to profound, with an inability to produce any recognisable words. Children with dysarthria are often prescribed communication aids to supplement their natural forms of communication. However, there is variation in practice regarding the provision of therapy focusing on voice and speech production. Descriptive studies have suggested that therapy may improve speech, but its effectiveness has not been evaluated. To assess whether any speech and language therapy intervention aimed at improving the speech of children with dysarthria is more effective in increasing children's speech intelligibility or communicative participation than no intervention at all, and to compare the efficacy of individual types of speech and language therapy in improving the speech intelligibility or communicative participation of children with dysarthria. We searched the Cochrane Central Register of Controlled Trials (CENTRAL; 2015, Issue 7), MEDLINE, EMBASE, CINAHL, LLBA, ERIC, PsycINFO, Web of Science, Scopus, UK National Research Register and Dissertation Abstracts up to July 2015, handsearched relevant journals published between 1980 and July 2015, and searched proceedings of relevant conferences between 1996 and 2015. We placed no restrictions on the language or setting of the studies. A previous version of this review considered studies published up to April 2009. In this update we searched for studies published from April 2009 to July 2015. We considered randomised controlled trials and studies using quasi-experimental designs in which children were allocated to groups using non-random methods. One author (LP) conducted searches of all databases, journals and conference reports. All searches included a reliability check in which a second review author independently checked a random sample comprising 15% of all identified reports. We planned that two review authors would independently assess the quality and extract data from eligible studies. No randomised controlled trials or group studies were identified. This review found no evidence from randomised trials of the effectiveness of speech and language therapy interventions to improve the speech of children with early acquired dysarthria. Rigorous, fully powered randomised controlled trials are needed to investigate if the positive changes in children's speech observed in phase I and phase II studies are generalisable to the population of children with early acquired dysarthria served by speech and language therapy services. Research should examine change in children's speech production and intelligibility. It must also investigate children's participation in social and educational activities, and their quality of life, as well as the cost and acceptability of interventions.
Perceptual Learning of Speech under Optimal and Adverse Conditions
Zhang, Xujin; Samuel, Arthur G.
2014-01-01
Humans have a remarkable ability to understand spoken language despite the large amount of variability in speech. Previous research has shown that listeners can use lexical information to guide their interpretation of atypical sounds in speech (Norris, McQueen, & Cutler, 2003). This kind of lexically induced perceptual learning enables people to adjust to the variations in utterances due to talker-specific characteristics, such as individual identity and dialect. The current study investigated perceptual learning in two optimal conditions: conversational speech (Experiment 1) vs. clear speech (Experiment 2), and three adverse conditions: noise (Experiment 3a) vs. two cognitive loads (Experiments 4a & 4b). Perceptual learning occurred in the two optimal conditions and in the two cognitive load conditions, but not in the noise condition. Furthermore, perceptual learning occurred only in the first of two sessions for each participant, and only for atypical /s/ sounds and not for atypical /f/ sounds. This pattern of learning and non-learning reflects a balance between flexibility and stability that the speech system must have to deal with speech variability in the diverse conditions that speech is encountered. PMID:23815478
NASA Astrophysics Data System (ADS)
Kim, Yunjung; Weismer, Gary; Kent, Ray D.
2005-09-01
In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different than values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]
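Formant slope, one of the acoustic measures referred to above, is commonly operationalized as the first-order fit to a formant track over a vowel transition; a minimal sketch with an invented F2 track follows. The track values and frame spacing are illustrative, and the study's exact measurement procedure is not specified here.

```python
# Hedged sketch: F2 slope (Hz/ms) over a vowel transition by least-squares fit to a
# formant track. Real tracks would come from a formant tracker such as Praat (an
# assumption for illustration, not the study's stated tool).
import numpy as np

t_ms = np.arange(0, 80, 10.0)                      # analysis frames every 10 ms
f2_hz = np.array([1150, 1230, 1340, 1460, 1570, 1650, 1700, 1720], dtype=float)

slope, intercept = np.polyfit(t_ms, f2_hz, deg=1)  # first-order fit
print(f"F2 slope ~ {slope:.1f} Hz/ms")             # shallower slopes suggest reduced transitions
```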
Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals
Haro, Martín; Serrà, Joan; Herrera, Perfecto; Corral, Álvaro
2012-01-01
Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources. PMID:22479497
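A rank-frequency analysis of code-words of the kind described above can be sketched in a few lines; the toy symbol sequence and the simple log-log slope fit below are illustrative stand-ins for the paper's corpus and its Yule-Simon modeling.

```python
# Hedged sketch: rank-frequency counts of code-words and a rough Zipf exponent via a
# straight-line fit on log-log axes (toy data; not the paper's estimation procedure).
import numpy as np
from collections import Counter

codewords = ["c17", "c3", "c3", "c42", "c3", "c17", "c8", "c3", "c42", "c17"]  # toy sequence
freqs = np.array(sorted(Counter(codewords).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

slope, _ = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
print(f"Zipf exponent ~ {-slope:.2f}")   # close to 1 for Zipfian data
```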
Tysome, James R; Moorthy, Ram; Lee, Ambrose; Jiang, Dan; O'Connor, Alec Fitzgerald
2010-12-01
A systematic review to determine whether middle ear implants (MEIs) improve hearing as much as hearing aids. Databases included MEDLINE, EMBASE, DARE, and Cochrane searched with no language restrictions from 1950 or the start date of each database. Initial search found 644 articles, of which 17 met the inclusion criteria of MEI in adults with a sensorineural hearing loss, where hearing outcomes and patient-reported outcome measures (PROMs) compared MEI with conventional hearing aids (CHAs). Study quality assessment included whether ethical approval was gained, the study was prospective, eligibility criteria specified, a power calculation made and appropriate controls, outcome measures, and analysis performed. Middle ear implant outcome analysis included residual hearing, complications, and comparison to CHA in terms of functional gain, speech perception in quiet and in noise, and validated PROM questionnaires. Because of heterogeneity of outcome measures, comparisons were made by structured review. The quality of studies was moderate to poor with short follow-up. The evidence supports the use of MEI because, overall, they do not decrease residual hearing, result in a functional gain in hearing comparable to CHA, and may improve perception of speech in noise and sound quality. We recommend the publication of long-term results comparing MEI with CHA, reporting a minimum of functional gain, speech perception in quiet and in noise, complications, and a validated PROM to guide the engineering of the new generation of MEI in the future.
Martinelli, Eugenio; Mencattini, Arianna; Di Natale, Corrado
2016-01-01
Humans can communicate their emotions by modulating facial expressions or the tone of their voice. Albeit numerous applications exist that enable machines to read facial emotions and recognize the content of verbal messages, methods for speech emotion recognition are still in their infancy. Yet, fast and reliable applications for emotion recognition are the obvious advancement of present ‘intelligent personal assistants’, and may have countless applications in diagnostics, rehabilitation and research. Taking inspiration from the dynamics of human group decision-making, we devised a novel speech emotion recognition system that applies, for the first time, a semi-supervised prediction model based on consensus. Three tests were carried out to compare this algorithm with traditional approaches. Labeling performances relative to a public database of spontaneous speeches are reported. The novel system appears to be fast, robust and less computationally demanding than traditional methods, allowing for easier implementation in portable voice-analyzers (as used in rehabilitation, research, industry, etc.) and for applications in the research domain (such as real-time pairing of stimuli to participants’ emotional state, selective/differential data collection based on emotional content, etc.). PMID:27563724
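The consensus-based semi-supervised scheme is not specified in detail above; the sketch below shows one generic way such a loop could be arranged, with several base classifiers pseudo-labeling an unlabeled utterance only when they agree. The choice of classifiers and features is purely illustrative and is not the authors' algorithm.

```python
# Hedged sketch of a consensus-style semi-supervised step: base classifiers pseudo-label
# an unlabeled sample only on unanimous agreement, and agreed items join the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def consensus_pseudolabel(X_lab, y_lab, X_unlab):
    models = [LogisticRegression(max_iter=1000), SVC(), KNeighborsClassifier(3)]
    preds = np.array([m.fit(X_lab, y_lab).predict(X_unlab) for m in models])
    agree = (preds == preds[0]).all(axis=0)          # unanimous consensus mask
    X_new = np.vstack([X_lab, X_unlab[agree]])
    y_new = np.concatenate([y_lab, preds[0][agree]])
    return X_new, y_new
```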
Cummine, Jacqueline; Cribben, Ivor; Luu, Connie; Kim, Esther; Bahktiari, Reyhaneh; Georgiou, George; Boliek, Carol A
2016-05-01
The neural circuitry associated with language processing is complex and dynamic. Graphical models are useful for studying complex neural networks as this method provides information about unique connectivity between regions within the context of the entire network of interest. Here, the authors explored the neural networks during covert reading to determine the role of feedforward and feedback loops in covert speech production. Brain activity of skilled adult readers was assessed in real word and pseudoword reading tasks with functional MRI (fMRI). The authors provide evidence for activity coherence in the feedforward system (inferior frontal gyrus-supplementary motor area) during real word reading and in the feedback system (supramarginal gyrus-precentral gyrus) during pseudoword reading. Graphical models provided evidence of an extensive, highly connected, neural network when individuals read real words that relied on coordination of the feedforward system. In contrast, when individuals read pseudowords the authors found a limited/restricted network that relied on coordination of the feedback system. Together, these results underscore the importance of considering multiple pathways and articulatory loops during language tasks and provide evidence for a print-to-speech neural network. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
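One common way to estimate the kind of conditional-dependence graph described above is the graphical lasso over region-of-interest time series; the sketch below uses simulated data and illustrative ROI labels, and is not the authors' specific estimator.

```python
# Hedged sketch: sparse conditional-dependence graph over ROI time series with the
# graphical lasso. ROI names and data are illustrative placeholders.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rois = ["IFG", "SMA", "SMG", "PreCG", "STG"]
ts = np.random.default_rng(0).standard_normal((200, len(rois)))  # 200 volumes x 5 ROIs

model = GraphicalLassoCV().fit(ts)
precision = model.precision_                       # nonzero off-diagonals imply edges
edges = [(rois[i], rois[j]) for i in range(len(rois)) for j in range(i + 1, len(rois))
         if abs(precision[i, j]) > 1e-6]
print(edges)
```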
Dynamic action units slip in speech production errors ☆
Goldstein, Louis; Pouplier, Marianne; Chen, Larissa; Saltzman, Elliot; Byrd, Dani
2008-01-01
In the past, the nature of the compositional units proposed for spoken language has largely diverged from the types of control units pursued in the domains of other skilled motor tasks. A classic source of evidence as to the units structuring speech has been patterns observed in speech errors – “slips of the tongue”. The present study reports, for the first time, on kinematic data from tongue and lip movements during speech errors elicited in the laboratory using a repetition task. Our data are consistent with the hypothesis that speech production results from the assembly of dynamically defined action units – gestures – in a linguistically structured environment. The experimental results support both the presence of gestural units and the dynamical properties of these units and their coordination. This study of speech articulation shows that it is possible to develop a principled account of spoken language within a more general theory of action. PMID:16822494
[Improving the speech with a prosthetic construction].
Stalpers, M J; Engelen, M; van der Stappen, J A A M; Weijs, W L J; Takes, R P; van Heumen, C C M
2016-03-01
A 12-year-old boy had problems with his speech due to a defect in the soft palate. This defect was caused by the surgical removal of a synovial sarcoma. Testing with a nasometer revealed hypernasality above normal values. Given the size and severity of the defect in the soft palate, the possibility of improving the speech with speech therapy was limited. At a centre for special dentistry an attempt was made with a prosthetic construction to improve the performance of the palate and, in that way, the speech. This construction consisted of a denture with an obturator attached to it. With it, an effective closure of the palate could be achieved. New measurements with acoustic nasometry showed scores within the normal values. The nasality in the speech largely disappeared. The obturator is an effective and relatively easy solution for palatal insufficiency resulting from surgical resection. Intrusive reconstructive surgery can be avoided in this way.
How should a speech recognizer work?
Scharenborg, Odette; Norris, Dennis; Bosch, Louis; McQueen, James M
2005-11-12
Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that research in these related fields has focused on the mechanics of how speech can be recognized. In Marr's (1982) terms, emphasis has been on the algorithmic and implementational levels rather than on the computational level. In this article, we provide a computational-level analysis of the task of speech recognition, which reveals the close parallels between research concerned with HSR and ASR. We illustrate this relation by presenting a new computational model of human spoken-word recognition, built using techniques from the field of ASR that, in contrast to current existing models of HSR, recognizes words from real speech input. 2005 Lawrence Erlbaum Associates, Inc.
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential looking paradigm, 44 German 6-month olds' ability to detect mismatches between concurrently presented auditory and visual native vowels was tested. Outcomes were related to mothers' speech style and interactive behavior assessed during free play with their infant, and to infant-specific factors assessed through a questionnaire. Results show that mothers' and infants' social behavior modulated infants' preference for matching audiovisual speech. Moreover, infants' audiovisual speech perception correlated with later vocabulary size, suggesting a lasting effect on language development. © 2014 The Authors. Child Development © 2014 Society for Research in Child Development, Inc.
A characterization of verb use in Turkish agrammatic narrative speech.
Arslan, Seçkin; Bamyacı, Elif; Bastiaanse, Roelien
2016-01-01
This study investigates the characteristics of narrative-speech production and the use of verbs in Turkish agrammatic speakers (n = 10) compared to non-brain-damaged controls (n = 10). To elicit narrative-speech samples, personal interviews and storytelling tasks were conducted. Turkish has a large and regular verb inflection paradigm where verbs are inflected for evidentiality (i.e. direct versus indirect evidence available to the speaker). Particularly, we explored the general characteristics of the speech samples (e.g. utterance length) and the uses of lexical, finite and non-finite verbs and direct and indirect evidentials. The results show that speech rate is slow, verbs per utterance are lower than normal and the verb diversity is reduced in the agrammatic speakers. Verb inflection is relatively intact; however, a trade-off pattern between inflection for direct evidentials and verb diversity is found. The implications of the data are discussed in connection with narrative-speech production studies on other languages.
Electrocorticographic representations of segmental features in continuous speech
Lotte, Fabien; Brumberg, Jonathan S.; Brunner, Peter; Gunduz, Aysegul; Ritaccio, Anthony L.; Guan, Cuntai; Schalk, Gerwin
2015-01-01
Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide a new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates. PMID:25759647
The Impacts of Language Background and Language-Related Disorders in Auditory Processing Assessment
ERIC Educational Resources Information Center
Loo, Jenny Hooi Yin; Bamiou, Doris-Eva; Rosen, Stuart
2013-01-01
Purpose: To examine the impact of language background and language-related disorders (LRDs--dyslexia and/or language impairment) on performance in English speech and nonspeech tests of auditory processing (AP) commonly used in the clinic. Method: A clinical database concerning 133 multilingual children (mostly with English as an additional…
The Impact of Vocal Hyperfunction on Relative Fundamental Frequency during Voicing Offset and Onset
ERIC Educational Resources Information Center
Stepp, Cara E.; Hillman, Robert E.; Heaton, James T.
2010-01-01
Purpose: This study tested the hypothesis that individuals with vocal hyperfunction would show decreases in relative fundamental frequency (RFF) surrounding a voiceless consonant. Method: This retrospective study of 2 clinical databases used speech samples from 15 control participants and women with hyperfunction-related voice disorders: 82 prior…
Li, Junfeng; Yang, Lin; Zhang, Jianping; Yan, Yonghong; Hu, Yi; Akagi, Masato; Loizou, Philipos C
2011-05-01
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study undertakes a comparative evaluation of various single-channel noise-reduction algorithms applied to noisy speech from three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.
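As an illustration of the Wiener-filtering family of algorithms evaluated above, a minimal single-channel STFT-domain sketch follows; the frame length, noise-estimation strategy, and gain floor are assumptions, not the parameters used in the study.

```python
# Hedged sketch of a single-channel Wiener-style noise-reduction rule in the STFT domain,
# assuming the noise spectrum can be estimated from a leading noise-only segment.
import numpy as np
from scipy.signal import stft, istft

def wiener_denoise(noisy, fs, noise_seconds=0.5, nperseg=512):
    f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
    noise_frames = int(noise_seconds * fs / (nperseg // 2))        # leading noise-only frames
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr_prio = np.maximum(np.abs(X) ** 2 / noise_psd - 1.0, 1e-3)  # crude a priori SNR, floored
    gain = snr_prio / (snr_prio + 1.0)                             # Wiener gain
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced[: len(noisy)]
```

A decision-directed SNR estimate and a perceptually tuned gain floor would be typical refinements, but they are omitted here to keep the sketch short.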
Pring, Tim
2016-07-01
The inverse-care law suggests that fewer healthcare resources are available in deprived areas where health needs are greatest. To examine the provision of paediatric speech and language services across London boroughs and to relate provision to the level of deprivation of the boroughs. Information on the employment of paediatric speech and language therapists was obtained from London boroughs by freedom-of-information requests. The relationship between the number of therapists and the index of multiple deprivation for the borough was examined. Twenty-nine of 32 boroughs responded. A positive relationship between provision and need was obtained, suggesting that the inverse-care law does not apply. However, large inequalities of provision were found particularly among the more socially deprived boroughs. In some instances boroughs had five times as many therapists per child as other boroughs. The data reveal that large differences in speech and language therapy provision exist across boroughs. The reasons for these inequalities are unclear, but the lack of comparative information across boroughs is likely to be unhelpful in planning equitable services. The use of freedom of information in assessing health inequalities is stressed and its future availability is desirable. © 2016 Royal College of Speech and Language Therapists.
Voice recognition products-an occupational risk for users with ULDs?
Williams, N R
2003-10-01
Voice recognition systems (VRS) allow speech both to be converted directly into text, which appears on the screen of a computer, and to direct equipment to perform specific functions. Suggested applications are many and varied, including increasing efficiency in the reporting of radiographs, allowing directed surgery and enabling individuals with upper limb disorders (ULDs) who cannot use other input devices, such as keyboards and mice, to carry out word processing and other activities. Aim: This paper describes four cases of vocal dysfunction related to the use of such software, which have been identified from the database of the Voice and Speech Laboratory of the Massachusetts Eye and Ear Infirmary (MEEI). The database was searched using the key words 'voice recognition' and four cases were identified from a total of 4800. In all cases, the VRS was supplied to assist individuals with ULDs who could not use conventional input devices. Case reports illustrate time of onset and symptoms experienced. The cases illustrate the need for risk assessment and consideration of the ergonomic aspects of voice use prior to such adaptations being used, particularly in those who already experience work-related ULDs.
Content-based TV sports video retrieval using multimodal analysis
NASA Astrophysics Data System (ADS)
Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru
2003-09-01
In this paper, we propose content-based video retrieval, a kind of retrieval based on semantic content. Because video data is composed of multimodal information streams such as video, auditory and textual streams, we describe a strategy of using multimodal analysis for automatically parsing sports video. The paper first defines the basic structure of a sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that the multimodal analysis is effective for video retrieval by quickly browsing tree-like video clips or inputting keywords within a predefined domain.
Non-right handed primary progressive apraxia of speech.
Botha, Hugo; Duffy, Joseph R; Whitwell, Jennifer L; Strand, Edythe A; Machulda, Mary M; Spychalla, Anthony J; Tosakulwong, Nirubol; Senjem, Matthew L; Knopman, David S; Petersen, Ronald C; Jack, Clifford R; Lowe, Val J; Josephs, Keith A
2018-07-15
In recent years a large and growing body of research has greatly advanced our understanding of primary progressive apraxia of speech. Handedness has emerged as one potential marker of selective vulnerability in degenerative diseases. This study evaluated the clinical and imaging findings in non-right handed compared to right handed participants in a prospective cohort diagnosed with primary progressive apraxia of speech. A total of 30 participants were included. Compared to the expected rate in the population, there was a higher prevalence of non-right handedness among those with primary progressive apraxia of speech (6/30, 20%). Small group numbers meant that these results did not reach statistical significance, although the effect sizes were moderate-to-large. There were no clinical differences between right handed and non-right handed participants. Bilateral hypometabolism was seen in primary progressive apraxia of speech compared to controls, with non-right handed participants showing more right hemispheric involvement. This is the first report of a higher rate of non-right handedness in participants with isolated apraxia of speech, which may point to an increased vulnerability for developing this disorder among non-right handed participants. This challenges prior hypotheses about a relative protective effect of non-right handedness for tau-related neurodegeneration. We discuss potential avenues for future research to investigate the relationship between handedness and motor disorders more generally. Copyright © 2018 Elsevier B.V. All rights reserved.
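The handedness comparison above amounts to testing an observed count against an expected population rate; a minimal sketch of such a test is shown below, with the 10% reference rate assumed for illustration (the study's exact expected rate is not given above).

```python
# Hedged sketch: exact binomial test of 6/30 non-right handed cases against an assumed
# 10% population rate.
from scipy.stats import binomtest

result = binomtest(k=6, n=30, p=0.10, alternative="greater")
print(f"p = {result.pvalue:.3f}")   # small samples can leave moderate effects non-significant
```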
Large Vocabulary Audio-Visual Speech Recognition
2002-06-12
Differentiating primary progressive aphasias in a brief sample of connected speech
Evans, Emily; O'Shea, Jessica; Powers, John; Boller, Ashley; Weinberg, Danielle; Haley, Jenna; McMillan, Corey; Irwin, David J.; Rascovsky, Katya; Grossman, Murray
2013-01-01
Objective: A brief speech expression protocol that can be administered and scored without special training would aid in the differential diagnosis of the 3 principal forms of primary progressive aphasia (PPA): nonfluent/agrammatic PPA, logopenic variant PPA, and semantic variant PPA. Methods: We used a picture-description task to elicit a short speech sample, and we evaluated impairments in speech-sound production, speech rate, lexical retrieval, and grammaticality. We compared the results with those obtained by a longer, previously validated protocol and further validated performance with multimodal imaging to assess the neuroanatomical basis of the deficits. Results: We found different patterns of impaired grammar in each PPA variant, and additional language production features were impaired in each: nonfluent/agrammatic PPA was characterized by speech-sound errors; logopenic variant PPA by dysfluencies (false starts and hesitations); and semantic variant PPA by poor retrieval of nouns. Strong correlations were found between this brief speech sample and a lengthier narrative speech sample. A composite measure of grammaticality and other measures of speech production were correlated with distinct regions of gray matter atrophy and reduced white matter fractional anisotropy in each PPA variant. Conclusions: These findings provide evidence that large-scale networks are required for fluent, grammatical expression; that these networks can be selectively disrupted in PPA syndromes; and that quantitative analysis of a brief speech sample can reveal the corresponding distinct speech characteristics. PMID:23794681
Role of Visual Speech in Phonological Processing by Children With Hearing Loss
Jerger, Susan; Tye-Murray, Nancy; Abdi, Hervé
2011-01-01
Purpose: This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method: Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in place of articulation, or conflicting in voicing—for example, the picture “pizza” coupled with the distractors “peach,” “teacher,” or “beast,” respectively. Speed of picture naming was measured. Results: The conflicting conditions slowed naming, and phonological processing by children with HL displayed the age-related shift in sensitivity to visual speech seen in children with NH, although with developmental delay. Younger children with HL exhibited a disproportionately large influence of visual speech and a negligible influence of auditory speech, whereas older children with HL showed a robust influence of auditory speech with no benefit to performance from adding visual speech. The congruent conditions did not speed naming in children with HL, nor did the addition of visual speech influence performance. Unexpectedly, the /ʌ/-vowel congruent distractors slowed naming in children with HL and decreased articulatory proficiency. Conclusions: Results for the conflicting conditions are consistent with the hypothesis that speech representations in children with HL (a) are initially disproportionally structured in terms of visual speech and (b) become better specified with age in terms of auditorily encoded information. PMID:19339701
Hung, Tai-Hsin; Chen, Vincent Chin-Hung; Yang, Yao-Hsu; Tsai, Ching-Shu; Lu, Mong-Liang; McIntyre, Roger S; Lee, Yena; Huang, Kuo-You
2018-06-01
Delay and impairment in speech and language are common developmental problems in younger populations. Hitherto, there has been minimal study of the association between common childhood infections (e.g. enterovirus [EV]) and speech and language. The impetus for evaluating this association is provided by evidence linking inflammation to neurodevelopmental disorders. Herein we sought to determine whether an association exists between EV infection and subsequent diagnoses of speech and language impairments in a nationwide population-based sample in Taiwan. Our study acquired data from the Taiwan National Health Insurance Research Database. The sample comprised individuals under 18 years of age with newly diagnosed EV infection during the period from January 1998 to December 2011. A total of 39,669 eligible cases were compared to matched controls and assessed during the study period for incident cases of speech and language impairments. Cox regression analyses were applied, adjusting for sex, age and other physical and mental problems. In the fully adjusted Cox regression model for hazard ratios, EV infection was positively associated with speech and language impairments (HR = 1.14, 95% CI: 1.06-1.22) after adjusting for age, sex and other confounds. Compared to the control group, the hazard ratio for speech and language impairments was 1.12 (95% CI: 1.03-1.21) among patients with EV infection who were not hospitalized, and 1.26 (95% CI: 1.10-1.45) among patients with EV infection who were hospitalized. EV infection is temporally associated with incident speech and language impairments. Our findings provide a rationale for educating families that EV infection may be associated with subsequent speech and language problems in susceptible individuals and that monitoring for such a presentation would be warranted. WHAT THIS PAPER ADDS: Speech and language impairments associated with central nervous system infections have been reported in the literature. EVs are medically important human pathogens and are associated with select neuropsychiatric diseases. Notwithstanding, relatively few reports have mentioned the effects of EV infection on speech and language problems. Our study used a nationwide longitudinal dataset and identified that children with EV infection have a greater risk for speech and language impairments compared with the control group. Infected children with other comorbidities or risk factors might be more likely to develop speech problems. Clinicians should be vigilant for the onset of language developmental abnormalities in preschool children with EV infection. Copyright © 2018 Elsevier Ltd. All rights reserved.
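A hedged sketch of the kind of Cox proportional-hazards model described above is given below, using the lifelines library and invented stand-in variables rather than the insurance-database records.

```python
# Hedged sketch: Cox proportional-hazards model for time to speech/language-impairment
# diagnosis with EV infection as the exposure, adjusted for sex and age at entry.
# The data frame columns are hypothetical stand-ins, not the study's variables.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "follow_up_years": [3.1, 5.0, 2.2, 6.4, 4.8, 1.9],
    "impairment":      [1,   0,   1,   0,   0,   1  ],   # 1 = diagnosed during follow-up
    "ev_infection":    [1,   0,   1,   0,   1,   0  ],
    "male":            [1,   1,   0,   0,   1,   0  ],
    "age_at_entry":    [2.0, 3.5, 1.8, 4.2, 2.9, 3.1],
})

cph = CoxPHFitter(penalizer=0.1)   # small ridge penalty keeps the tiny toy fit stable
cph.fit(df, duration_col="follow_up_years", event_col="impairment")
print(cph.hazard_ratios_["ev_infection"])   # analogous in form to the reported HR of 1.14
```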
Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.
Schroeder, Juliana; Epley, Nicholas
2016-11-01
Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience-a humanlike voice-affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip), did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Unilateral Vocal Fold Paralysis: A Systematic Review of Speech-Language Pathology Management.
Walton, Chloe; Conway, Erin; Blackshaw, Helen; Carding, Paul
2017-07-01
Dysphonia due to unilateral vocal fold paralysis (UVFP) can be characterized by hoarseness and weakness, resulting in a significant impact on patients' activity and participation. Voice therapy provided by a speech-language pathologist is designed to maximize vocal function and improve quality of life. The purpose of this paper is to systematically review literature surrounding the effectiveness of speech-language pathology intervention for the management of UVFP in adults. This is a systematic review. Electronic databases were searched using a range of key terms including dysphonia, vocal fold paralysis, and speech-language pathology. Eligible articles were extracted and reviewed by the authors for risk of bias, methodology, treatment efficacy, and clinical outcomes. Of the 3311 articles identified, 12 met the inclusion criteria: seven case series and five comparative studies. All 12 studies subjectively reported positive effects following the implementation of voice therapy for UVFP; however, the heterogeneity of participant characteristics, voice therapy, and voice outcome resulted in a low level of evidence. There is presently a lack of methodological rigor and clinical efficacy in the speech-language pathology management of dysphonia arising from UVFP in adults. Reasons for this reduced efficacy can be attributed to the following: (1) no standardized speech-language pathology intervention; (2) no consistency of assessment battery; (3) the variable etiology and clinical presentation of UVFP; and (4) inconsistent timing, frequency, and intensity of treatment. Further research is required to develop the evidence for the management of UVFP incorporating controlled treatment protocols and more rigorous clinical methodology. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Revisiting the "enigma" of musicians with dyslexia: Auditory sequencing and speech abilities.
Zuk, Jennifer; Bishop-Liebler, Paula; Ozernov-Palchik, Ola; Moore, Emma; Overy, Katie; Welch, Graham; Gaab, Nadine
2017-04-01
Previous research has suggested a link between musical training and auditory processing skills. Musicians have shown enhanced perception of auditory features critical to both music and speech, suggesting that this link extends beyond basic auditory processing. It remains unclear to what extent musicians who also have dyslexia show these specialized abilities, considering often-observed persistent deficits that coincide with reading impairments. The present study evaluated auditory sequencing and speech discrimination in 52 adults comprised of musicians with dyslexia, nonmusicians with dyslexia, and typical musicians. An auditory sequencing task measuring perceptual acuity for tone sequences of increasing length was administered. Furthermore, subjects were asked to discriminate synthesized syllable continua varying in acoustic components of speech necessary for intraphonemic discrimination, which included spectral (formant frequency) and temporal (voice onset time [VOT] and amplitude envelope) features. Results indicate that musicians with dyslexia did not significantly differ from typical musicians and performed better than nonmusicians with dyslexia for auditory sequencing as well as discrimination of spectral and VOT cues within syllable continua. However, typical musicians demonstrated superior performance relative to both groups with dyslexia for discrimination of syllables varying in amplitude information. These findings suggest a distinct profile of speech processing abilities in musicians with dyslexia, with specific weaknesses in discerning amplitude cues within speech. Because these difficulties seem to remain persistent in adults with dyslexia despite musical training, this study only partly supports the potential for musical training to enhance the auditory processing skills known to be crucial for literacy in individuals with dyslexia. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Brumberg, Jonathan S.; Krusienski, Dean J.; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L.; Schalk, Gerwin
2016-01-01
How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain. PMID:27875590
Vogel, Adam P; Shirbin, Christopher; Churchyard, Andrew J; Stout, Julie C
2012-12-01
Speech disturbances (e.g., altered prosody) have been described in symptomatic Huntington's Disease (HD) individuals; however, the extent to which speech changes in gene-positive pre-manifest (PreHD) individuals is largely unknown. The speech of individuals carrying the mutant HTT gene is a behavioural/motor/cognitive marker demonstrating some potential as an objective indicator of early HD onset and disease progression. Speech samples were acquired from 30 individuals carrying the mutant HTT gene (13 PreHD, 17 early stage HD) and 15 matched controls. Participants read a passage, produced a monologue and said the days of the week. Data were analysed acoustically for measures of timing, frequency and intensity. There was a clear effect of group across most acoustic measures, so that speech performance differed in line with disease progression. Comparisons across groups revealed significant differences between the control and the early stage HD group on measures of timing (e.g., speech rate). Participants carrying the mutant HTT gene presented with slower rates of speech, took longer to say words and produced greater silences between and within words compared to healthy controls. Importantly, speech rate showed a significant correlation to burden of disease scores. The speech of early stage HD differed significantly from controls. The speech of PreHD, although not reaching significance, tended to lie between the performance of controls and early stage HD. This suggests that changes in speech production appear to be developing prior to diagnosis. Copyright © 2012 Elsevier Ltd. All rights reserved.
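As a rough illustration of timing measures such as pause duration, the sketch below applies an energy-based silence split to a recording; the threshold, file name, and derived ratios are illustrative assumptions rather than the study's acoustic pipeline.

```python
# Hedged sketch: pause proportion and articulation-time ratio from an energy-based
# silence split of a monologue recording (file name and top_db threshold are hypothetical).
import numpy as np
import librosa

y, sr = librosa.load("monologue.wav", sr=16000)
speech_spans = librosa.effects.split(y, top_db=35)       # non-silent (speech) intervals

speech_time = np.sum(speech_spans[:, 1] - speech_spans[:, 0]) / sr
total_time = len(y) / sr
pause_time = total_time - speech_time
print(f"pause proportion = {pause_time / total_time:.2f}")
print(f"articulation-time ratio = {speech_time / total_time:.2f}")
```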
Smith, Sandra Nelson; Lucas, Laura
2016-01-01
Objectives: A systematic review of the literature and meta-analysis was conducted to assess the nature and quality of the evidence for the use of hearing instruments in adults with a unilateral severe to profound sensorineural hearing loss. Design: The PubMed, EMBASE, MEDLINE, Cochrane, CINAHL, and DARE databases were searched with no restrictions on language. The search included articles from the start of each database until February 11, 2015. Studies were included that (a) assessed the impact of any form of hearing instrument, including devices that reroute signals between the ears or restore aspects of hearing to a deaf ear, in adults with a sensorineural severe to profound loss in one ear and normal or near-normal hearing in the other ear; (b) compared different devices or compared a device with placebo or the unaided condition; (c) measured outcomes in terms of speech perception, spatial listening, or quality of life; (d) were prospective controlled or observational studies. Studies that met prospectively defined criteria were subjected to random effects meta-analyses. Results: Twenty-seven studies reported in 30 articles were included. The evidence was graded as low-to-moderate quality having been obtained primarily from observational before-after comparisons. The meta-analysis identified statistically significant benefits to speech perception in noise for devices that rerouted the speech signals of interest from the worse ear to the better ear using either air or bone conduction (mean benefit, 2.5 dB). However, these devices also degraded speech understanding significantly and to a similar extent (mean deficit, 3.1 dB) when noise was rerouted to the better ear. Data on the effects of cochlear implantation on speech perception could not be pooled as the prospectively defined criteria for meta-analysis were not met. Inconsistency in the assessment of outcomes relating to sound localization also precluded the synthesis of evidence across studies. Evidence for the relative efficacy of different devices was sparse but a statistically significant advantage was observed for rerouting speech signals using abutment-mounted bone conduction devices when compared with outcomes after preoperative trials of air conduction devices when speech and noise were colocated (mean benefit, 1.5 dB). Patients reported significant improvements in hearing-related quality of life with both rerouting devices and following cochlear implantation. Only two studies measured health-related quality of life and findings were inconclusive. Conclusions: Devices that reroute sounds from an ear with a severe to profound hearing loss to an ear with minimal hearing loss may improve speech perception in noise when signals of interest are located toward the impaired ear. However, the same device may also degrade speech perception as all signals are rerouted indiscriminately, including noise. Although the restoration of functional hearing in both ears through cochlear implantation could be expected to provide benefits to speech perception, the inability to synthesize evidence across existing studies means that such a conclusion cannot yet be made. For the same reason, it remains unclear whether cochlear implantation can improve the ability to localize sounds despite restoring bilateral input. 
Prospective controlled studies that measure outcomes consistently and control for selection and observation biases are required to improve the quality of the evidence for the provision of hearing instruments to patients with unilateral deafness and to support any future recommendations for the clinical management of these patients. PMID:27232073
Free Speech Advocates at Berkeley.
ERIC Educational Resources Information Center
Watts, William A.; Whittaker, David
1966-01-01
This study compares highly committed members of the Free Speech Movement (FSM) at Berkeley with the student population at large on 3 sociopsychological foci: general biographical data, religious orientation, and rigidity-flexibility. Questionnaires were administered to 172 FSM members selected by chance from the 10 to 1200 who entered and…
Speech and Speech-Related Quality of Life After Late Palate Repair: A Patient's Perspective.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
2015-07-01
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P < 0.01) and that nasal regurgitation became worse (P < 0.01) for some patients after surgery. A total of 78% of the patients were still satisfied with the surgery and all of the patients reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Erb, Julia; Ludwig, Alexandra Annemarie; Kunke, Dunja; Fuchs, Michael; Obleser, Jonas
2018-04-24
Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech recognition. However, rapid tests of temporal sensitivity are currently lacking. Here, we propose a simple amplitude modulation rate discrimination (AMRD) paradigm that is validated by predicting future speech recognition in adult cochlear implant (CI) patients. In 34 newly implanted patients, we used an adaptive AMRD paradigm, where broadband noise was modulated at the speech-relevant rate of ~4 Hz. In a longitudinal study, speech recognition in quiet was assessed using the closed-set Freiburger number test shortly after cochlear implantation (t0) as well as the open-set Freiburger monosyllabic word test 6 months later (t6). Both AMRD thresholds at t0 (r = -0.51) and speech recognition scores at t0 (r = 0.56) predicted speech recognition scores at t6. However, AMRD and speech recognition at t0 were uncorrelated, suggesting that those measures capture partially distinct perceptual abilities. A multiple regression model predicting 6-month speech recognition outcome with deafness duration and speech recognition at t0 improved from adjusted R² = 0.30 to adjusted R² = 0.44 when AMRD threshold was added as a predictor. These findings identify AMRD thresholds as a reliable, nonredundant predictor above and beyond established speech tests for CI outcome. This AMRD test could potentially be developed into a rapid clinical temporal-resolution test to be integrated into the postoperative test battery to improve the reliability of speech outcome prognosis.
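The adjusted-R² comparison described above can be illustrated with a short, hedged sketch (not the study's code): fit the regression with and without the AMRD predictor and compare adjusted R². All data below are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 34
deaf_dur = rng.uniform(1, 30, n)          # years of deafness (hypothetical)
speech_t0 = rng.uniform(0, 100, n)        # % correct shortly after implantation (hypothetical)
amrd_t0 = rng.uniform(1, 20, n)           # AMRD threshold (hypothetical units)
speech_t6 = 0.5 * speech_t0 - 1.2 * amrd_t0 - 0.4 * deaf_dur + rng.normal(0, 10, n)

base = sm.OLS(speech_t6, sm.add_constant(np.column_stack([deaf_dur, speech_t0]))).fit()
full = sm.OLS(speech_t6, sm.add_constant(np.column_stack([deaf_dur, speech_t0, amrd_t0]))).fit()
print(f"adjusted R^2 without AMRD: {base.rsquared_adj:.2f}")
print(f"adjusted R^2 with AMRD:    {full.rsquared_adj:.2f}")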
Filling the Information Void: Adapting the Information Operation (IO) Message in Post-Hostility Iraq
2005-05-26
... more elaborate, with Pizza Huts and Burger Kings, and so large that one, called Anaconda, has nine bus routes to move the troops around inside the ... circulation of 50,000 weekly. Based on American principles of freedom of speech and the value of debate, one may conclude that the proliferation of news ... democratic values. According to the World Press Freedom Committee, “as always, the proper answer to bad speech is more speech, not the silencing of ...”
Park, Hyojin; Ince, Robin A A; Schyns, Philippe G; Thut, Gregor; Gross, Joachim
2015-06-15
Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Automated Cough Assessment on a Mobile Platform
2014-01-01
The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test. PMID:25506590
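The general recipe described above (acoustic feature extraction followed by HMM-based decoding) can be sketched as follows. This is a hedged illustration, not the ADAM implementation: it scores whole segments with per-class HMM log-likelihoods rather than running a frame-level Viterbi decoder, and all file names and model orders are placeholders. It assumes librosa and hmmlearn are installed.
import librosa
import numpy as np
from hmmlearn import hmm

def mfcc_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coefficients)

# Train one HMM per class on pooled training frames (placeholder file lists).
cough_frames = np.vstack([mfcc_features(f) for f in ["cough_01.wav", "cough_02.wav"]])
other_frames = np.vstack([mfcc_features(f) for f in ["speech_01.wav", "breath_01.wav"]])
cough_model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50).fit(cough_frames)
other_model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50).fit(other_frames)

# Classify a new audio segment by comparing log-likelihoods under the two models.
segment = mfcc_features("unknown_segment.wav")
label = "cough" if cough_model.score(segment) > other_model.score(segment) else "non-cough"
print(label)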
Speech Comprehension Difficulties in Chronic Tinnitus and Its Relation to Hyperacusis
Vielsmeier, Veronika; Kreuzer, Peter M.; Haubner, Frank; Steffens, Thomas; Semmler, Philipp R. O.; Kleinjung, Tobias; Schlee, Winfried; Langguth, Berthold; Schecklmann, Martin
2016-01-01
Objective: Many tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1) estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2) compare subjective reports of speech comprehension difficulties with behavioral measurements in a standardized speech comprehension test and to (3) explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram), as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and Methods: Speech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessments (pure tone audiometry, tinnitus pitch, and loudness matching), the Goettingen sentence test (in quiet) for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments (“How would you rate your ability to understand speech?”; “How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?”). Results: Subjectively-reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation). 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test). Subjective speech comprehension complaints (both for general and in noisy environment) were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated with speech comprehension difficulties in noisy environments, but not with speech comprehension difficulties in general. Conclusion: Speech comprehension deficits are frequent among tinnitus patients. Whereas speech comprehension deficits in quiet environments are primarily due to peripheral hearing loss, speech comprehension deficits in noisy environments are related to both peripheral hearing loss and dysfunctional central auditory processing. Disturbed speech comprehension in noisy environments might be modulated by a central inhibitory deficit. In addition, attentional and cognitive aspects may play a role. PMID:28018209
The Small College Administrative Environment.
ERIC Educational Resources Information Center
Buzza, Bonnie Wilson
Environmental differences for speech departments at large and small colleges are not simply of scale; there are qualitative as well as quantitative differences. At small colleges, faculty are hired as teachers, rather than as researchers. Because speech teachers at small colleges must be generalists, and because it is often difficult to replace…
Recognizing Speech under a Processing Load: Dissociating Energetic from Informational Factors
ERIC Educational Resources Information Center
Mattys, Sven L.; Brooks, Joanna; Cooke, Martin
2009-01-01
Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and…
ERIC Educational Resources Information Center
Hamblin, DeAnna; Bartlett, Marilyn J.
2013-01-01
The authors note that when it comes to balancing free speech and schools' responsibilities, the online world is largely uncharted waters. Questions remain about the rights of both students and teachers in the world of social media. Although the lower courts have ruled that students' freedom of speech rights offer them some protection for…
Fifty years of progress in speech and speaker recognition
NASA Astrophysics Data System (ADS)
Furui, Sadaoki
2004-10-01
Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from “distance”-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.
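Item (2) above, the move to cepstral coefficients with their first and second time derivatives, can be illustrated with a minimal sketch using librosa; the file name is a placeholder.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # static cepstral coefficients
d1 = librosa.feature.delta(mfcc)                        # delta (first time derivative)
d2 = librosa.feature.delta(mfcc, order=2)               # delta-delta (second time derivative)
features = np.vstack([mfcc, d1, d2])                    # 39-dimensional feature vectors per frame
print(features.shape)  # (39, n_frames)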
Havstam, Christina; Sandberg, Annika Dahlgren; Lohmander, Anette
2011-04-01
Many children born with cleft palate have impaired speech during their pre-school years, but usually the speech difficulties are transient and resolved by later childhood. This study investigated communication attitude with the Swedish version of the Communication Attitude Test (CAT-S) in 54 10-year-olds with cleft (lip and) palate. In addition, environmental factors were assessed via parent questionnaire. These data were compared to speech assessments by experienced listeners, who rated the children's velopharyngeal function, articulation, intelligibility, and general impression of speech at ages 5, 7, and 10 years. The children with clefts scored significantly higher on the CAT-S compared to reference data, indicating a more negative communication attitude on group level but with large individual variation. All speech variables, except velopharyngeal function at earlier ages, as well as the parent questionnaire scores, correlated significantly with the CAT-S scores. Although there was a relationship between speech and communication attitude, not all children with impaired speech developed negative communication attitudes. The assessment of communication attitude can make an important contribution to our understanding of the communicative situation for children with cleft (lip and) palate and give important indications for intervention.
Civier, Oren; Tasko, Stephen M.; Guenther, Frank H.
2010-01-01
This paper investigates the hypothesis that stuttering may result in part from impaired readout of feedforward control of speech, which forces persons who stutter (PWS) to produce speech with a motor strategy that is weighted too much toward auditory feedback control. Over-reliance on feedback control leads to production errors which, if they grow large enough, can cause the motor system to “reset” and repeat the current syllable. This hypothesis is investigated using computer simulations of a “neurally impaired” version of the DIVA model, a neural network model of speech acquisition and production. The model’s outputs are compared to published acoustic data from PWS’ fluent speech, and to combined acoustic and articulatory movement data collected from the dysfluent speech of one PWS. The simulations mimic the errors observed in the PWS subject’s speech, as well as the repairs of these errors. Additional simulations were able to account for enhancements of fluency gained by slowed/prolonged speech and masking noise. Together these results support the hypothesis that many dysfluencies in stuttering are due to a bias away from feedforward control and toward feedback control. PMID:20831971
Yoon, Yang-soo; Li, Yongxin; Kang, Hou-Yong; Fu, Qian-Jie
2011-01-01
Objective The full benefit of bilateral cochlear implants may depend on the unilateral performance with each device, the speech materials, processing ability of the user, and/or the listening environment. In this study, bilateral and unilateral speech performances were evaluated in terms of recognition of phonemes and sentences presented in quiet or in noise. Design Speech recognition was measured for unilateral left, unilateral right, and bilateral listening conditions; speech and noise were presented at 0° azimuth. The “binaural benefit” was defined as the difference between bilateral performance and unilateral performance with the better ear. Study Sample 9 adults with bilateral cochlear implants participated. Results On average, results showed a greater binaural benefit in noise than in quiet for all speech tests. More importantly, the binaural benefit was greater when unilateral performance was similar across ears. As the difference in unilateral performance between ears increased, the binaural advantage decreased; this functional relationship was observed across the different speech materials and noise levels even though there was substantial intra- and inter-subject variability. Conclusions The results indicate that subjects who show symmetry in speech recognition performance between implanted ears in general show a large binaural benefit. PMID:21696329
Neural systems for speech and song in autism
Pantazatos, Spiro P.; Schneider, Harry
2012-01-01
Despite language disabilities in autism, music abilities are frequently preserved. Paradoxically, brain regions associated with these functions typically overlap, enabling investigation of neural organization supporting speech and song in autism. Neural systems sensitive to speech and song were compared in low-functioning autistic and age-matched control children using passive auditory stimulation during functional magnetic resonance and diffusion tensor imaging. Activation in left inferior frontal gyrus was reduced in autistic children relative to controls during speech stimulation, but was greater than controls during song stimulation. Functional connectivity for song relative to speech was also increased between left inferior frontal gyrus and superior temporal gyrus in autism, and large-scale connectivity showed increased frontal–posterior connections. Although fractional anisotropy of the left arcuate fasciculus was decreased in autistic children relative to controls, structural terminations of the arcuate fasciculus in inferior frontal gyrus were indistinguishable between autistic and control groups. Fractional anisotropy correlated with activity in left inferior frontal gyrus for both speech and song conditions. Together, these findings indicate that in autism, functional systems that process speech and song were more effectively engaged for song than for speech and projections of structural pathways associated with these functions were not distinguishable from controls. PMID:22298195
Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.
Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K
2016-09-01
Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
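The classification step described above can be sketched in a hedged way: a small neural-network classifier trained on fixed-length lip-coordinate feature vectors and evaluated on held-out samples. This is not the authors' code; the dimensions and data below are simulated placeholders standing in for Kinect-derived features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_sentences, n_repetitions, n_features = 12, 20, 60    # placeholder dimensions
X = rng.normal(size=(n_sentences * n_repetitions, n_features))
y = np.repeat(np.arange(n_sentences), n_repetitions)
X += y[:, None] * 0.3                                   # make the toy classes separable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_train, y_train)
print(f"held-out sentence classification accuracy: {clf.score(X_test, y_test):.2f}")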
Effect of technological advances on cochlear implant performance in adults.
Lenarz, Minoo; Joseph, Gert; Sönmez, Hasibe; Büchner, Andreas; Lenarz, Thomas
2011-12-01
To evaluate the effect of technological advances in the past 20 years on the hearing performance of a large cohort of adult cochlear implant (CI) patients. Individual, retrospective, cohort study. According to technological developments in electrode design and speech-processing strategies, we defined five virtual intervals on the time scale between 1984 and 2008. A cohort of 1,005 postlingually deafened adults was selected for this study, and their hearing performance with a CI was evaluated retrospectively according to these five technological intervals. The test battery was composed of four standard German speech tests: Freiburger monosyllabic test, speech tracking test, Hochmair-Schulz-Moser (HSM) sentence test in quiet, and HSM sentence test in 10 dB noise. The direct comparison of the speech perception in postlingually deafened adults, who were implanted during different technological periods, reveals an obvious improvement in the speech perception in patients who benefited from the recent electrode designs and speech-processing strategies. The major influence of technological advances on CI performance seems to be on speech perception in noise. Better speech perception in noisy surroundings is strong proof for demonstrating the success rate of new electrode designs and speech-processing strategies. Standard (internationally comparable) speech tests in noise should become an obligatory part of the postoperative test battery for adult CI patients. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.
Varley, Rosemary; Cowell, Patricia E; Dyson, Lucy; Inglis, Lesley; Roper, Abigail; Whiteside, Sandra P
2016-03-01
There is currently little evidence on effective interventions for poststroke apraxia of speech. We report outcomes of a trial of self-administered computer therapy for apraxia of speech. Effects of speech intervention on naming and repetition of treated and untreated words were compared with those of a visuospatial sham program. The study used a parallel-group, 2-period, crossover design, with participants receiving 2 interventions. Fifty participants with chronic and stable apraxia of speech were randomly allocated to 1 of 2 order conditions: speech-first condition versus sham-first condition. Period 1 design was equivalent to a randomized controlled trial. We report results for this period and profile the effect of the period 2 crossover. Period 1 results revealed significant improvement in naming and repetition only in the speech-first group. The sham-first group displayed improvement in speech production after speech intervention in period 2. Significant improvement of treated words was found in both naming and repetition, with little generalization to structurally similar and dissimilar untreated words. Speech gains were largely maintained after withdrawal of intervention. There was a significant relationship between treatment dose and response. However, average self-administered dose was modest for both groups. Future software design would benefit from incorporation of social and gaming components to boost motivation. Single-word production can be improved in chronic apraxia of speech with behavioral intervention. Self-administered computerized therapy is a promising method for delivering high-intensity speech/language rehabilitation. URL: http://orcid.org/0000-0002-1278-0601. Unique identifier: ISRCTN88245643. © 2016 American Heart Association, Inc.
Lau, Johnny King L; Humphreys, Glyn W; Douis, Hassan; Balani, Alex; Bickerton, Wai-Ling; Rotshtein, Pia
2015-01-01
We report a lesion-symptom mapping analysis of visual speech production deficits in a large group (280) of stroke patients at the sub-acute stage (<120 days post-stroke). Performance on object naming was evaluated alongside three other tests of visual speech production, namely sentence production to a picture, sentence reading and nonword reading. A principal component analysis was performed on all these tests' scores and revealed a 'shared' component that loaded across all the visual speech production tasks and a 'unique' component that isolated object naming from the other three tasks. Regions for the shared component were observed in the left fronto-temporal cortices, fusiform gyrus and bilateral visual cortices. Lesions in these regions linked to both poor object naming and impairment in general visual-speech production. On the other hand, the unique naming component was potentially associated with the bilateral anterior temporal poles, hippocampus and cerebellar areas. This is in line with the models proposing that object naming relies on a left-lateralised language dominant system that interacts with a bilateral anterior temporal network. Neuropsychological deficits in object naming can reflect both the increased demands specific to the task and the more general difficulties in language processing.
Havas, David A; Chapp, Christopher B
2016-01-01
How does language influence the emotions and actions of large audiences? Functionally, emotions help address environmental uncertainty by constraining the body to support adaptive responses and social coordination. We propose emotions provide a similar function in language processing by constraining the mental simulation of language content to facilitate comprehension, and to foster alignment of mental states in message recipients. Consequently, we predicted that emotion-inducing language should be found in speeches specifically designed to create audience alignment - stump speeches of United States presidential candidates. We focused on phrases in the past imperfective verb aspect ("a bad economy was burdening us") that leave a mental simulation of the language content open-ended, and thus unconstrained, relative to past perfective sentences ("we were burdened by a bad economy"). As predicted, imperfective phrases appeared more frequently in stump versus comparison speeches, relative to perfective phrases. In a subsequent experiment, participants rated phrases from presidential speeches as more emotionally intense when written in the imperfective aspect compared to the same phrases written in the perfective aspect, particularly for sentences perceived as negative in valence. These findings are consistent with the notion that emotions have a role in constraining the comprehension of language, a role that may be used in communication with large audiences.
Neural Oscillations Carry Speech Rhythm through to Comprehension
Peelle, Jonathan E.; Davis, Matthew H.
2012-01-01
A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251
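The slow amplitude envelope discussed above can be extracted with a few standard signal-processing steps. The sketch below is illustrative only (file name and filter settings are placeholder assumptions): compute the Hilbert envelope, downsample it, and band-pass it to the ~4-8 Hz range linked to cortical phase locking.
import numpy as np
import librosa
from scipy.signal import hilbert, butter, sosfiltfilt, resample_poly

y, sr = librosa.load("speech.wav", sr=16000)            # placeholder file name
envelope = np.abs(hilbert(y))                           # broadband amplitude envelope
env_1k = resample_poly(envelope, up=1, down=16)         # downsample the envelope to 1 kHz
sos = butter(4, [4, 8], btype="bandpass", fs=1000, output="sos")
theta_envelope = sosfiltfilt(sos, env_1k)               # ~4-8 Hz component of the envelope
print(theta_envelope.shape)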
Selective spatial attention modulates bottom-up informational masking of speech
Carlile, Simon; Corkhill, Caitlin
2015-01-01
To hear out a conversation against other talkers, listeners overcome energetic and informational masking. Largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributed to a release from informational masking. When across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. With this decorrelated speech used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations and show that this presumably low-level process can be modulated by selective spatial attention. PMID:25727100
APEX/SPIN: a free test platform to measure speech intelligibility.
Francart, Tom; Hofmann, Michael; Vanthornhout, Jonas; Van Deun, Lieselot; van Wieringen, Astrid; Wouters, Jan
2017-02-01
Measuring speech intelligibility in quiet and noise is important in clinical practice and research. An easy-to-use free software platform for conducting speech tests is presented, called APEX/SPIN. The APEX/SPIN platform allows the use of any speech material in combination with any noise. A graphical user interface provides control over a large range of parameters, such as number of loudspeakers, signal-to-noise ratio and parameters of the procedure. An easy-to-use graphical interface is provided for calibration and storage of calibration values. To validate the platform, perception of words in quiet and sentences in noise were measured both with APEX/SPIN and with an audiometer and CD player, which is a conventional setup in current clinical practice. Five normal-hearing listeners participated in the experimental evaluation. Speech perception results were similar for the APEX/SPIN platform and conventional procedures. APEX/SPIN is a freely available and open source platform that allows the administration of all kinds of custom speech perception tests and procedures.
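Adaptive speech-in-noise procedures of the kind such a platform administers can be sketched as a simple 1-down/1-up SNR staircase. The code below is an illustration under stated assumptions (the scoring callback, step size, and threshold rule are placeholders), not the APEX/SPIN implementation.
import random

def adaptive_snr_track(score_trial, start_snr=0.0, step_db=2.0, n_trials=20):
    """score_trial(snr) should return True if the listener repeats the sentence correctly."""
    snr = start_snr
    reversals, track = [], []
    last_direction = None
    for _ in range(n_trials):
        correct = score_trial(snr)
        track.append(snr)
        direction = -1 if correct else +1            # make it harder after a correct response
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)
        last_direction = direction
        snr += direction * step_db
    # Speech reception threshold: mean SNR at the last few reversals
    srt = sum(reversals[-6:]) / len(reversals[-6:]) if reversals else snr
    return srt, track

# Example with a simulated listener whose performance improves above -5 dB SNR
srt, _ = adaptive_snr_track(lambda snr: random.random() < 1 / (1 + 10 ** (-(snr + 5) / 2)))
print(f"estimated SRT: {srt:.1f} dB SNR")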
The Nationwide Speech Project: A multi-talker multi-dialect speech corpus
NASA Astrophysics Data System (ADS)
Clopper, Cynthia G.; Pisoni, David B.
2004-05-01
Most research on regional phonological variation relies on field recordings of interview speech. Recent research on the perception of dialect variation by naive listeners, however, has relied on read sentence materials in order to control for phonological and lexical content and syntax. The Nationwide Speech Project corpus was designed to obtain a large amount of speech from a number of talkers representing different regional varieties of American English. Five male and five female talkers from each of six different dialect regions in the United States were recorded reading isolated words, sentences, and passages, and in conversations with the experimenter. The talkers ranged in age from 18 to 25 years old, and they were all monolingual native speakers of American English. They had lived their entire lives in one dialect region and both of their parents were raised in the same region. Results of an acoustic analysis of the vowel spaces of the talkers included in the Nationwide Speech Project will be presented. [Work supported by NIH.]
Determining the importance of fundamental hearing aid attributes.
Meister, Hartmut; Lausberg, Isabel; Kiessling, Juergen; Walger, Martin; von Wedel, Hasso
2002-07-01
To determine the importance of fundamental hearing aid attributes and to elicit measures of satisfaction and dissatisfaction. A prospective study based on a survey using a decompositional approach of preference measurement (conjoint analysis). Ear, nose, and throat university hospitals in Cologne and Giessen; various branches of hearing aid dispensers. A random sample of 175 experienced hearing aid users aged 20 to 91 years (mean age, 61 yr) recruited at two different sites. Relative importance of different hearing aid attributes, satisfaction and dissatisfaction with hearing aid attributes. Of the six fundamental hearing aid attributes assessed by the hearing aid users, the two features concerning speech perception attained the highest relative importance (25% speech in quiet, 27% speech in noise). The remaining four attributes (sound quality, handling, feedback, localization) had significantly lower values in a narrow range of 10 to 12%. Comparison of different subgroups of hearing aid wearers based on sociodemographic and user-specific data revealed a large interindividual scatter of the preferences for the attributes. A similar examination with 25 clinicians revealed overestimation of the importance of the attributes commonly associated with problems. Moreover, examination of satisfaction showed that speech in noise was the most frequent source of dissatisfaction (30% of all statements), whereas the subjects were satisfied with speech in quiet. The results emphasize the high importance of attributes related to speech perception. Speech discrimination in noise was the most important but also the most frequent source of negative statements. This attribute will be the outstanding parameter of future developments. Appropriate handling becomes an important factor for elderly subjects. However, because of the large interindividual scatter of data, the preferences of different hearing aid users were hardly predictable, giving evidence of multifactorial influences.
Adaptive plasticity in speech perception: Effects of external information and internal predictions.
Guediche, Sara; Fiez, Julie A; Holt, Lori L
2016-07-01
When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.
Weninger, Felix; Eyben, Florian; Schuller, Björn W; Mortillaro, Marcello; Scherer, Klaus R
2013-01-01
Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
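The cross-domain evaluation idea described above, training a valence or arousal regressor on acoustic features from one domain and correlating its predictions with observer ratings in another, can be sketched as follows. This is a hedged illustration only; the feature matrices and ratings are simulated stand-ins, not the paper's databases or feature sets.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_feat = 40
X_speech, X_music = rng.normal(size=(300, n_feat)), rng.normal(size=(200, n_feat))
w_shared = rng.normal(size=n_feat)                         # pretend some features generalize across domains
y_speech = X_speech @ w_shared + rng.normal(0, 1.0, 300)   # arousal ratings (speech domain, simulated)
y_music = X_music @ w_shared + rng.normal(0, 2.0, 200)     # arousal ratings (music domain, simulated)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_speech, y_speech)
r, _ = pearsonr(model.predict(X_music), y_music)
print(f"cross-domain correlation with observer ratings: {r:.2f}")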
McCreery, Ryan W.; Venediktov, Rebecca A.; Coleman, Jaumeiko J.; Leech, Hillary M.
2013-01-01
Purpose Two clinical questions were developed: one addressing the comparison of linear amplification with compression limiting to linear amplification with peak clipping, and the second comparing wide dynamic range compression with linear amplification for outcomes of audibility, speech recognition, speech and language, and self- or parent report in children with hearing loss. Method Twenty-six databases were systematically searched for studies addressing a clinical question and meeting all inclusion criteria. Studies were evaluated for methodological quality, and effect sizes were reported or calculated when possible. Results The literature search resulted in the inclusion of 8 studies. All 8 studies included comparisons of wide dynamic range compression to linear amplification, and 2 of the 8 studies provided comparisons of compression limiting versus peak clipping. Conclusions Moderate evidence from the included studies demonstrated that audibility was improved and speech recognition was either maintained or improved with wide dynamic range compression as compared with linear amplification. No significant differences were observed between compression limiting and peak clipping on outcomes (i.e., speech recognition and self-/parent report) reported across the 2 studies. Preference ratings appear to be influenced by participant characteristics and environmental factors. Further research is needed before conclusions can confidently be drawn. PMID:22858616
ERIC Educational Resources Information Center
Westerhausen, René; Bless, Josef J.; Passow, Susanne; Kompus, Kristiina; Hugdahl, Kenneth
2015-01-01
The ability to use cognitive-control functions to regulate speech perception is thought to be crucial in mastering developmental challenges, such as language acquisition during childhood or compensation for sensory decline in older age, enabling interpersonal communication and meaningful social interactions throughout the entire life span.…
Impact of Aberrant Acoustic Properties on the Perception of Sound Quality in Electrolarynx Speech
ERIC Educational Resources Information Center
Meltzner, Geoffrey S.; Hillman, Robert E.
2005-01-01
A large percentage of patients who have undergone laryngectomy to treat advanced laryngeal cancer rely on an electrolarynx (EL) to communicate verbally. Although serviceable, EL speech is plagued by shortcomings in both sound quality and intelligibility. This study sought to better quantify the relative contributions of previously identified…
Validation of Automated Scoring of Oral Reading
ERIC Educational Resources Information Center
Balogh, Jennifer; Bernstein, Jared; Cheng, Jian; Van Moere, Alistair; Townshend, Brent; Suzuki, Masanori
2012-01-01
A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the…
A Survey of Speech Programs in Community Colleges.
ERIC Educational Resources Information Center
Meyer, Arthur C.
The rapid growth of community colleges in the last decade resulted in large numbers of students enrolled in programs previously unavailable to them in a single comprehensive institution. The purpose of this study was to gather and analyze data to provide information about the speech programs that community colleges created or expanded as a result…
Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.
2013-01-01
This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
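The analysis strategy described above, dimension reduction of a large test battery followed by multiple regression on a speech-understanding factor, can be sketched in a hedged way. The code below uses principal components as a stand-in for the rotated factor scores, and all data are simulated placeholders rather than the study's measures.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_subjects, n_predictor_measures = 98, 23
X = rng.normal(size=(n_subjects, n_predictor_measures))    # cognitive + psychophysical measures (simulated)
speech_factor = X[:, :4].sum(axis=1) + rng.normal(0, 1.5, n_subjects)  # aided speech-understanding factor (simulated)

components = PCA(n_components=6).fit_transform(X)           # reduce the predictors to 6 components
reg = LinearRegression().fit(components, speech_factor)
print(f"variance accounted for (R^2): {reg.score(components, speech_factor):.2f}")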
Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias
2016-01-01
The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.
2017-01-01
The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597
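The trading relation between VOT and first-formant onset frequency can be quantified with a logistic model of identification responses; the ratio of the fitted coefficients gives how much a change in one cue offsets a change in the other while keeping the percept constant. The sketch below is illustrative only, using simulated responses rather than the budgerigar or human data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
vot = rng.uniform(0, 60, n)                  # voice onset time in ms (simulated stimuli)
f1_onset = rng.uniform(200, 800, n)          # first-formant onset frequency in Hz (simulated stimuli)
logit_p = 0.15 * (vot - 30) - 0.01 * (f1_onset - 500)
responded_t = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)  # 1 = identified as "t"

fit = sm.Logit(responded_t, sm.add_constant(np.column_stack([vot, f1_onset]))).fit(disp=0)
b_vot, b_f1 = fit.params[1], fit.params[2]
print(f"trading ratio: {abs(b_f1 / b_vot) * 1000:.1f} ms of VOT per 1 kHz of F1 onset")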
Data-Driven Subclassification of Speech Sound Disorders in Preschool Children
Vick, Jennell C.; Campbell, Thomas F.; Shriberg, Lawrence D.; Green, Jordan R.; Truemper, Klaus; Rusiewicz, Heather Leavy; Moore, Christopher A.
2015-01-01
Purpose: The purpose of the study was to determine whether distinct subgroups of preschool children with speech sound disorders (SSD) could be identified using a subgroup discovery algorithm (SUBgroup discovery via Alternate Random Processes, or SUBARP). Of specific interest was finding evidence of a subgroup of SSD exhibiting performance consistent with atypical speech motor control. Method: Ninety-seven preschool children with SSD completed speech and nonspeech tasks. Fifty-three kinematic, acoustic, and behavioral measures from these tasks were input to SUBARP. Results: Two distinct subgroups were identified from the larger sample. The 1st subgroup (76%; population prevalence estimate = 67.8%–84.8%) did not have characteristics that would suggest atypical speech motor control. The 2nd subgroup (10.3%; population prevalence estimate = 4.3%–16.5%) exhibited significantly higher variability in measures of articulatory kinematics and poor ability to imitate iambic lexical stress, suggesting atypical speech motor control. Both subgroups were consistent with classes of SSD in the Speech Disorders Classification System (SDCS; Shriberg et al., 2010a). Conclusion: Characteristics of children in the larger subgroup were consistent with the proportionally large SDCS class termed speech delay; characteristics of children in the smaller subgroup were consistent with the SDCS subtype termed motor speech disorder—not otherwise specified. The authors identified candidate measures to identify children in each of these groups. PMID:25076005
Civier, Oren; Tasko, Stephen M; Guenther, Frank H
2010-09-01
This paper investigates the hypothesis that stuttering may result in part from impaired readout of feedforward control of speech, which forces persons who stutter (PWS) to produce speech with a motor strategy that is weighted too heavily toward auditory feedback control. Over-reliance on feedback control leads to production errors which, if they grow large enough, can cause the motor system to "reset" and repeat the current syllable. This hypothesis is investigated using computer simulations of a "neurally impaired" version of the DIVA model, a neural network model of speech acquisition and production. The model's outputs are compared to published acoustic data from PWS' fluent speech, and to combined acoustic and articulatory movement data collected from the dysfluent speech of one PWS. The simulations mimic the errors observed in the PWS subject's speech, as well as the repairs of these errors. Additional simulations were able to account for enhancements of fluency gained by slowed/prolonged speech and masking noise. Together these results support the hypothesis that many dysfluencies in stuttering are due to a bias away from feedforward control and toward feedback control. The reader will be able to (a) describe the contributions of auditory feedback control and feedforward control to normal and stuttered speech production, (b) summarize the neural modeling approach to speech production and its application to stuttering, and (c) explain how the DIVA model accounts for enhancements of fluency gained by slowed/prolonged speech and masking noise.
Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.
Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne
2016-09-01
We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.
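The multilevel modeling step described above can be illustrated with a small sketch. The variable names, data layout, and model formula below are assumptions for illustration only, not the study's actual dataset or specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# A minimal sketch of the kind of multilevel model described above (invented
# variable names and simulated data, not the study's dataset): trip-level
# velocity variability nested within families, predicted by the proportion of
# the trip containing speech and the number of child passengers present.

rng = np.random.default_rng(1)
n_families, trips_per_family = 10, 6
family = np.repeat(np.arange(n_families), trips_per_family)
speech_prop = rng.uniform(0, 1, size=family.size)
n_children = rng.integers(0, 3, size=family.size)
speed_sd = (1.0 + 0.5 * speech_prop + 0.1 * n_children
            + rng.normal(0, 0.2, size=family.size)            # trip-level noise
            + rng.normal(0, 0.1, size=n_families)[family])    # family-level random intercept

df = pd.DataFrame({"family": family, "speech_prop": speech_prop,
                   "n_children": n_children, "speed_sd": speed_sd})

result = smf.mixedlm("speed_sd ~ speech_prop + n_children", df, groups=df["family"]).fit()
print(result.params)
```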
Implementation of the Intelligent Voice System for Kazakh
NASA Astrophysics Data System (ADS)
Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.
2014-04-01
Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese, and Russian. For Kazakh, the situation is less advanced, and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call centers and information desks supporting Kazakh speech. The demand for such a system is obvious given the country's large size and small population: landline and cell phones are often the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
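The word error rates quoted above are standard edit-distance scores. As a point of reference, a minimal sketch of how WER is typically computed is given below; this is not the authors' code, and the example strings are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed here via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn on the light", "turn on light"))  # one deletion out of four words -> 0.25
```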
NASA Astrophysics Data System (ADS)
Rong, Panying
The speech of individuals with velopharyngeal incompetency (VPI) is characterized by hypernasality, a speech quality related to excessive emission of acoustic energy through the nose, caused by failure of velopharyngeal closure. In an attempt to reduce hypernasality and, in turn, improve the quality of VPI-related hypernasal speech, this study is dedicated to developing an approach that uses speech-dependent articulatory adjustments to reduce hypernasality caused by excessive velopharyngeal opening (VPO). A preliminary study was done to derive such articulatory adjustments for hypernasal /i/ vowels based on the simulation of an articulatory model (Speech Processing and Synthesis Toolboxes; Childers, 2000). Nasal /i/ vowels both with and without articulatory adjustments were synthesized by the model. Spectral analysis found that nasal acoustic features were attenuated and oral formant structures were restored after articulatory adjustments. In addition, comparisons of perceptual ratings of nasality between the two types of nasal vowels showed that the articulatory adjustments generated by the model significantly reduced the perception of nasality for nasal /i/ vowels. Such articulatory adjustments for nasal /i/ have two patterns: 1) a consistent adjustment pattern, which corresponds to an expansion at the velopharynx, and 2) speech-dependent fine-tuning adjustment patterns, including adjustments in the lip area and the upper pharynx. The long-term goal of this study is to apply this approach of articulatory adjustment as a therapeutic tool in clinical speech treatment to detect and correct the maladaptive articulatory behaviors developed spontaneously by speakers with VPI on an individual basis. This study constructed a speaker-adaptive articulatory model on the basis of the framework of Childers's vocal tract model to simulate articulatory adjustments aimed at compensating for the acoustic outcome caused by VPO and reducing nasality. To construct such a speaker-adaptive articulatory model, (1) an articulatory-acoustic-aerodynamic database was recorded using articulography and aerodynamic instruments to provide point-wise articulatory data to be fitted into the framework of Childers's standard vocal tract model; (2) the length and transverse dimension of the vocal tract were adjusted to fit individual speakers by minimizing the acoustic discrepancy between the model simulation and the target derived from the acoustic signal in the database using the simulated annealing algorithm; and (3) the articulatory space of the model was adjusted to fit individual articulatory features by adapting the movement ranges of all articulators. With the speaker-adaptive articulatory model, the articulatory configurations of the oral and nasal vowels in the database were simulated and synthesized. Given the acoustic targets derived from the oral vowels in the database, speech-dependent articulatory adjustments were simulated to compensate for the acoustic outcome caused by VPO. The resultant articulatory configurations correspond to nasal vowels with articulatory adjustment, which were synthesized to serve as the perceptual stimuli for a listening task of nasality rating. The oral and nasal vowels synthesized based on the oral and nasal vowel targets in the database also served as perceptual stimuli. The results suggest both acoustic and perceptual effects of the model-generated articulatory adjustment on the nasal vowels /a/, /i/ and /u/.
In terms of acoustics, the articulatory adjustment (1) restores the altered formant structures due to nasal coupling, including shifted formant frequency, attenuated formant intensity and expanded formant bandwidth and (2) attenuates the peaks and zeros caused by nasal resonances. Perceptually, the articulatory adjustment generated by the speaker-adaptive model significantly reduces the perceived nasality for all three vowels (/a/, /i/, /u/). The acoustic and perceptual effects of articulatory adjustment suggest achievement of the acoustic goal of compensating for the acoustic discrepancy caused by VPO and the auditory goal of reducing the perception of nasality. Such a finding is consistent with motor equivalence (Hughes and Abbs, 1976; Maeda, 1990), which enables inter-articulator coordination to compensate for the deviation from the acoustic/auditory goal caused by the shifted position of an articulator. The articulatory adjustment responsible for the acoustic and perceptual effects as described above was decomposed into a set of empirical orthogonal modes (Story and Titze, 1998). Both gross articulatory patterns and fine-tuning adjustments were found in the principal orthogonal modes, which lead to the acoustic compensation and reduction of nasality. For /a/ and /i/, a direct relationship was found among the acoustic features, nasality, and articulatory adjustment patterns. Specifically, the articulatory adjustments indicated by the principal orthogonal modes of the adjusted nasal /a/ and /i/ were directly correlated with the attenuation of the acoustic cues of nasality (i.e., shifting of F1 and F2 frequencies) and the reduction of nasality rating. For /u/, such a direct relationship among the acoustic features, nasality and articulatory adjustment was not as prominent, suggesting the possibility of additional acoustic correlates of nasality other than F1 and F2. The findings of this study demonstrate the possibility of using articulatory adjustment to reduce the perception of nasality through model simulation. A speaker-adaptive articulatory model is able to simulate individual-based articulatory adjustment strategies that can be applied in clinical settings to serve as the articulatory targets for correction of the maladaptive articulatory behaviors developed spontaneously by speakers with hypernasal speech. Such a speaker-adaptive articulatory model provides an intuitive way of articulatory learning and self-training for speakers with VPI to learn appropriate articulatory strategies through model-speaker interaction.
Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription
NASA Astrophysics Data System (ADS)
Kabir, A.; Barker, J.; Giurgiu, M.
2010-09-01
An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is particularly useful for generating robust automatic transcriptions and can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard hidden Markov model (HMM) approach and was successfully tested on a large audiovisual speech corpus, namely the GRID corpus. One of the most powerful features of the toolbox is its increased flexibility in speech processing: the speech community is able to import the automatic transcription generated by the HMM Toolkit (HTK) into the popular transcription software PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis on GRID data, which shows that the automatic transcription deviates by an average of 20 ms with respect to the manual transcription.
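As an illustration of the HTK-to-Praat direction of this interchange, the sketch below converts HTK-style phone labels into a Praat long-format TextGrid. This is a hedged sketch, not the published toolbox: it assumes contiguous "start end phone" label lines with times in HTK's 100 ns units, and the function and tier names are invented.

```python
HTK_TIME_UNIT = 1e-7  # HTK label times are integers in units of 100 ns

def htk_labels_to_textgrid(label_lines, tier_name="phones"):
    """Build a Praat long-format TextGrid string from contiguous HTK label lines."""
    intervals = []
    for line in label_lines:
        start, end, phone = line.split()[:3]
        intervals.append((int(start) * HTK_TIME_UNIT, int(end) * HTK_TIME_UNIT, phone))
    total_xmax = intervals[-1][1] if intervals else 0.0
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (xmin, xmax, phone) in enumerate(intervals, start=1):
        out += [
            f"        intervals [{i}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{phone}"',
        ]
    return "\n".join(out)

# Example with made-up alignment output (times in 100 ns units):
print(htk_labels_to_textgrid(["0 1200000 sil", "1200000 2500000 b",
                              "2500000 4100000 ih", "4100000 5600000 n"]))
```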
Phonetic Modification of Vowel Space in Storybook Speech to Infants up to 2 Years of Age
Burnham, Evamarie B.; Wieland, Elizabeth A.; Kondaurova, Maria V.; McAuley, J. Devin; Bergeson, Tonya R.
2015-01-01
Purpose: A large body of literature has indicated vowel space area expansion in infant-directed (ID) speech compared with adult-directed (AD) speech, which may promote language acquisition. The current study tested whether this expansion occurs in storybook speech read to infants at various points during their first 2 years of life. Method: In 2 studies, mothers read a storybook containing target vowels in ID and AD speech conditions. Study 1 was longitudinal, with 11 mothers recorded when their infants were 3, 6, and 9 months old. Study 2 was cross-sectional, with 48 mothers recorded when their infants were 3, 9, 13, or 20 months old (n = 12 per group). The 1st and 2nd formants of vowels /i/, /ɑ/, and /u/ were measured, and vowel space area and dispersion were calculated. Results: Across both studies, 1st and/or 2nd formant frequencies shifted systematically for /i/ and /u/ vowels in ID compared with AD speech. No difference in vowel space area or dispersion was found. Conclusions: The results suggest that a variety of communication and situational factors may affect phonetic modifications in ID speech, but that vowel space characteristics in speech to infants stay consistent across the first 2 years of life. PMID:25659121
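For reference, vowel space area and dispersion as reported above are typically computed from corner-vowel formant values. The sketch below shows one common formulation (shoelace area over the /i/, /ɑ/, /u/ triangle in the F1 x F2 plane, and mean distance from the centroid); the formant values are invented and this is not the authors' analysis script.

```python
import math

def vowel_space_area(f1f2_points):
    """Area of the polygon spanned by (F1, F2) corner vowels, in Hz^2 (shoelace formula)."""
    n = len(f1f2_points)
    s = 0.0
    for i in range(n):
        x1, y1 = f1f2_points[i]
        x2, y2 = f1f2_points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def vowel_space_dispersion(f1f2_points):
    """Mean Euclidean distance of each (F1, F2) point from the centroid, in Hz."""
    cx = sum(p[0] for p in f1f2_points) / len(f1f2_points)
    cy = sum(p[1] for p in f1f2_points) / len(f1f2_points)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in f1f2_points) / len(f1f2_points)

# Illustrative (invented) corner-vowel formant values for one talker: /i/, /ɑ/, /u/
corners = [(310, 2790), (850, 1220), (370, 950)]
print(vowel_space_area(corners), vowel_space_dispersion(corners))
```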
Strand, Edythe A; McCauley, Rebecca J; Weigand, Stephen D; Stoeckel, Ruth E; Baas, Becky S
2013-04-01
In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Participants were 81 children between 36 and 79 months of age who were referred to the Mayo Clinic for diagnosis of speech sound disorders. Children were given the DEMSS and a standard speech and language test battery as part of routine evaluations. Subsequently, intrajudge, interjudge, and test-retest reliability were evaluated for a subset of participants. Construct validity was explored for all 81 participants through the use of agglomerative cluster analysis, sensitivity measures, and likelihood ratios. The mean percentage of agreement for 171 judgments was 89% for test-retest reliability, 89% for intrajudge reliability, and 91% for interjudge reliability. Agglomerative hierarchical cluster analysis showed that total DEMSS scores largely differentiated clusters of children with CAS vs. mild CAS vs. other speech disorders. Positive and negative likelihood ratios and measures of sensitivity and specificity suggested that the DEMSS does not overdiagnose CAS but sometimes fails to identify children with CAS. The value of the DEMSS in differential diagnosis of severe speech impairments was supported on the basis of evidence of reliability and validity.
Chaspari, Theodora; Soldatos, Constantin; Maragos, Petros
2015-01-01
The development of ecologically valid procedures for collecting reliable and unbiased emotional data is essential for computer interfaces with social and affective intelligence targeting patients with mental disorders. With this motivation, the Athens Emotional States Inventory (AESI) proposes the design, recording, and validation of an audiovisual database for five emotional states: anger, fear, joy, sadness and neutral. The items of the AESI consist of sentences, each having content indicative of the corresponding emotion. Emotional content was assessed through a survey of 40 young participants with a questionnaire following a Latin square design. The emotional sentences that were correctly identified by 85% of the participants were recorded in a soundproof room with microphones and cameras. A preliminary validation of the AESI is performed through automatic emotion recognition experiments from speech. The resulting database contains 696 recorded utterances in the Greek language by 20 native speakers and has a total duration of approximately 28 min. Speech classification results yield accuracy up to 75.15% for automatically recognizing the emotions in the AESI. These results indicate the usefulness of our approach for collecting emotional data with reliable content, balanced across classes and with reduced environmental variability.
MPEG-7 audio-visual indexing test-bed for video retrieval
NASA Astrophysics Data System (ADS)
Gagnon, Langis; Foucher, Samuel; Gouaillier, Valerie; Brun, Christelle; Brousseau, Julie; Boulianne, Gilles; Osterrath, Frederic; Chapdelaine, Claude; Dutrisac, Julie; St-Onge, Francis; Champagne, Benoit; Lu, Xiaojian
2003-12-01
This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content such as face recognition, motion activity, speech recognition, and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description decomposition is based on a temporal decomposition into visual segments (shots), key frames, and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval using a proprietary XQuery-based search engine and is accessible to members of the Canadian National Film Board (NFB) Cineroute site. For example, end users will be able to search for movie shots in the database that were produced in a specific year, that contain the face of a specific actor uttering a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.
Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc
2016-10-01
The question of what type of utterance, a sustained vowel or continuous speech, is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
Speech and swallowing disorders in Parkinson disease.
Sapir, Shimon; Ramig, Lorraine; Fox, Cynthia
2008-06-01
To review recent research and clinical studies pertaining to the nature, diagnosis, and treatment of speech and swallowing disorders in Parkinson disease. Although some studies indicate improvement in voice and speech with dopamine therapy and deep brain stimulation of the subthalamic nucleus, others show minimal or adverse effects. Repetitive transcranial magnetic stimulation of the mouth motor cortex and injection of collagen in the vocal folds have preliminary data supporting improvement in phonation in people with Parkinson disease. Treatments focusing on vocal loudness, specifically LSVT LOUD (Lee Silverman Voice Treatment), have been effective for the treatment of speech disorders in Parkinson disease. Changes in brain activity due to LSVT LOUD provide preliminary evidence for neural plasticity. Computer-based technology makes the Lee Silverman Voice Treatment available to a large number of users. A rat model for studying neuropharmacologic effects on vocalization in Parkinson disease has been developed. New diagnostic methods of speech and swallowing are also available as the result of recent studies. Speech rehabilitation with the LSVT LOUD is highly efficacious and scientifically tested. There is a need for more studies to improve understanding, diagnosis, prevention, and treatment of speech and swallowing disorders in Parkinson disease.
Smiljanić, Rajka; Bradlow, Ann R.
2011-01-01
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions. PMID:22225056
Carrigg, Bronwyn; Parry, Louise; Baker, Elise; Shriberg, Lawrence D; Ballard, Kirrie J
2016-10-05
This study describes the phenotype in a large family with a strong, multigenerational history of severe speech sound disorder (SSD) persisting into adolescence and adulthood in approximately half the cases. Aims were to determine whether a core phenotype, broader than speech, separated persistent from resolved SSD cases; and to ascertain the uniqueness of the phenotype relative to published cases. Eleven members of the PM family (9-55 years) were assessed across cognitive, language, literacy, speech, phonological processing, numeracy, and motor domains. Between group comparisons were made using the Mann-Whitney U-test (p < 0.01). Participant performances were compared to normative data using standardized tests and to the limited published data on persistent SSD phenotypes. Significant group differences were evident on multiple speech, language, literacy, phonological processing, and verbal intellect measures without any overlapping scores. Persistent cases performed within the impaired range on multiple measures. Phonological memory impairment and subtle literacy weakness were present in resolved SSD cases. A core phenotype distinguished persistent from resolved SSD cases that was characterized by a multiple verbal trait disorder, including Childhood Apraxia of Speech. Several phenotypic differences differentiated the persistent SSD phenotype in the PM family from the few previously reported studies of large families with SSD, including the absence of comorbid dysarthria and marked orofacial apraxia. This study highlights how comprehensive phenotyping can advance the behavioral study of disorders, in addition to forming a solid basis for future genetic and neural studies. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender
ERIC Educational Resources Information Center
Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal
2004-01-01
Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…
The Haskins Optically Corrected Ultrasound System
ERIC Educational Resources Information Center
Whalen, D. H.; Iskarous, Khalil; Tiede, Mark K.; Ostry, David J.; Lehnert-LeHouillier, Heike; Vatikiotis-Bateson, Eric; Hailey, Donald S.
2005-01-01
The tongue is critical in the production of speech, yet its nature has made it difficult to measure. Not only does its ability to attain complex shapes make it difficult to track, it is also largely hidden from view during speech. The present article describes a new combination of optical tracking and ultrasound imaging that allows for a…
ERIC Educational Resources Information Center
Radford, Julie
2010-01-01
Repair practices used by teachers who work with children with specific speech and language difficulties (SSLDs) have hitherto remained largely unexplored. Such classrooms therefore offer a new context for researching repairs and considering how they compare with non-SSLD interactions. Repair trajectories are of interest because they are dialogic…
ERIC Educational Resources Information Center
Chasaide, Ailbhe Ni; Davis, Eugene
The data processing system used at Trinity College's Centre for Language and Communication Studies (Ireland) enables computer-automated collection and analysis of phonetic data and has many advantages for research on speech production. The system allows accurate handling of large quantities of data, eliminates many of the limitations of manual…
Bivariate Genetic Analyses of Stuttering and Nonfluency in a Large Sample of 5-Year-Old Twins
ERIC Educational Resources Information Center
van Beijsterveldt, Catharina Eugenie Maria; Felsenfeld, Susan; Boomsma, Dorret Irene
2010-01-01
Purpose: Behavioral genetic studies of speech fluency have focused on participants who present with clinical stuttering. Knowledge about genetic influences on the development and regulation of normal speech fluency is limited. The primary aims of this study were to identify the heritability of stuttering and high nonfluency and to assess the…
Remote Capture of Human Voice Acoustical Data by Telephone: A Methods Study
ERIC Educational Resources Information Center
Cannizzaro, Michael S.; Reilly, Nicole; Mundt, James C.; Snyder, Peter J.
2005-01-01
In this pilot study we sought to determine the reliability and validity of collecting speech and voice acoustical data via telephone transmission for possible future use in large clinical trials. Simultaneous recordings of each participant's speech and voice were made at the point of participation, the local recording (LR), and over a telephone…
Revealing the dual streams of speech processing.
Fridriksson, Julius; Yourganov, Grigori; Bonilha, Leonardo; Basilakos, Alexandra; Den Ouden, Dirk-Bart; Rorden, Christopher
2016-12-27
Several dual route models of human speech processing have been proposed suggesting a large-scale anatomical division between cortical regions that support motor-phonological aspects vs. lexical-semantic aspects of speech processing. However, to date, there is no complete agreement on what areas subserve each route or the nature of interactions across these routes that enables human speech processing. Relying on an extensive behavioral and neuroimaging assessment of a large sample of stroke survivors, we used a data-driven approach using principal components analysis of lesion-symptom mapping to identify brain regions crucial for performance on clusters of behavioral tasks without a priori separation into task types. Distinct anatomical boundaries were revealed between a dorsal frontoparietal stream and a ventral temporal-frontal stream associated with separate components. Collapsing over the tasks primarily supported by these streams, we characterize the dorsal stream as a form-to-articulation pathway and the ventral stream as a form-to-meaning pathway. This characterization of the division in the data reflects both the overlap between tasks supported by the two streams as well as the observation that there is a bias for phonological production tasks supported by the dorsal stream and lexical-semantic comprehension tasks supported by the ventral stream. As such, our findings show a division between two processing routes that underlie human speech processing and provide an empirical foundation for studying potential computational differences that distinguish between the two routes.
NASA Astrophysics Data System (ADS)
Ghoraani, Behnaz; Krishnan, Sridhar
2009-12-01
The number of people affected by speech problems is increasing as the modern world places increasing demands on the human voice via mobile telephones, voice recognition software, and interpersonal verbal communications. In this paper, we propose a novel methodology for automatic pattern classification of pathological voices. The main contribution of this paper is extraction of meaningful and unique features using Adaptive time-frequency distribution (TFD) and nonnegative matrix factorization (NMF). We construct Adaptive TFD as an effective signal analysis domain to dynamically track the nonstationarity in the speech and utilize NMF as a matrix decomposition (MD) technique to quantify the constructed TFD. The proposed method extracts meaningful and unique features from the joint TFD of the speech, and automatically identifies and measures the abnormality of the signal. Depending on the abnormality measure of each signal, we classify the signal into normal or pathological. The proposed method is applied on the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database which consists of 161 pathological and 51 normal speakers, and an overall classification accuracy of 98.6% was achieved.
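A minimal sketch of the matrix decomposition step is given below, using scikit-learn's NMF on a stand-in nonnegative time-frequency matrix. The adaptive TFD construction and the specific feature definitions of the paper are not reproduced; the summary features shown are an assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

# Factorize a nonnegative time-frequency matrix V (frequency bins x time frames)
# into spectral bases W and temporal activations H, whose statistics can then
# serve as classification features.

rng = np.random.default_rng(0)
V = np.abs(rng.standard_normal((257, 400)))   # stand-in for an adaptive TFD / spectrogram

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)   # (257, 5) spectral base vectors
H = model.components_        # (5, 400) time-varying activations

# Simple summary features per component (one possible choice, not the paper's):
features = np.concatenate([W.mean(axis=0), H.std(axis=1)])
print(features.shape)  # (10,)
```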
Video-assisted segmentation of speech and audio track
NASA Astrophysics Data System (ADS)
Pandit, Medha; Yusoff, Yusseri; Kittler, Josef; Christmas, William J.; Chilton, E. H. S.
1999-08-01
Video database research is commonly concerned with the storage and retrieval of visual information involving sequence segmentation, shot representation, and video clip retrieval. In multimedia applications, video sequences are usually accompanied by a sound track. The sound track contains potential cues to aid shot segmentation, such as different speakers, background music, singing, and distinctive sounds. These different acoustic categories can be modeled to allow for effective database retrieval. In this paper, we address the problem of automatic segmentation of the audio track of multimedia material. This audio-based segmentation can be combined with video scene shot detection in order to achieve partitioning of the multimedia material into semantically significant segments.
Masking effects of speech and music: does the masker's hierarchical structure matter?
Shi, Lu-Feng; Law, Yvonne
2010-04-01
Speech and music are time-varying signals organized by parallel hierarchical rules. Through a series of four experiments, this study compared the masking effects of single-talker speech and instrumental music on speech perception while manipulating the complexity of hierarchical and temporal structures of the maskers. Listeners' word recognition was found to be similar between hierarchically intact and disrupted speech or classical music maskers (Experiment 1). When sentences served as the signal, significantly greater masking effects were observed with disrupted than intact speech or classical music maskers (Experiment 2), although not with jazz or serial music maskers, which differed from the classical music masker in their hierarchical structures (Experiment 3). Removing the classical music masker's temporal dynamics or partially restoring it affected listeners' sentence recognition; yet, differences in performance between intact and disrupted maskers remained robust (Experiment 4). Hence, the effect of structural expectancy was largely present across maskers when comparing them before and after their hierarchical structure was purposefully disrupted. This effect seemed to lend support to the auditory stream segregation theory.
A Comparative Analysis of Pitch Detection Methods Under the Influence of Different Noise Conditions.
Sukhostat, Lyudmila; Imamverdiyev, Yadigar
2015-07-01
Pitch is one of the most important components in various speech processing systems. The aim of this study was to evaluate different pitch detection methods under various noise conditions. Prospective study. For the evaluation of pitch detection algorithms, time-domain, frequency-domain, and hybrid methods were considered, using the Keele and CSTR speech databases. Each of them has its own advantages and disadvantages. Experiments have shown that the BaNa method achieves the highest pitch detection accuracy. The development of pitch detection methods that are robust to additive noise at different signal-to-noise ratios is an important field of research with many opportunities for enhancing modern methods. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
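For orientation, the sketch below implements a plain time-domain (autocorrelation) pitch estimator of the kind evaluated in such comparisons; it is not the BaNa algorithm, and the test signal is a synthetic tone.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 of a voiced frame as sr divided by the lag of the strongest
    autocorrelation peak within a plausible F0 search range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

# Synthetic 200 Hz test tone sampled at 16 kHz, one 40 ms frame:
sr = 16000
t = np.arange(0, 0.04, 1.0 / sr)
frame = np.sin(2 * np.pi * 200 * t)
print(round(autocorr_pitch(frame, sr)))  # ~200
```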
NASA Astrophysics Data System (ADS)
Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia
An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested to determine whether they are sufficient to discriminate between seven emotions. Multilayer perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provides information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results are presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent framework. Nevertheless, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
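A minimal sketch of this kind of classification setup is shown below with scikit-learn; the 24-dimensional feature vectors and labels are randomly generated stand-ins, and the network topology is an assumption rather than the authors' configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Invented data: one 24-dimensional prosodic feature vector per utterance,
# seven emotion classes, fed to a multilayer perceptron.
rng = np.random.default_rng(0)
X = rng.standard_normal((700, 24))   # 700 utterances x 24 prosodic statistics
y = rng.integers(0, 7, size=700)     # 7 emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))  # ~chance on random data
```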
HTM Spatial Pooler With Memristor Crossbar Circuits for Sparse Biometric Recognition.
James, Alex Pappachen; Fedorova, Irina; Ibrayev, Timur; Kudithipudi, Dhireesha
2017-06-01
Hierarchical Temporal Memory (HTM) is an online machine learning algorithm that emulates the neocortex. The development of a scalable on-chip HTM architecture is an open research area. The two core substructures of HTM are the spatial pooler and temporal memory. In this work, we propose a new spatial pooler circuit design with parallel memristive crossbar arrays for the 2D columns. The proposed design was validated on two different benchmarks, face recognition and speech recognition. The circuits are simulated and analyzed using a practical memristor device model and a 0.18 μm IBM CMOS technology model. The AR, YALE, ORL, and UFI databases are used to test the performance of the design in face recognition, and the TIMIT dataset is used for speech recognition.
"My Mind Is Doing It All": No "Brake" to Stop Speech Generation in Jargon Aphasia.
Robinson, Gail A; Butterworth, Brian; Cipolotti, Lisa
2015-12-01
To study whether pressure of speech in jargon aphasia arises out of disturbances to core language or executive processes, or at the intersection of conceptual preparation. Conceptual preparation mechanisms for speech have not been well studied. Several mechanisms have been proposed for jargon aphasia, a fluent, well-articulated, logorrheic propositional speech that is almost incomprehensible. We studied the vast quantity of jargon speech produced by patient J.A., who had suffered an infarct after the clipping of a middle cerebral artery aneurysm. We gave J.A. baseline cognitive tests and experimental word- and sentence-generation tasks that we had designed for patients with dynamic aphasia, a severely reduced but otherwise fairly normal propositional speech thought to result from deficits in conceptual preparation. J.A. had cognitive dysfunction, including executive difficulties, and a language profile characterized by poor repetition and naming in the context of relatively intact single-word comprehension. J.A.'s spontaneous speech was fluent but jargon. He had no difficulty generating sentences; in contrast to dynamic aphasia, his sentences were largely meaningless and not significantly affected by stimulus constraint level. This patient with jargon aphasia highlights that voluminous speech output can arise from disturbances of both language and executive functions. Our previous studies have identified three conceptual preparation mechanisms for speech: generation of novel thoughts, their sequencing, and selection. This study raises the possibility that a "brake" to stop message generation may be a fourth conceptual preparation mechanism behind the pressure of speech characteristic of jargon aphasia.
Reddy, Rajgopal R; Gosla Reddy, Srinivas; Vaidhyanathan, Anitha; Bergé, Stefaan J; Kuijpers-Jagtman, Anne Marie
2017-06-01
The number of surgical procedures to repair a cleft palate may play a role in the outcome for maxillofacial growth and speech. The aim of this systematic review was to investigate the relationship between the number of surgical procedures performed to repair the cleft palate and maxillofacial growth, speech and fistula formation in non-syndromic patients with unilateral cleft lip and palate. An electronic search was performed in PubMed/old MEDLINE, the Cochrane Library, EMBASE, Scopus and CINAHL databases for publications between 1960 and December 2015. Publications before 1950, in journals of plastic and maxillofacial surgery, were hand searched. Additional hand searches were performed on studies mentioned in the reference lists of relevant articles. Search terms included unilateral, cleft lip and/or palate and palatoplasty. Two reviewers assessed eligibility for inclusion, extracted data, applied quality indicators and graded level of evidence. Twenty-six studies met the inclusion criteria. All were retrospective and non-randomized comparisons of one- and two-stage palatoplasty. The methodological quality of most of the studies was graded moderate to low. The outcomes concerned the comparison of one- and two-stage palatoplasty with respect to growth of the mandible, maxilla and cranial base, and speech and fistula formation. Due to the lack of high-quality studies there is no conclusive evidence of a relationship between one- or two-stage palatoplasty and facial growth, speech and fistula formation in patients with unilateral cleft lip and palate. Copyright © 2017 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Speech and motor disturbances in Rett syndrome.
Bashina, V M; Simashkova, N V; Grachev, V V; Gorbachevskaya, N L
2002-01-01
Rett syndrome is a severe, genetically determined disease of early childhood which produces a defined clinical phenotype in girls. The main clinical manifestations include lesions affecting speech functions, involving both expressive and receptive speech, as well as motor functions, producing apraxia of the arms and profound abnormalities of gait in the form of ataxia-apraxia. Most investigators note that patients have variability in the severity of derangement to large motor acts and in the damage to fine hand movements and speech functions. The aims of the present work were to study disturbances of speech and motor functions over 2-5 years in 50 girls aged 12 months to 14 years with Rett syndrome and to analyze the correlations between these disturbances. The results of comparing clinical data and EEG traces supported the stepwise involvement of frontal and parietal-temporal cortical structures in the pathological process. The ability to organize speech and motor activity is affected first, with subsequent development of lesions to gnostic functions, which are in turn followed by derangement of subcortical structures and the cerebellum and later by damage to structures in the spinal cord. A clear correlation was found between the severity of lesions to motor and speech functions and neurophysiological data: the higher the level of preservation of elements of speech and motor functions, the smaller were the contributions of theta activity and the greater the contributions of alpha and beta activities to the EEG. The possible pathogenetic mechanisms underlying the motor and speech disturbances in Rett syndrome are discussed.
The Timing and Effort of Lexical Access in Natural and Degraded Speech
Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz
2016-01-01
Understanding speech is effortless in ideal situations, and although adverse conditions, such as caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli in particular in presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has its consequences on lexical processing as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901
Speech processing in children with functional articulation disorders.
Gósy, Mária; Horváth, Viktória
2015-03-01
This study explored auditory speech processing and comprehension abilities in 5-8-year-old monolingual Hungarian children with functional articulation disorders (FADs) and their typically developing peers. Our main hypothesis was that children with FAD would show co-existing auditory speech processing disorders, with different levels of these skills depending on the nature of the receptive processes. The tasks included (i) sentence and non-word repetitions, (ii) non-word discrimination and (iii) sentence and story comprehension. Results suggest that the auditory speech processing of children with FAD is underdeveloped compared with that of typically developing children, and largely varies across task types. In addition, there are differences between children with FAD and controls in all age groups from 5 to 8 years. Our results have several clinical implications.
ERIC Educational Resources Information Center
Uno, Mariko
2017-01-01
The present dissertation extracted 17,291 questions from Aki, Ryo, and Tai and their mother's spontaneously produced speech data available in the CHILDES database (MacWhinney, 2000; Oshima-Takane & MacWhinney, 1998). The children's age ranged from 1;3 to 3;0. Their questions were coded for (1) yes/no questions that include a sentence-final…
Schwartz, Jean-Luc; Savariaux, Christophe
2014-01-01
An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call “comodulatory gestures” providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction. PMID:25079216
Quantifying the intelligibility of speech in noise for non-native listeners.
van Wijngaarden, Sander J; Steeneken, Herman J M; Houtgast, Tammo
2002-04-01
When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the underlying perceptual and linguistic processing. An easy method is sought to extend these quantitative findings to other listener populations. Dutch subjects listening to German and English speech, ranging from reasonable to excellent proficiency in these languages, were found to require a 1-7 dB better speech-to-noise ratio to obtain 50% sentence intelligibility than native listeners. Also, the psychometric function for sentence recognition in noise was found to be shallower for non-native than for native listeners (worst-case slope around the 50% point of 7.5%/dB, compared to 12.6%/dB for native listeners). Differences between native and non-native speech intelligibility are largely predicted by linguistic entropy estimates as derived from a letter guessing task. Less effective use of context effects (especially semantic redundancy) explains the reduced speech intelligibility for non-native listeners. While measuring speech intelligibility for many different populations of listeners (languages, linguistic experience) may be prohibitively time consuming, obtaining predictions of non-native intelligibility from linguistic entropy may help to extend the results of this study to other listener populations.
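The speech reception threshold and psychometric slope reported above come from fitting a sigmoid to sentence scores across speech-to-noise ratios. A minimal sketch of such a fit is given below; the logistic parameterization and the data points are illustrative assumptions, not the authors' procedure or data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, srt, slope):
    # proportion correct vs. SNR (dB); "slope" is the slope in proportion/dB at the SRT
    return 1.0 / (1.0 + np.exp(-4.0 * slope * (snr - srt)))

# Invented sentence-recognition scores at six speech-to-noise ratios:
snr = np.array([-8.0, -6.0, -4.0, -2.0, 0.0, 2.0])
p_correct = np.array([0.05, 0.18, 0.42, 0.71, 0.90, 0.97])

(srt, slope), _ = curve_fit(logistic, snr, p_correct, p0=[-3.0, 0.1])
print(f"SRT = {srt:.1f} dB SNR, slope at 50% = {100 * slope:.1f}%/dB")
```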
A highly penetrant form of childhood apraxia of speech due to deletion of 16p11.2
Fedorenko, Evelina; Morgan, Angela; Murray, Elizabeth; Cardinaux, Annie; Mei, Cristina; Tager-Flusberg, Helen; Fisher, Simon E; Kanwisher, Nancy
2016-01-01
Individuals with heterozygous 16p11.2 deletions reportedly suffer from a variety of difficulties with speech and language. Indeed, recent copy-number variant screens of children with childhood apraxia of speech (CAS), a specific and rare motor speech disorder, have identified three unrelated individuals with 16p11.2 deletions. However, the nature and prevalence of speech and language disorders in general, and CAS in particular, is unknown for individuals with 16p11.2 deletions. Here we took a genotype-first approach, conducting detailed and systematic characterization of speech abilities in a group of 11 unrelated children ascertained on the basis of 16p11.2 deletions. To obtain the most precise and replicable phenotyping, we included tasks that are highly diagnostic for CAS, and we tested children under the age of 18 years, an age group where CAS has been best characterized. Two individuals were largely nonverbal, preventing detailed speech analysis, whereas the remaining nine met the standard accepted diagnostic criteria for CAS. These results link 16p11.2 deletions to a highly penetrant form of CAS. Our findings underline the need for further precise characterization of speech and language profiles in larger groups of affected individuals, which will also enhance our understanding of how genetic pathways contribute to human communication disorders. PMID:26173965
Pitch-Based Segregation of Reverberant Speech
2005-02-01
speaker recognition in real environments, audio information retrieval and hearing prosthesis. Second, although binaural listening improves the...intelligibility of target speech under anechoic conditions (Bronkhorst, 2000), this binaural advantage is largely eliminated by reverberation (Plomp, 1976...Brown and Cooke, 1994; Wang and Brown, 1999; Hu and Wang, 2004) as well as in binaural separation (e.g., Roman et al., 2003; Palomaki et al., 2004
ERIC Educational Resources Information Center
Gentilucci, Maurizio; Campione, Giovanna Cristina; Volta, Riccardo Dalla; Bernardis, Paolo
2009-01-01
Does the mirror system affect the control of speech? This issue was addressed in behavioral and Transcranial Magnetic Stimulation (TMS) experiments. In behavioral experiment 1, participants pronounced the syllable /da/ while observing (1) a hand grasping large and small objects with power and precision grasps, respectively, (2) a foot interacting…
ERIC Educational Resources Information Center
Blau, Vera; Reithler, Joel; van Atteveldt, Nienke; Seitz, Jochen; Gerretsen, Patty; Goebel, Rainer; Blomert, Leo
2010-01-01
Learning to associate auditory information of speech sounds with visual information of letters is a first and critical step for becoming a skilled reader in alphabetic languages. Nevertheless, it remains largely unknown which brain areas subserve the learning and automation of such associations. Here, we employ functional magnetic resonance…
Polur, Prasad D; Miller, Gerald E
2006-10-01
Computer speech recognition for individuals with dysarthria, such as patients with cerebral palsy, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, the application of a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure for a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small vocabulary spoken by three subjects with cerebral palsy was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control. This was done in order to determine whether the modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel-frequency cepstral coefficients were extracted using 15 ms frames and served as training input to the hybrid model setup. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist motor-impaired individuals with dysarthria holds sufficient promise.
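A minimal sketch of this acoustic front end is shown below using librosa; the FFT size and the choice of 13 coefficients are assumptions for illustration, not necessarily the authors' exact settings.

```python
import numpy as np
import librosa

# 11 kHz audio framed at 15 ms, with MFCCs per frame serving as the
# observation vectors for an HMM or HMM/ANN recognizer.
sr = 11000
frame_len = int(0.015 * sr)          # 15 ms analysis frames (165 samples)
y = np.random.randn(sr)              # stand-in for one second of recorded speech

mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=256,                       # FFT size >= frame length
    win_length=frame_len,
    hop_length=frame_len,            # non-overlapping frames for simplicity
)
print(mfcc.shape)                    # (13, number_of_frames)
```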
The ``listener'' in the modeling of speech prosody
NASA Astrophysics Data System (ADS)
Kohler, Klaus J.
2004-05-01
Autosegmental-metrical modeling of speech prosody is principally speaker-oriented. The production of pitch patterns, in systematic lab speech experiments as well as in spontaneous speech corpora, is analyzed in f0 tracings, from which sequences of H(igh) and L(ow) are abstracted. The perceptual relevance of these pitch categories in the transmission from speakers to listeners is largely not conceptualized; thus their modeling in speech communication lacks an essential component. In the metalinguistic task of labeling speech data with the annotation system ToBI, the ``listener'' plays a subordinate role as well: H and L, being suggestive of signal values, are allocated with reference to f0 curves and with little or no concern for perceptual classification by the trained labeler. The seriousness of this theoretical gap in the modeling of speech prosody is demonstrated by experimental data concerning f0-peak alignment. A number of papers in JASA have dealt with this topic from the point of view of synchronizing f0 with the vocal tract time course in acoustic output. However, perceptual experiments within the Kiel intonation model show that ``early,'' ``medial'' and ``late'' peak alignments need to be defined perceptually and that in doing so microprosodic variation has to be filtered out from the surface signal.
Speech intelligibility in complex acoustic environments in young children
NASA Astrophysics Data System (ADS)
Litovsky, Ruth
2003-04-01
While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios when multiple sounds occur and when echoes are present, children's performance is significantly worse than that of their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources vary in number, location, and content (speech, modulated or unmodulated speech-shaped noise and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated, speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]
Popova, Svetlana; Lange, Shannon; Burd, Larry; Shield, Kevin; Rehm, Jürgen
2014-12-01
This study, which is part of a large economic project on the overall burden and cost associated with Foetal Alcohol Spectrum Disorder (FASD) in Canada, estimated the cost of 1:1 speech-language interventions among children and youth with FASD for Canada in 2011. The number of children and youth with FASD and speech-language disorder(s) (SLD), the distribution of the level of severity, and the number of hours needed to treat were estimated using data from the available literature. 1:1 speech-language interventions were computed using the average cost per hour for speech-language pathologists. It was estimated that ˜ 37,928 children and youth with FASD had SLD in Canada in 2011. Using the most conservative approach, the annual cost of 1:1 speech-language interventions among children and youth with FASD is substantial, ranging from $72.5 million to $144.1 million Canadian dollars. Speech-language pathologists should be aware of the disproportionate number of children and youth with FASD who have SLD and the need for early identification to improve access to early intervention. Early identification and access to high quality services may have a role in decreasing the risk of developing the secondary disabilities and in reducing the economic burden of FASD on society.
Common cues to emotion in the dynamic facial expressions of speech and song.
Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline
2015-01-01
Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.
Brammer, Anthony J; Yu, Gongqiang; Bernstein, Eric R; Cherniack, Martin G; Peterson, Donald R; Tufts, Jennifer B
2014-08-01
An adaptive, delayless, subband feed-forward control structure is employed to improve the speech signal-to-noise ratio (SNR) in the communication channel of a circumaural headset/hearing protector (HPD) from 90 Hz to 11.3 kHz, and to provide active noise control (ANC) from 50 to 800 Hz to complement the passive attenuation of the HPD. The task involves optimizing the speech SNR for each communication channel subband, subject to limiting the maximum sound level at the ear, maintaining a speech SNR preferred by users, and reducing large inter-band gain differences to improve speech quality. The performance of a proof-of-concept device has been evaluated in a pseudo-diffuse sound field when worn by human subjects under conditions of environmental noise and speech that do not pose a risk to hearing, and by simulation for other conditions. For the environmental noises employed in this study, subband speech SNR control combined with subband ANC produced greater improvement in word scores than subband ANC alone, and improved the consistency of word scores across subjects. The simulation employed a subject-specific linear model, and predicted that word scores are maintained in excess of 90% for sound levels outside the HPD of up to ∼115 dBA.
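The per-subband SNR shaping described above can be illustrated with a simple frequency-domain sketch: estimate speech and noise power per band, then apply bounded gains toward a target SNR. This is not the adaptive, delayless subband structure of the device; the STFT framing, 15 dB target, and 12 dB gain cap are arbitrary assumptions for illustration.

    import numpy as np
    from scipy.signal import stft, istft

    def subband_snr_boost(speech, noise_est, fs, target_snr_db=15.0, max_gain_db=12.0):
        # Toy per-band gain adjustment: raise bands with poor speech-to-noise ratio,
        # capping the boost so the output level stays bounded.
        f, t, S = stft(speech, fs, nperseg=512)
        _, _, N = stft(noise_est, fs, nperseg=512)
        band_speech = np.mean(np.abs(S) ** 2, axis=1)           # per-band speech power
        band_noise = np.mean(np.abs(N) ** 2, axis=1) + 1e-12    # per-band noise power
        snr_db = 10.0 * np.log10(band_speech / band_noise)
        gain_db = np.clip(target_snr_db - snr_db, 0.0, max_gain_db)
        S = S * (10.0 ** (gain_db / 20.0))[:, None]
        _, out = istft(S, fs, nperseg=512)
        return out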
Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications.
VanDam, Mark; Silbert, Noah H
2016-01-01
Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 judgements of classification of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems such as those using HMM methods, with acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output.
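A validation of this kind reduces to comparing two label sequences over the same segments. A minimal sketch of the comparison, assuming scikit-learn and placeholder class names (not LENA's actual tag set):

    import numpy as np
    from sklearn.metrics import confusion_matrix, cohen_kappa_score

    def agreement_report(asp_labels, human_labels,
                         classes=("adult_male", "adult_female", "child", "other")):
        # Compare automatic segment tags with human judgements of the same segments.
        cm = confusion_matrix(human_labels, asp_labels, labels=list(classes))
        accuracy = cm.trace() / cm.sum()
        kappa = cohen_kappa_score(human_labels, asp_labels, labels=list(classes))
        return cm, accuracy, kappa

Reporting chance-corrected agreement (kappa) alongside raw accuracy guards against inflated scores when one class dominates the recordings.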
A survey of acoustic conditions in semi-open plan classrooms in the United Kingdom.
Greenland, Emma E; Shield, Bridget M
2011-09-01
This paper reports the results of a large-scale, detailed acoustic survey of 42 open plan classrooms of varying design in the UK, each of which contained between 2 and 14 teaching areas or classbases. The objective survey procedure, which was designed specifically for use in open plan classrooms, is described. The acoustic measurements relating to speech intelligibility within a classbase, including ambient noise level, intrusive noise level, speech to noise ratio, speech transmission index, and reverberation time, are presented. The effects on speech intelligibility of critical physical design variables, such as the number of classbases within an open plan unit and the selection of acoustic finishes for control of reverberation, are examined. This analysis enables limitations of open plan classrooms to be discussed and acoustic design guidelines to be developed to ensure good listening conditions. The types of teaching activity for which adequate acoustic conditions can be provided, plus the speech intelligibility requirements of younger children, are also discussed. © 2011 Acoustical Society of America
Design and performance of an analysis-by-synthesis class of predictive speech coders
NASA Technical Reports Server (NTRS)
Rose, Richard C.; Barnwell, Thomas P., III
1990-01-01
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
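The core loop shared by coders in this class is: derive a short-term predictor for the frame, then search candidate excitations through the synthesis filter and keep the one that best reconstructs the input. A minimal Python sketch, assuming a random excitation codebook and omitting the perceptual weighting, long-term (pitch) prediction, and filter memory handling that practical members of the class require:

    import numpy as np
    from scipy.signal import lfilter

    def lpc_coeffs(frame, order=10):
        # LPC by the autocorrelation method (Levinson-Durbin recursion).
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0] + 1e-12
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err
            a[1:i] = a[1:i] + k * a[i - 1:0:-1]
            a[i] = k
            err *= 1.0 - k * k
        return a

    def abs_select(frame, a, codebook):
        # Analysis-by-synthesis: pick the excitation (and gain) whose synthesized
        # output is closest to the original frame in squared error.
        best = (None, 0.0, np.inf)
        for idx, c in enumerate(codebook):
            y = lfilter([1.0], a, c)                    # excitation through 1/A(z)
            g = np.dot(frame, y) / (np.dot(y, y) + 1e-12)
            err = np.sum((frame - g * y) ** 2)
            if err < best[2]:
                best = (idx, g, err)
        return best                                     # (codebook index, gain, error)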
The role of the supplementary motor area for speech and language processing.
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
2016-09-01
Apart from its function in speech motor control, the supplementary motor area (SMA) has largely been neglected in models of speech and language processing in the brain. The aim of this review paper is to summarize more recent work, suggesting that the SMA has various superordinate control functions during speech communication and language reception, which is particularly relevant in case of increased task demands. The SMA is subdivided into a posterior region serving predominantly motor-related functions (SMA proper) whereas the anterior part (pre-SMA) is involved in higher-order cognitive control mechanisms. In analogy to motor triggering functions of the SMA proper, the pre-SMA seems to manage procedural aspects of cognitive processing. These latter functions, among others, comprise attentional switching, ambiguity resolution, context integration, and coordination between procedural and declarative memory structures. Regarding language processing, this refers, for example, to the use of inner speech mechanisms during language encoding, but also to lexical disambiguation, syntax and prosody integration, and context-tracking. Copyright © 2016 Elsevier Ltd. All rights reserved.
JND measurements of the speech formants parameters and its implication in the LPC pole quantization
NASA Astrophysics Data System (ADS)
Orgad, Yaakov
1988-08-01
The inherent sensitivity of auditory perception is explicitly used with the objective of designing an efficient speech encoder. Speech can be modelled by a filter representing the vocal tract shape that is driven by an excitation signal representing glottal air flow. This work concentrates on the filter encoding problem, assuming that excitation signal encoding is optimal. Linear predictive coding (LPC) techniques were used to model a short speech segment by an all-pole filter; each pole was directly related to the speech formants. Measurements were made of the auditory just noticeable difference (JND) corresponding to the natural speech formants, with the LPC filter poles as the best candidates to represent the speech spectral envelope. The JND is the maximum precision required in speech quantization; it was defined on the basis of the shift of one pole parameter of a single frame of a speech segment, necessary to induce subjective perception of the distortion, with .75 probability. The average JND in LPC filter poles in natural speech was found to increase with increasing pole bandwidth and, to a lesser extent, frequency. The JND measurements showed a large spread of the residuals around the average values, indicating that inter-formant coupling and, perhaps, other, not yet fully understood, factors were not taken into account at this stage of the research. A future treatment should consider these factors. The average JNDs obtained in this work were used to design pole quantization tables for speech coding and provided a better bit-rate than the standard quantizer of reflection coefficient; a 30-bits-per-frame pole quantizer yielded a speech quality similar to that obtained with a standard 41-bits-per-frame reflection coefficient quantizer. Owing to the complexity of the numerical root extraction system, the practical implementation of the pole quantization approach remains to be proved.
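The mapping from LPC poles to formant parameters that underlies these JND measurements is standard: a pole at radius r and angle theta (sampling rate fs) corresponds to a formant at frequency theta*fs/(2*pi) with bandwidth -ln(r)*fs/pi. A minimal sketch, assuming the LPC coefficients are already available from a conventional analysis:

    import numpy as np

    def poles_to_formants(lpc_a, fs):
        # Map the poles of the all-pole LPC filter to candidate formant
        # frequencies (Hz) and bandwidths (Hz), sorted by frequency.
        poles = np.roots(lpc_a)
        poles = poles[np.imag(poles) > 0]          # keep one of each conjugate pair
        freqs = np.angle(poles) * fs / (2.0 * np.pi)
        bws = -np.log(np.abs(poles)) * fs / np.pi
        order = np.argsort(freqs)
        return freqs[order], bws[order]

Quantization tables defined over these pole frequencies and bandwidths are one way the measured JNDs can be turned into bit allocations.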
Hodgetts, Sophie; Gallagher, Peter; Stow, Daniel; Ferrier, I Nicol; O'Brien, John T
2017-03-01
Depression is known to negatively impact social functioning, with patients commonly reporting difficulties maintaining social relationships. Moreover, a large body of evidence suggests poor social functioning is not only present in depression but that social functioning is an important factor in illness course and outcome. In addition, good social relationships can play a protective role against the onset of depressive symptoms, particularly in late-life depression. However, the majority of research in this area has employed self-report measures of social function. This approach is problematic, as due to their reliance on memory, such measures are prone to error from the neurocognitive impairments of depression, as well as mood-congruent biases. Narrative review based on searches of the Web of Science and PubMed database(s) from the start of the databases, until the end of 2015. The present review provides an overview of the literature on social functioning in (late-life) depression and discusses the potential for new technologies to improve the measurement of social function in depressed older adults. In particular, the use of wearable technology to collect direct, objective measures of social activity, such as physical activity and speech, is considered. In order to develop a greater understanding of social functioning in late-life depression, future research should include the development and validation of more direct, objective measures in conjunction with subjective self-report measures. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.
Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J
2009-01-01
The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool followed by a third day, making independent ratings and transcriptions on ten new cases which had been previously recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2 and the intra- and inter-rater reliability calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data, standardizing the speech sample, data acquisition, the listening process together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.
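For reference, the reliability statistic used in this audit context, the intraclass correlation coefficient, can be computed from an items-by-raters rating matrix. The abstract does not state which ICC form was used; the sketch below implements one common choice, the two-way random, single-measure ICC(2,1).

    import numpy as np

    def icc_2_1(ratings):
        # ratings: n_items x n_raters matrix of scores for one CAPS-A section.
        ratings = np.asarray(ratings, dtype=float)
        n, k = ratings.shape
        grand = ratings.mean()
        row_means = ratings.mean(axis=1)            # per item
        col_means = ratings.mean(axis=0)            # per rater
        ss_rows = k * np.sum((row_means - grand) ** 2)
        ss_cols = n * np.sum((col_means - grand) ** 2)
        ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
        msr = ss_rows / (n - 1)
        msc = ss_cols / (k - 1)
        mse = ss_err / ((n - 1) * (k - 1))
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)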
The Hierarchical Cortical Organization of Human Speech Processing
de Heer, Wendy A.; Huth, Alexander G.; Griffiths, Thomas L.
2017-01-01
Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively involved in speech processing in large and equal amounts. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report more laterality in speech processing and associated semantic processing to higher levels of cortex than reported here. PMID:28588065
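The variance-partitioning logic described here can be summarized with a simple set-subtraction sketch: fit regularized encoding models on combinations of feature spaces, evaluate on held-out data, and attribute to a feature space only the variance the full model loses when that space is removed. The ridge regression, alpha value, and R-squared metric below are assumptions for illustration, not the study's exact pipeline.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score

    def r2_for(spaces, X_train, X_val, y_train, y_val):
        # Fit one encoding model on the concatenation of the named feature spaces
        # (X_train/X_val are dicts of name -> feature matrix) and score on held-out data.
        Xtr = np.hstack([X_train[s] for s in spaces])
        Xva = np.hstack([X_val[s] for s in spaces])
        model = Ridge(alpha=1.0).fit(Xtr, y_train)
        return r2_score(y_val, model.predict(Xva))

    def unique_variance(space, all_spaces, X_train, X_val, y_train, y_val):
        # Variance explained only by `space`: R^2(all spaces) - R^2(all spaces minus it).
        others = [s for s in all_spaces if s != space]
        return (r2_for(all_spaces, X_train, X_val, y_train, y_val)
                - r2_for(others, X_train, X_val, y_train, y_val))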
Changes in Preference for Infant-Directed Speech in Low and Moderate Noise by 4.5- to 13-Month-Olds
ERIC Educational Resources Information Center
Newman, Rochelle S.; Hussain, Isma
2006-01-01
Although a large literature discusses infants' preference for infant-directed speech (IDS), few studies have examined how this preference might change over time or across listening situations. The work reported here compares infants' preference for IDS while listening in a quiet versus a noisy environment, and across 3 points in development: 4.5…
ERIC Educational Resources Information Center
Larsson, AnnaKarin; Schölin, Johnna; Mark, Hans; Jönsson, Radi; Persson, Christina
2017-01-01
Background: In the last decade, a large number of children with cleft lip and palate have been adopted to Sweden. A majority of the children were born in China and they usually arrive in Sweden with an unoperated palate. There is currently a lack of knowledge regarding speech and articulation development in this group of children, who also have to…
ERIC Educational Resources Information Center
Kurth, Ruth Justine; Kurth, Lila Mae
A study compared mothers' and fathers' speech patterns when speaking to preschool children, particularly utterance length, sentence types, and word frequencies. All of the children attended a nursery school with a student population of 136 in a large urban area in the Southwest. Volunteer subjects, 28 mothers and 28 fathers of 28 children who…
Magnotti, John F; Basu Mallick, Debshila; Feng, Guo; Zhou, Bin; Zhou, Wen; Beauchamp, Michael S
2015-09-01
Humans combine visual information from mouth movements with auditory information from the voice to recognize speech. A common method for assessing multisensory speech perception is the McGurk effect: When presented with particular pairings of incongruent auditory and visual speech syllables (e.g., the auditory speech sounds for "ba" dubbed onto the visual mouth movements for "ga"), individuals perceive a third syllable, distinct from the auditory and visual components. Chinese and American cultures differ in the prevalence of direct facial gaze and in the auditory structure of their languages, raising the possibility of cultural- and language-related group differences in the McGurk effect. There is no consensus in the literature about the existence of these group differences, with some studies reporting less McGurk effect in native Mandarin Chinese speakers than in English speakers and others reporting no difference. However, these studies sampled small numbers of participants tested with a small number of stimuli. Therefore, we collected data on the McGurk effect from large samples of Mandarin-speaking individuals from China and English-speaking individuals from the USA (total n = 307) viewing nine different stimuli. Averaged across participants and stimuli, we found similar frequencies of the McGurk effect between Chinese and American participants (48 vs. 44 %). In both groups, we observed a large range of frequencies both across participants (range from 0 to 100 %) and stimuli (15 to 83 %) with the main effect of culture and language accounting for only 0.3 % of the variance in the data. High individual variability in perception of the McGurk effect necessitates the use of large sample sizes to accurately estimate group differences.
The Role of Corticostriatal Systems in Speech Category Learning
Yi, Han-Gyol; Maddox, W. Todd; Mumford, Jeanette A.; Chandrasekaran, Bharath
2016-01-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. Keywords: category learning, fMRI, corticostriatal systems, speech, putamen PMID:25331600
A variable rate speech compressor for mobile applications
NASA Technical Reports Server (NTRS)
Yeldener, S.; Kondoz, A. M.; Evans, B. G.
1990-01-01
One of the most promising speech coders at bit rates of 9.6 to 4.8 kbits/s is CELP. Code Excited Linear Prediction (CELP) has dominated the 9.6 to 4.8 kbits/s region during the past 3 to 4 years. Its setback, however, is its expensive implementation. As an alternative to CELP, the Base-Band CELP (CELP-BB) was developed, which, as reported previously, produces speech quality comparable to CELP at a complexity low enough for single-chip implementation. Its robustness was also improved to tolerate error rates up to 1.0% and to maintain intelligibility at 5.0% and more. Although CELP-BB produces good quality speech at around 4.8 kbits/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution is proposed for this problem. Below 4.8 kbits/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbits/s is reported by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation, an approach called Sine Wave Excited LPC (SWELP). In this case, natural-sounding, good quality synthetic speech is obtained at around 2.4 kbits/s.
What Part of "No" Do Children Not Understand? A Usage-Based Account of Multiword Negation
ERIC Educational Resources Information Center
Cameron-Faulkner, Thea; Lieven, Elena; Theakston, Anna
2007-01-01
The study investigates the development of English multiword negation, in particular the negation of zero marked verbs (e.g. "no sleep", "not see", "can't reach") from a usage-based perspective. The data was taken from a dense database consisting of the speech of an English-speaking child (Brian) aged 2;3-3;4 (MLU 2.05-3.1) and his mother. The…
Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn
2016-01-01
We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i. e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i. e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient. PMID:27176486
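The evaluation protocol named above (SVM classification scored by average recall in a leave-one-speaker-out scheme) maps directly onto standard tooling. A minimal sketch, assuming scikit-learn; the kernel and C value are placeholders rather than the paper's configuration:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
    from sklearn.metrics import recall_score

    def loso_uar(X, y, speaker_ids):
        # Leave-one-speaker-out cross-validation: every fold holds out all
        # utterances of one speaker. Score is unweighted average recall (UAR).
        pred = cross_val_predict(SVC(kernel="linear", C=1.0), X, y,
                                 groups=speaker_ids, cv=LeaveOneGroupOut())
        return recall_score(y, pred, average="macro")

Grouping the folds by speaker keeps the reported recalls from being inflated by speaker-specific cues leaking between training and test partitions.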
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common
Weninger, Felix; Eyben, Florian; Schuller, Björn W.; Mortillaro, Marcello; Scherer, Klaus R.
2013-01-01
Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow’s pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of “the sound that something makes,” in order to evaluate the system’s auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal, and valence regression is feasible achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects. PMID:23750144
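Cross-domain evaluation of the kind reported here reduces to training a regressor on the annotated features of one domain and correlating its predictions with the observer annotations of another. A minimal sketch, with the regressor choice (a linear SVR) assumed for illustration:

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.svm import SVR

    def cross_domain_correlation(X_train, y_train, X_test, y_test):
        # Train on one domain (e.g., sound) and evaluate on another (e.g., speech);
        # y values are observer annotations of arousal or valence.
        model = SVR(kernel="linear", C=1.0).fit(X_train, y_train)
        r, _ = pearsonr(model.predict(X_test), y_test)
        return r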
Huttunen, Kerttu; Välimaa, Taina
2012-01-01
During the process of implantation, parents may have rather heterogeneous expectations and concerns about their child's development and the functioning of habilitation and education services. Their views on habilitation and education are important for building family-centred practices. We explored the perceptions of parents and speech and language therapists (SLTs) on the effects of implantation on the child and the family and on the quality of services provided. Their views were also compared. Parents and SLTs of 18 children filled out questionnaires containing open- and closed-ended questions at 6 months and annually 1-5 years after activation of the implant. Their responses were analysed mainly using data-based inductive content analysis. Positive experiences outnumbered negative ones in the responses of both the parents and the SLTs surveyed. The parents were particularly satisfied with the improvement in communication and expanded social life in the family. These were the most prevalent themes also raised by the SLTs. The parents were also satisfied with the organization and content of habilitation. Most of the negative experiences were related to arrangement of hospital visits and the usability and maintenance of speech processor technology. Some children did not receive enough speech and language therapy, and some of the parents were dissatisfied with educational services. The habilitation process had generally required parental efforts at an expected level. However, parents with a child with at least one concomitant problem experienced habilitation as more stressful than did other parents. Parents and SLTs had more positive than negative experiences with implantation. As the usability and maintenance of speech processor technology were often compromised, we urge implant centres to ensure sufficient personnel for technical maintenance. It is also important to promote services by providing enough information and parental support. © 2011 Royal College of Speech & Language Therapists.
Accuracy of cochlear implant recipients in speech reception in the presence of background music.
Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia
2012-12-01
This study examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of 3 contrasting types of background music, and compared performance based upon listener groups: CI recipients using conventional long-electrode devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing adults. We tested 154 long-electrode CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 normal-hearing adults on closed-set recognition of spondees presented in 3 contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Signal-to-noise ratio thresholds for speech in music were examined in relation to measures of speech recognition in background noise and multitalker babble, pitch perception, and music experience. The signal-to-noise ratio thresholds for speech in music varied as a function of category of background music, group membership (long-electrode, Hybrid, normal-hearing), and age. The thresholds for speech in background music were significantly correlated with measures of pitch perception and thresholds for speech in background noise; auditory status was an important predictor. Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.
Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.
Berezutskaya, Julia; Freudenburg, Zachary V; Güçlü, Umut; van Gerven, Marcel A J; Ramsey, Nick F
2017-08-16
Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior temporal gyrus in the human brain (Hullett et al., 2016). In this study, we investigate what happens to these neural representations past the superior temporal gyrus and how they engage higher-level language processing areas such as inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex. Two complementary imaging techniques were used with human participants (both males and females): electrocorticography (ECoG) and fMRI. Both imaging techniques showed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence of propagation of the temporal features of speech sounds along the ventral pathway of language processing in the brain toward inferior frontal gyrus. Increasingly coarse temporal features of speech spreading from posterior superior temporal cortex toward inferior frontal gyrus were associated with linguistic features such as voice onset time, duration of the formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain. SIGNIFICANCE STATEMENT We know that, during natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated the tuning to low-level sound features within these regions using neural responses to a short feature film. We also looked at whether the tuning organization along these brain regions showed any parallel to the hierarchy of language structures in continuous speech. Our results show that low-level speech features propagate throughout the perisylvian cortex and potentially contribute to the emergence of "coarse" speech representations in inferior frontal gyrus typically associated with high-level language processing. These findings add to the previous work on auditory processing and underline a distinctive role of inferior frontal gyrus in natural speech comprehension. Copyright © 2017 the authors 0270-6474/17/377906-15$15.00/0.
Content-based video indexing and searching with wavelet transformation
NASA Astrophysics Data System (ADS)
Stumpf, Florian; Al-Jawad, Naseer; Du, Hongbo; Jassim, Sabah
2006-05-01
Biometric databases form an essential tool in the fight against international terrorism, organised crime and fraud. Various government and law enforcement agencies have their own biometric databases consisting of combination of fingerprints, Iris codes, face images/videos and speech records for an increasing number of persons. In many cases personal data linked to biometric records are incomplete and/or inaccurate. Besides, biometric data in different databases for the same individual may be recorded with different personal details. Following the recent terrorist atrocities, law enforcing agencies collaborate more than before and have greater reliance on database sharing. In such an environment, reliable biometric-based identification must not only determine who you are but also who else you are. In this paper we propose a compact content-based video signature and indexing scheme that can facilitate retrieval of multiple records in face biometric databases that belong to the same person even if their associated personal data are inconsistent. We shall assess the performance of our system using a benchmark audio visual face biometric database that has multiple videos for each subject but with different identity claims. We shall demonstrate that retrieval of relatively small number of videos that are nearest, in terms of the proposed index, to any video in the database results in significant proportion of that individual biometric data.
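The abstract does not spell out the signature itself, so the sketch below only illustrates the general idea of a compact wavelet-based video signature with nearest-neighbour retrieval: summarize each frame by subband statistics, average over the clip, and rank stored signatures by distance. PyWavelets and the specific statistics are assumptions of this illustration.

    import numpy as np
    import pywt

    def frame_signature(gray_frame, wavelet="haar", level=2):
        # Per-frame signature: mean and standard deviation of every wavelet subband.
        coeffs = pywt.wavedec2(gray_frame, wavelet, level=level)
        stats = []
        for c in coeffs:
            for band in (c if isinstance(c, tuple) else (c,)):
                stats.extend([float(np.mean(band)), float(np.std(band))])
        return np.array(stats)

    def video_signature(frames, **kw):
        # Clip-level signature: average of the frame signatures.
        return np.mean([frame_signature(f, **kw) for f in frames], axis=0)

    def retrieve(query_sig, stored_sigs, k=5):
        # Indices of the k nearest stored videos by Euclidean distance.
        d = np.linalg.norm(np.asarray(stored_sigs) - query_sig, axis=1)
        return np.argsort(d)[:k]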
A laboratory study for assessing speech privacy in a simulated open-plan office.
Lee, P J; Jeon, J Y
2014-06-01
The aim of this study is to assess speech privacy in an open-plan office using two recently introduced single-number quantities: the spatial decay rate of speech, DL(2,S) [dB], and the A-weighted sound pressure level of speech at a distance of 4 m, L(p,A,S,4m) [dB]. Open-plan offices were modeled using a DL(2,S) of 4, 8, and 12 dB, and L(p,A,S,4m) was changed in three steps, from 43 to 57 dB. Auditory experiments were conducted at three locations with source–receiver distances of 8, 16, and 24 m, while background noise level was fixed at 30 dBA. A total of 20 subjects were asked to rate the speech intelligibility and listening difficulty of 240 Korean sentences in such surroundings. The speech intelligibility scores were not affected by DL(2,S) or L(p,A,S,4m) at a source–receiver distance of 8 m; however, listening difficulty ratings were significantly changed with increasing DL(2,S) and L(p,A,S,4m) values. At other locations, the influences of DL(2,S) and L(p,A,S,4m) on speech intelligibility and listening difficulty ratings were significant. It was also found that the speech intelligibility scores and listening difficulty ratings were considerably changed with increasing distraction distance (r(D)). Furthermore, listening difficulty is more sensitive to variations in DL(2,S) and L(p,A,S,4m) than intelligibility scores for sound fields with high speech transmission performances. The recently introduced single-number quantities in the ISO standard, based on the spatial distribution of sound pressure level, were associated with speech privacy in an open-plan office. The results support single-number quantities being suitable to assess speech privacy, mainly at large distances. This new information can be considered when designing open-plan offices and developing acoustic guidelines for them.
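For orientation, both single-number quantities are derived from a regression of A-weighted speech level on the logarithm of source-receiver distance: DL(2,S) is the decay per doubling of distance and L(p,A,S,4m) is the level the regression line predicts at 4 m. A minimal sketch under that reading (the standard's full measurement procedure prescribes source and microphone arrangements not shown here):

    import numpy as np

    def speech_decay_quantities(distances_m, levels_dba):
        # Regress A-weighted speech level on log2(distance).
        x = np.log2(np.asarray(distances_m, dtype=float))
        y = np.asarray(levels_dba, dtype=float)
        slope, intercept = np.polyfit(x, y, 1)
        dl2s = -slope                                  # dB per doubling of distance
        lp_a_s_4m = slope * np.log2(4.0) + intercept   # regression level at 4 m
        return dl2s, lp_a_s_4m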
Budde, Kristin S.; Barron, Daniel S.; Fox, Peter T.
2015-01-01
Developmental stuttering is a speech disorder most likely due to a heritable form of developmental dysmyelination impairing the function of the speech-motor system. Speech-induced brain-activation patterns in persons who stutter (PWS) are anomalous in various ways; the consistency of these aberrant patterns is a matter of ongoing debate. Here, we present a hierarchical series of coordinate-based meta-analyses addressing this issue. Two tiers of meta-analyses were performed on a 17-paper dataset (202 PWS; 167 fluent controls). Four large-scale (top-tier) meta-analyses were performed, two for each subject group (PWS and controls). These analyses robustly confirmed the regional effects previously postulated as “neural signatures of stuttering” (Brown 2005) and extended this designation to additional regions. Two smaller-scale (lower-tier) meta-analyses refined the interpretation of the large-scale analyses: 1) a between-group contrast targeting differences between PWS and controls (stuttering trait); and 2) a within-group contrast (PWS only) of stuttering with induced fluency (stuttering state). PMID:25463820
Auditory processing disorders: an update for speech-language pathologists.
DeBonis, David A; Moncrieff, Deborah
2008-02-01
Unanswered questions regarding the nature of auditory processing disorders (APDs), how best to identify at-risk students, how best to diagnose and differentiate APDs from other disorders, and concerns about the lack of valid treatments have resulted in ongoing confusion and skepticism about the diagnostic validity of this label. This poses challenges for speech-language pathologists (SLPs) who are working with school-age children and whose scope of practice includes APD screening and intervention. The purpose of this article is to address some of the questions commonly asked by SLPs regarding APDs in school-age children. This article is also intended to serve as a resource for SLPs to be used in deciding what role they will or will not play with respect to APDs in school-age children. The methodology used in this article included a computerized database review of the latest published information on APD, with an emphasis on the work of established researchers and expert panels, including articles from the American Speech-Language-Hearing Association and the American Academy of Audiology. The article concludes with the authors' recommendations for continued research and their views on the appropriate role of the SLP in performing careful screening, making referrals, and supporting intervention.
NASA Astrophysics Data System (ADS)
Morishima, Shigeo; Nakamura, Satoshi
2004-12-01
We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it to the translated speech. This system also introduces both a face synthesis technique that can generate any viseme lip shape and a face tracking technique that can estimate the original position and rotation of a speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the speech organ's image with the synthesized one, which is made by a 3D wire-frame model that is adaptable to any speaker. Our approach provides translated image synthesis with an extremely small database. The tracking motion of the face from a video image is performed by template matching. In this system, the translation and rotation of the face are detected by using a 3D personal face model whose texture is captured from a video frame. We also propose a method to customize the personal face model by using our GUI tool. By combining these techniques and the translated voice synthesis technique, an automatic multimodal translation can be achieved that is suitable for video mail or automatic dubbing systems into other languages.
The use of conjunctions by children with typical language development.
Glória, Yasmin Alves Leão; Hanauer, Letícia Pessota; Wiethan, Fernanda Marafiga; Nóro, Letícia Arruda; Mota, Helena Bolli
2016-07-04
To investigate the use of conjunctions in the spontaneous speech of three-year-old children with typical language development, who live in Santa Maria - RS. 45 children, aged 3:0;0 - 3:11;29 (years:months;days), from the database of the Center for the Study of Language and Speech (CELF) participated in this study. The spontaneous speech of each child was transcribed, and the samples were analyzed to identify the types of conjunctions for each age group. The samples were statistically analyzed using the R software, which allowed the number and types of conjunctions used in each age group to be evaluated and compared. The data indicated that the higher the age of the child, the greater the number of types of conjunctions used. The comparison between age groups showed significant differences when comparing the average number of conjunctions per age group, as well as for additive conjunctions and subordinating conjunctions. At the age of three, children begin to develop the grammatical use of conjunctions, with additive, adversative and explanatory coordinating conjunctions appearing early, and at 3:6 they are able to use more complex conjunctions, such as subordinating conjunctions.
Rieger, J M; Zalmanowitz, J G; Wolfaardt, J F
2006-07-01
The use of radiation therapy and/or chemotherapy in advanced head and neck cancer is increasing in popularity, driven by the notion that sparing the organs of speech and swallowing from surgical resection will also spare function. This critical review of the literature considered functional outcomes after organ preservation to assess the impact of such treatment on speech, swallowing and quality of life in patients with head and neck cancer. Literature searches were conducted on several library databases. A total of 50 relevant articles were identified and found to meet the inclusion criteria specified a priori. The majority of reports suggested that organ preservation techniques have the potential to result in swallowing disorders, often related to dysmotility of the oropharyngeal and laryngeal structures, and resulting in frequent episodes of aspiration. This may lead to the need for enteral feeding in the short term for some patients while, in others, this need is life long. Speech does not appear to be affected to the same degree as swallowing. These results suggest that organ preservation does not translate into function preservation for all patients with head and neck cancer.
van den Broek, Egon L
2004-01-01
The voice embodies three sources of information: speech, the identity, and the emotional state of the speaker (i.e., emotional prosody). The latter feature is reflected in the variability of the F0, also named fundamental frequency or pitch (SD F0). To extract this feature, Emotional Prosody Measurement (EPM) was developed, which consists of 1) speech recording, 2) removal of speckle noise, 3) a Fourier Transform to extract the F0-signal, and 4) the determination of SD F0. After a pilot study in which six participants mimicked emotions by their voice, the core experiment was conducted to see whether EPM is successful. Twenty-five patients suffering from a panic disorder with agoraphobia participated. Two methods (story-telling and reliving) were used to trigger anxiety and were compared with comparable but more relaxed conditions. This resulted in a unique database of speech samples that was used to compare the EPM with the Subjective Unit of Distress to validate it as a measure of anxiety/stress. The experimental manipulation of anxiety proved to be successful and EPM proved to be a successful evaluation method for psychological therapy effectiveness.
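The SD F0 measure at the heart of EPM is simply the standard deviation of the fundamental-frequency track over voiced speech. The sketch below substitutes a plain autocorrelation pitch tracker for the Fourier-based extraction named in the abstract; the frame length, search range, and voicing threshold are arbitrary assumptions.

    import numpy as np

    def f0_track(x, fs, frame_ms=40, fmin=75, fmax=400):
        # Frame-wise F0 by autocorrelation peak picking; 0.0 marks unvoiced frames.
        n = int(fs * frame_ms / 1000)
        lo, hi = int(fs / fmax), int(fs / fmin)
        f0 = []
        for start in range(0, len(x) - n, n):
            frame = x[start:start + n] - np.mean(x[start:start + n])
            ac = np.correlate(frame, frame, mode="full")[n - 1:]
            if ac[0] <= 0:
                f0.append(0.0)
                continue
            lag = lo + int(np.argmax(ac[lo:hi]))
            f0.append(fs / lag if ac[lag] / ac[0] > 0.3 else 0.0)   # crude voicing check
        return np.array(f0)

    def sd_f0(x, fs):
        # Variability of the fundamental frequency over voiced frames (SD F0).
        track = f0_track(x, fs)
        voiced = track[track > 0]
        return float(np.std(voiced)) if voiced.size else 0.0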
Speaker Recognition Through NLP and CWT Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.
The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are derived from the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
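The record does not give the fast CWT algorithm or the exact glottal and formant features used, so the following is only a generic illustration of extracting a per-scale energy profile from a vowel segment with a continuous wavelet transform; PyWavelets and the Morlet wavelet are assumptions of this sketch.

    import numpy as np
    import pywt

    def cwt_scale_profile(vowel_segment, fs, wavelet="morl", n_scales=32):
        # CWT of a vowel segment; the normalised per-scale energy profile serves
        # as a compact descriptor for comparing speakers.
        scales = np.arange(1, n_scales + 1)
        coeffs, freqs = pywt.cwt(vowel_segment, scales, wavelet,
                                 sampling_period=1.0 / fs)
        energy = np.sum(np.abs(coeffs) ** 2, axis=1)
        return energy / (np.sum(energy) + 1e-12), freqs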
A speech and psychological profile of treatment-seeking adolescents who stutter.
Iverach, Lisa; Lowe, Robyn; Jones, Mark; O'Brian, Susan; Menzies, Ross G; Packman, Ann; Onslow, Mark
2017-03-01
The purpose of this study was to evaluate the relationship between stuttering severity, psychological functioning, and overall impact of stuttering, in a large sample of adolescents who stutter. Participants were 102 adolescents (11-17 years) seeking speech treatment for stuttering, including 86 boys and 16 girls, classified into younger (11-14 years, n=57) and older (15-17 years, n=45) adolescents. Linear regression models were used to evaluate the relationship between speech and psychological variables and overall impact of stuttering. The impact of stuttering during adolescence is influenced by a complex interplay of speech and psychological variables. Anxiety and depression scores fell within normal limits. However, higher self-reported stuttering severity predicted higher anxiety and internalizing problems. Boys reported externalizing problems (aggression, rule-breaking) in the clinical range, and girls reported total problems in the borderline-clinical range. Overall, higher scores on measures of anxiety, stuttering severity, and speech dissatisfaction predicted a more negative overall impact of stuttering. To our knowledge, this is the largest cohort study of adolescents who stutter. Higher stuttering severity, speech dissatisfaction, and anxiety predicted a more negative overall impact of stuttering, indicating the importance of carefully managing the speech and psychological needs of adolescents who stutter. Further research is needed to understand the relationship between stuttering and externalizing problems for adolescent boys who stutter. Copyright © 2016. Published by Elsevier Inc.
Speech pattern improvement following gingivectomy of excess palatal tissue.
Holtzclaw, Dan; Toscano, Nicholas
2008-10-01
Speech disruption secondary to excessive gingival tissue has received scant attention in periodontal literature. Although a few articles have addressed the causes of this condition, documentation and scientific explanation of treatment outcomes are virtually non-existent. This case report describes speech pattern improvements secondary to periodontal surgery and provides a concise review of linguistic and phonetic literature pertinent to the case. A 21-year-old white female with a history of gingival abscesses secondary to excessive palatal tissue presented for treatment. Bilateral gingivectomies of palatal tissues were performed with inverse bevel incisions extending distally from teeth #5 and #12 to the maxillary tuberosities, and large wedges of epithelium/connective tissue were excised. Within the first month of the surgery, the patient noted "changes in the manner in which her tongue contacted the roof of her mouth" and "changes in her speech." Further anecdotal investigation revealed the patient's enunciation of sounds such as "s," "sh," and "k" was greatly improved following the gingivectomy procedure. Palatometric research clearly demonstrates that the tongue has intimate contact with the lateral aspects of the posterior palate during speech. Gingival excess in this and other palatal locations has the potential to alter linguopalatal contact patterns and disrupt normal speech patterns. Surgical correction of this condition via excisional procedures may improve linguopalatal contact patterns which, in turn, may lead to improved patient speech.
Intentional Voice Command Detection for Trigger-Free Speech Interface
NASA Astrophysics Data System (ADS)
Obuchi, Yasunari; Sumiyoshi, Takashi
In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
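A hedged illustration of the "simple LDA-based classifier" idea described in the abstract above: classify audio segments as intentional voice commands versus everything else from a small feature vector. The features, values, and decision threshold are hypothetical placeholders, not the paper's corpus or feature set; scikit-learn is assumed.

    # Sketch: LDA over per-segment features, with a posterior threshold that
    # trades false alarms against missed commands.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(1)
    # Hypothetical features: [mean energy, duration (s), pitch variance, "emotion" score]
    commands = rng.normal([0.8, 1.2, 20.0, 0.7], 0.2, size=(200, 4))
    other = rng.normal([0.5, 3.0, 35.0, 0.3], 0.3, size=(800, 4))   # TV, chatter, noise
    X = np.vstack([commands, other])
    y = np.array([1] * 200 + [0] * 800)

    clf = LinearDiscriminantAnalysis().fit(X, y)
    posteriors = clf.predict_proba(X)[:, 1]
    print("segments flagged as commands:", int((posteriors > 0.9).sum()))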
NASA Astrophysics Data System (ADS)
Maskeliunas, Rytis; Rudzionis, Vytautas
2011-06-01
In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating various speech recognition techniques easily and quickly. All of these commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt available commercial recognizers for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the single avenue for adaptation is to find suitable ways of selecting proper phonetic transcription methods between the two languages. This paper deals with methods for finding phonetic transcriptions of Lithuanian voice commands so that they can be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable recognition of Lithuanian voice commands with an accuracy of over 90%.
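A toy sketch of the general idea described above, not the authors' method: generate an English-engine-friendly transcription of a Lithuanian command by substituting graphemes with rough English spellings. The mapping and example commands are illustrative assumptions, not a validated transcription scheme.

    # Naive grapheme substitution for cross-language phonetic transcription.
    LT_TO_EN = {
        "č": "ch", "š": "sh", "ž": "zh", "ė": "eh", "ę": "e",
        "ų": "oo", "ū": "oo", "į": "ee", "y": "ee", "c": "ts",
    }

    def approximate_transcription(word: str) -> str:
        """Replace Lithuanian graphemes with approximate English spellings."""
        return "".join(LT_TO_EN.get(ch, ch) for ch in word.lower())

    for command in ["įjungti", "išjungti", "pradėti"]:   # hypothetical voice commands
        print(command, "->", approximate_transcription(command))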
Functional outcome after total and subtotal glossectomy with free flap reconstruction.
Yanai, Chie; Kikutani, Takesi; Adachi, Masatosi; Thoren, Hanna; Suzuki, Munekazu; Iizuka, Tateyuki
2008-07-01
The aim of this study was to evaluate postoperative oral functions of patients who had undergone total or subtotal (75%) glossectomy with preservation of the larynx for oral squamous cell carcinomas. Speech intelligibility and swallowing capacity of 17 patients who had been treated between 1992 and 2002 were scored and classified using standard protocols 6 to 36 months postoperatively. The outcomes were finally rated as good, acceptable, or poor. The 4-year disease-specific survival rate was 64%. Speech intelligibility and swallowing capacity were satisfactory (acceptable or good) in 82.3%. Only 3 patients were still dependent on tube feeding. Good speech perceptibility did not always go together with normal diet tolerance, however. Our satisfactory results are attributable to the use of large, voluminous soft tissue flaps for reconstruction, and to the instigation of postoperative swallowing and speech therapy on a routine basis and at an early juncture.
Conceptual clusters in figurative language production.
Corts, Daniel P; Meyers, Kristina
2002-07-01
Although most prior research on figurative language examines comprehension, several recent studies on the production of such language have proved to be informative. One of the most noticeable traits of figurative language production is that it is produced at a somewhat random rate with occasional bursts of highly figurative speech (e.g., Corts & Pollio, 1999). The present article seeks to extend these findings by observing production during speech that involves a very high base rate of figurative language, making statistically defined bursts difficult to detect. In an analysis of three Baptist sermons, burst-like clusters of figurative language were identified. Further study indicated that these clusters largely involve a central root metaphor that represents the topic under consideration. An interaction of the coherence, along with a conceptual understanding of a topic and the relative importance of the topic to the purpose of the speech, is offered as the most likely explanation for the clustering of figurative language in natural speech.
Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy
2016-04-01
Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards the intended "pig," revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.
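A hedged sketch of the general strategy described above (not the authors' algorithms): train a classifier on manually labeled acoustic frames, then use it to locate a landmark of interest, here a voicing onset, in new tokens. Features, labels, and the random forest choice are assumptions for illustration; scikit-learn is assumed.

    # Train on labeled frames, then locate the first voiced frame of a new token.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(7)
    # Per-frame features: [energy, zero-crossing rate, low/high band energy ratio]
    voiceless = rng.normal([0.2, 0.6, 0.3], 0.1, size=(500, 3))
    voiced = rng.normal([0.7, 0.2, 0.8], 0.1, size=(500, 3))
    X = np.vstack([voiceless, voiced])
    y = np.array([0] * 500 + [1] * 500)            # manual labels: 0 = voiceless, 1 = voiced

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # "New token": 40 frames switching from voiceless-like to voiced-like at frame 25.
    token = np.vstack([rng.normal([0.2, 0.6, 0.3], 0.1, size=(25, 3)),
                       rng.normal([0.7, 0.2, 0.8], 0.1, size=(15, 3))])
    frame_labels = clf.predict(token)
    onset = int(np.argmax(frame_labels == 1))       # first frame classified as voiced
    print("estimated voicing-onset frame:", onset)  # a VOT-like landmark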
Visual activity predicts auditory recovery from deafness after adult cochlear implantation.
Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal
2013-12-01
Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input and patients must use long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the greatest progress in speech recovery is observed during the first year after cochlear implantation, but there is a large range of variability in the level of cochlear implant outcomes and the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, the visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area that positively correlated with auditory speech recovery was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the visual modality's functional level is related to the proficiency level of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that the visual modality may facilitate the perception of a word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.
Acoustic foundations of the speech-to-song illusion.
Tierney, Adam; Patel, Aniruddh D; Breen, Mara
2018-06-01
In the "speech-to-song illusion," certain spoken phrases are heard as highly song-like when isolated from context and repeated. This phenomenon occurs to a greater degree for some stimuli than for others, suggesting that particular cues prompt listeners to perceive a spoken phrase as song. Here we investigated the nature of these cues across four experiments. In Experiment 1, participants were asked to rate how song-like spoken phrases were after each of eight repetitions. Initial ratings were correlated with the consistency of an underlying beat and within-syllable pitch slope, while rating change was linked to beat consistency, within-syllable pitch slope, and melodic structure. In Experiment 2, the within-syllable pitch slope of the stimuli was manipulated, and this manipulation changed the extent to which participants heard certain stimuli as more musical than others. In Experiment 3, the extent to which the pitch sequences of a phrase fit a computational model of melodic structure was altered, but this manipulation did not have a significant effect on musicality ratings. In Experiment 4, the consistency of intersyllable timing was manipulated, but this manipulation did not have an effect on the change in perceived musicality after repetition. Our methods provide a new way of studying the causal role of specific acoustic features in the speech-to-song illusion via subtle acoustic manipulations of speech, and show that listeners can rapidly (and implicitly) assess the degree to which nonmusical stimuli contain musical structure. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Automated Depression Analysis Using Convolutional Neural Networks from Speech.
He, Lang; Cao, Cui
2018-05-28
To help clinicians efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. Speech features carry useful information for the diagnosis of depression. However, manual design and domain knowledge are still important for feature selection, which makes the process labor-intensive and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNNs) are first built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract state-of-the-art texture descriptors named median robust extended local binary patterns (MRELBP) from the spectrograms. To capture the complementary information within the hand-crafted and deep-learned features, we propose joint fine-tuning layers that combine the raw-waveform and spectrogram DCNNs to boost depression recognition performance. Moreover, to address the problem of small sample sizes, a data augmentation method is proposed. Experiments conducted on the AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods. Copyright © 2018. Published by Elsevier Inc.
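A schematic example in the spirit of the spectrogram DCNN branch described above, not the authors' architecture: a small convolutional network mapping log-mel spectrogram patches to a continuous severity score. The layer sizes, input shape, and target scale are assumptions; PyTorch is assumed.

    # Toy spectrogram-to-severity regressor.
    import torch
    import torch.nn as nn

    class SpectrogramDCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.regressor = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 16 * 25, 64), nn.ReLU(),
                nn.Linear(64, 1),                      # predicted severity score
            )

        def forward(self, x):                          # x: (batch, 1, 64 mel bands, 100 frames)
            return self.regressor(self.features(x))

    model = SpectrogramDCNN()
    dummy = torch.randn(8, 1, 64, 100)                 # batch of 8 spectrogram patches
    print(model(dummy).shape)                          # torch.Size([8, 1])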
Turnover and intent to leave among speech pathologists.
McLaughlin, Emma G H; Adamson, Barbara J; Lincoln, Michelle A; Pallant, Julie F; Cooper, Cary L
2010-05-01
Sound, large scale and systematic research into why health professionals want to leave their jobs is needed. This study used psychometrically-sound tools and logistic regression analyses to determine why Australian speech pathologists were intending to leave their jobs or the profession. Based on data from 620 questionnaires, several variables were found to be significantly related to intent to leave. The speech pathologists intending to look for a new job were more likely to be under 34 years of age, and perceive low levels of job security and benefits of the profession. Those intending to leave the profession were more likely to spend greater than half their time at work on administrative duties, have a higher negative affect score, not have children under 18 years of age, and perceive that speech pathology did not offer benefits that met their professional needs. The findings of this study provide the first evidence regarding the reasons for turnover and attrition in the Australian speech pathology workforce, and can inform the development of strategies to retain a skilled and experienced allied health workforce.
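A hedged sketch of the kind of logistic regression analysis described in the abstract above, with synthetic data and hypothetical variable coding (1 = intends to leave); it is not the study's analysis code. statsmodels is assumed.

    # Logistic regression of intent to leave on a few illustrative predictors.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 620
    under_34 = rng.integers(0, 2, n)              # 1 = under 34 years of age
    job_security = rng.normal(3.5, 1.0, n)        # perceived job security (1-5)
    admin_time = rng.uniform(0, 1, n)             # proportion of time on administration
    logit = -1.0 + 0.9 * under_34 - 0.6 * job_security + 1.5 * admin_time
    intent_to_leave = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    X = sm.add_constant(np.column_stack([under_34, job_security, admin_time]))
    result = sm.Logit(intent_to_leave, X).fit(disp=False)
    print(np.exp(result.params))                  # odds ratios for each predictor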
A scoping review of Australian allied health research in ehealth.
Iacono, Teresa; Stagg, Kellie; Pearce, Natalie; Hulme Chambers, Alana
2016-10-04
Uptake of e-health, the use of information communication technologies (ICT) for health service delivery, in allied health appears to be lagging behind other health care areas, despite offering the potential to address problems with service access by rural and remote Australians. The aim of the study was to conduct a scoping review of studies into the application of or attitudes towards ehealth amongst allied health professionals conducted in Australia. Studies meeting inclusion criteria published from January 2004 to June 2015 were reviewed. Professions included were audiology, dietetics, exercise physiology, occupational therapy, physiotherapy, podiatry, social work, and speech pathology. Terms for these professions and forms of ehealth were combined in databases of CINAHL (EBSCO), Cochrane Library, PsycINFO (1806 - Ovid), MEDLINE (Ovid) and AMED (Ovid). Forty-four studies meeting inclusion criteria were summarised. They were either trials of aspects of ehealth service delivery, or clinician and/or client use of and attitudes towards ehealth. Trials of ehealth were largely from two research groups located at the Universities of Sydney and Queensland; most involved speech pathology and physiotherapy. Assessments through ehealth and intervention outcomes through ehealth were comparable with face-to-face delivery. Clinicians used ICT mostly for managing their work and for professional development, but were reticent about its use in service delivery, which contrasted with the more positive attitudes and experiences of clients. The potential of ehealth to address allied health needs of Australians living in rural and remote Australia appears unrealised. Clinicians may need to embrace ehealth as a means to radicalise practice, rather than replicate existing practices through a different mode of delivery.
1988-09-01
Subject terms: command and control; computational linguistics; expert system voice recognition; man-machine interface; U.S. Government. The record text is fragmentary; the recoverable abstract states that the system simulates the characteristics of FRESH on a smaller scale, and that the study assisted NOSC in developing a voice-recognition man-machine interface that could be used with TONE and upgraded at a later date.
NASA Astrophysics Data System (ADS)
Lecun, Yann; Bengio, Yoshua; Hinton, Geoffrey
2015-05-01
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Speech perception in noise with a harmonic complex excited vocoder.
Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y
2014-04-01
A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
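A simplified single-channel sketch of the vocoder idea described above, not the study's implementation: extract one band's speech envelope, then re-impose it on a 100-Hz harmonic complex carrier whose component starting phases have been randomly dispersed before band-pass filtering. The band edges, filter order, and placeholder "speech" signal are assumptions; NumPy and SciPy are assumed.

    # One vocoder channel with a phase-dispersed harmonic complex carrier.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 16_000
    t = np.arange(0, 1.0, 1 / fs)
    rng = np.random.default_rng(3)
    speech = rng.normal(size=t.size)                      # placeholder for a speech signal

    def bandpass(x, lo, hi):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    band = (1000.0, 1400.0)                               # one analysis/synthesis band
    envelope = np.abs(hilbert(bandpass(speech, *band)))   # band envelope via Hilbert transform

    # Harmonic complex carrier (f0 = 100 Hz) with randomly dispersed starting phases.
    f0 = 100.0
    harmonics = np.arange(f0, fs / 2, f0)
    phases = rng.uniform(0, 2 * np.pi, harmonics.size)
    carrier = np.sum([np.sin(2 * np.pi * f * t + p) for f, p in zip(harmonics, phases)], axis=0)

    channel_out = envelope * bandpass(carrier, *band)     # modulated, band-limited carrier
    print(channel_out.shape)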
Toward a dual-learning systems model of speech category learning
Chandrasekaran, Bharath; Koslov, Seth R.; Maddox, W. T.
2014-01-01
More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article, we describe a neurobiologically constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, unidimensional rules to more complex, reflexive, multi-dimensional rules. In a second application, we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions. PMID:25132827
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.
Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M
2016-07-01
This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.
Restoring speech perception with cochlear implants by spanning defective electrode contacts.
Frijns, Johan H M; Snel-Bongers, Jorien; Vellinga, Dirk; Schrage, Erik; Vanpoucke, Filiep J; Briaire, Jeroen J
2013-04-01
Even with six defective contacts, spanning can largely restore speech perception with the HiRes 120 speech processing strategy to the level supported by an intact electrode array. Moreover, the sound quality is not degraded. Previous studies have demonstrated reduced speech perception scores (SPS) with defective contacts in HiRes 120. This study investigated whether replacing defective contacts by spanning, i.e. current steering on non-adjacent contacts, is able to restore speech recognition to the level supported by an intact electrode array. Ten adult cochlear implant recipients (HiRes90K, HiFocus1J) with experience with HiRes 120 participated in this study. Three different defective electrode arrays were simulated (six separate defective contacts, three pairs or two triplets). The participants received three take-home strategies and were asked to evaluate the sound quality in five predefined listening conditions. After 3 weeks, SPS were evaluated with monosyllabic words in quiet and in speech-shaped background noise. The participants rated the sound quality equal for all take-home strategies. SPS with background noise were equal for all conditions tested. However, SPS in quiet (85% phonemes correct on average with the full array) decreased significantly with increasing spanning distance, with a 3% decrease for each spanned contact.
Schmidt-Naylor, Anna C; Saunders, Kathryn J; Brady, Nancy C
2017-05-17
We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy.
Echoes of the spoken past: how auditory cortex hears context during speech perception
Skipper, Jeremy I.
2014-01-01
What do we hear when someone speaks and what does auditory cortex (AC) do with that sound? Given how meaningful speech is, it might be hypothesized that AC is most active when other people talk so that their productions get decoded. Here, neuroimaging meta-analyses show the opposite: AC is least active and sometimes deactivated when participants listened to meaningful speech compared to less meaningful sounds. Results are explained by an active hypothesis-and-test mechanism where speech production (SP) regions are neurally re-used to predict auditory objects associated with available context. By this model, more AC activity for less meaningful sounds occurs because predictions are less successful from context, requiring further hypotheses be tested. This also explains the large overlap of AC co-activity for less meaningful sounds with meta-analyses of SP. An experiment showed a similar pattern of results for non-verbal context. Specifically, words produced less activity in AC and SP regions when preceded by co-speech gestures that visually described those words compared to those words without gestures. Results collectively suggest that what we ‘hear’ during real-world speech perception may come more from the brain than our ears and that the function of AC is to confirm or deny internal predictions about the identity of sounds. PMID:25092665
Longitudinal changes in speech recognition in older persons.
Dubno, Judy R; Lee, Fu-Shing; Matthews, Lois J; Ahlstrom, Jayne B; Horwitz, Amy R; Mills, John H
2008-01-01
Recognition of isolated monosyllabic words in quiet and recognition of key words in low- and high-context sentences in babble were measured in a large sample of older persons enrolled in a longitudinal study of age-related hearing loss. Repeated measures were obtained yearly or every 2 to 3 years. To control for concurrent changes in pure-tone thresholds and speech levels, speech-recognition scores were adjusted using an importance-weighted speech-audibility metric (AI). Linear-regression slope estimated the rate of change in adjusted speech-recognition scores. Recognition of words in quiet declined significantly faster with age than predicted by declines in speech audibility. As subjects aged, observed scores deviated increasingly from AI-predicted scores, but this effect did not accelerate with age. Rate of decline in word recognition was significantly faster for females than males and for females with high serum progesterone levels, whereas noise history had no effect. Rate of decline did not accelerate with age but increased with degree of hearing loss, suggesting that with more severe injury to the auditory system, impairments to auditory function other than reduced audibility resulted in faster declines in word recognition as subjects aged. Recognition of key words in low- and high-context sentences in babble did not decline significantly with age.
Quantitative application of the primary progressive aphasia consensus criteria.
Wicklund, Meredith R; Duffy, Joseph R; Strand, Edythe A; Machulda, Mary M; Whitwell, Jennifer L; Josephs, Keith A
2014-04-01
To determine how well the consensus criteria could classify subjects with primary progressive aphasia (PPA) using a quantitative speech and language battery that matches the test descriptions provided by the consensus criteria. A total of 105 participants with a neurodegenerative speech and language disorder were prospectively recruited and underwent neurologic, neuropsychological, and speech and language testing and MRI in this case-control study. Twenty-one participants with apraxia of speech without aphasia served as controls. Select tests from the speech and language battery were chosen for application of consensus criteria and cutoffs were employed to determine syndromic classification. Hierarchical cluster analysis was used to examine participants who could not be classified. Of the 84 participants, 58 (69%) could be classified as agrammatic (27%), semantic (7%), or logopenic (35%) variants of PPA. The remaining 31% of participants could not be classified. Of the unclassifiable participants, 2 clusters were identified. The speech and language profile of the first cluster resembled mild logopenic PPA and the second cluster semantic PPA. Gray matter patterns of loss of these 2 clusters of unclassified participants also resembled mild logopenic and semantic variants. Quantitative application of consensus PPA criteria yields the 3 syndromic variants but leaves a large proportion unclassified. Therefore, the current consensus criteria need to be modified in order to improve sensitivity.
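An illustrative sketch of the hierarchical cluster analysis used above to examine unclassifiable participants; the scores are synthetic, and the test names and two-cluster profiles are hypothetical stand-ins rather than the study's battery. SciPy is assumed.

    # Ward hierarchical clustering of standardized speech/language score profiles.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(4)
    # Rows = unclassified participants; columns = standardized scores
    # (e.g., naming, repetition, grammaticality, single-word comprehension).
    scores = np.vstack([
        rng.normal([-0.5, -1.0, 0.0, 0.2], 0.3, size=(14, 4)),   # mild logopenic-like profile
        rng.normal([-1.2, 0.1, 0.0, -1.5], 0.3, size=(12, 4)),   # semantic-like profile
    ])

    Z = linkage(scores, method="ward")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print("cluster sizes:", np.bincount(labels)[1:])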
Morrison, Geoffrey Stewart; Enzinger, Ewald; Zhang, Cuiling
2016-12-01
Hicks et alii [Sci. Just. 55 (2015) 520-525. http://dx.doi.org/10.1016/j.scijus.2015.06.008] propose that forensic speech scientists not use the accent of the speaker of questioned identity to refine the relevant population. This proposal is based on a lack of understanding of the realities of forensic voice comparison. If it were implemented, it would make data-based forensic voice comparison analysis within the likelihood ratio framework virtually impossible. We argue that it would also lead forensic speech scientists to present invalid unreliable strength of evidence statements, and not allow them to conduct the tests that would make them aware of this problem. Copyright © 2016 The Chartered Society of Forensic Sciences. Published by Elsevier Ireland Ltd. All rights reserved.
Warlaumont, Anne S; Jarmulowicz, Linda
2012-11-01
Acquisition of regular inflectional suffixes is an integral part of grammatical development in English and delayed acquisition of certain inflectional suffixes is a hallmark of language impairment. We investigate the relationship between input frequency and grammatical suffix acquisition, analyzing 217 transcripts of mother-child (ages 1;11 to 6;9) conversations from the CHILDES database. Maternal suffix frequency correlates with previously reported rank orders of acquisition and with child suffix frequency. Percentages of children using a suffix are consistent with frequencies in caregiver speech. Although late talkers acquire suffixes later than typically developing children, order of acquisition is similar across populations. Furthermore, the third person singular and past tense verb suffixes, weaknesses for children with language impairment, are less frequent in caregiver speech than the plural noun suffix, a relative strength in language impairment. Similar findings hold across typical, SLI and late talker populations, suggesting that frequency plays a role in suffix acquisition.
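A toy sketch of the frequency analysis described above: count regular suffixes in caregiver utterances and correlate the counts with a rank order of acquisition. The suffix detection here is naive string matching, the utterances and ranks are invented, and this is not the morphological coding used with CHILDES transcripts; SciPy is assumed.

    # Count word-final suffix strings and correlate with hypothetical acquisition ranks.
    import re
    from scipy.stats import spearmanr

    caregiver_utterances = [
        "the dogs are running", "she walked home", "he walks fast",
        "two cats played", "the birds sing", "daddy's car is big",
    ]
    suffix_patterns = {"-s": r"\b\w+s\b", "-ed": r"\b\w+ed\b", "-ing": r"\b\w+ing\b"}

    counts = {name: sum(len(re.findall(pat, u)) for u in caregiver_utterances)
              for name, pat in suffix_patterns.items()}
    print(counts)

    acquisition_rank = [1, 2, 3]                      # hypothetical ranks (1 = earliest)
    rho, p = spearmanr(list(counts.values()), acquisition_rank)
    print(f"Spearman rho = {rho:.2f}, p = {p:.2f}")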
Rehabilitation of language in expressive aphasias: a literature review.
da Fontoura, Denise Ren; Rodrigues, Jaqueline de Carvalho; Carneiro, Luciana Behs de Sá; Monção, Ana Maria; de Salles, Jerusa Fumagalli
2012-01-01
This paper reviews the methodological characteristics of studies on rehabilitation of expressive aphasia, describing the techniques of rehabilitation used. The databases Medline, Science Direct and PubMed were searched for relevant articles (January 1999 to December 2011) using the keywords Expressive / Broca / Nonfluent Aphasia, combined with Language or Speech Rehabilitation / Therapy / Intervention. A total of 56 articles were retrieved describing rehabilitation techniques, including 22 with a focus on lexical processing, 18 on syntax stimulation, seven with the aim of developing speech and nine with multiple foci. A variety of techniques and theoretical approaches are available, highlighting the heterogeneity of research in this area. This diversity can be justified by the uniqueness of patients' language deficits, making it difficult to generalize. In addition, there is a need to combine the formal measures of tests with measures of pragmatic and social skills of communication to determine the effect of rehabilitation on the patient's daily life.
Al-Nasheri, Ahmed; Muhammad, Ghulam; Alsulaiman, Mansour; Ali, Zulfiqar; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H; Bencherif, Mohamed A
2017-01-01
Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
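A minimal sketch of ranking acoustic parameters by the Fisher discrimination ratio and testing group differences with a t test, as described above; the data are synthetic and the parameter names are placeholders for MDVP measures (e.g., jitter, shimmer, noise-to-harmonics ratio). NumPy and SciPy are assumed.

    # Fisher discrimination ratio and t test for normal vs. pathological samples.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(5)
    params = ["Jitt", "Shim", "NHR"]
    normal = rng.normal([0.5, 2.0, 0.12], [0.1, 0.5, 0.03], size=(60, 3))
    pathological = rng.normal([1.4, 4.5, 0.20], [0.4, 1.5, 0.08], size=(60, 3))

    def fisher_ratio(a, b):
        # (difference of class means)^2 / (sum of class variances), per parameter
        return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0))

    ratios = fisher_ratio(normal, pathological)
    tvals, pvals = ttest_ind(normal, pathological, axis=0)
    for name, fr, p in sorted(zip(params, ratios, pvals), key=lambda r: -r[1]):
        print(f"{name}: Fisher ratio = {fr:.2f}, t-test p = {p:.2g}")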
Typical versus delayed speech onset influences verbal reporting of autistic interests.
Chiodo, Liliane; Majerus, Steve; Mottron, Laurent
2017-01-01
The distinction between autism and Asperger syndrome has been abandoned in the DSM-5. However, this clinical categorization largely overlaps with the presence or absence of a speech onset delay, which is associated with clinical, cognitive, and neural differences. It is unknown whether these different speech development pathways and associated cognitive differences are involved in the heterogeneity of the restricted interests that characterize autistic adults. This study tested the hypothesis that speech onset delay, or conversely, early mastery of speech, orients the nature and verbal reporting of adult autistic interests. The occurrence of a priori defined descriptors for perceptual and thematic dimensions was determined, as well as the perceived function and benefits, in the responses of autistic people to a semi-structured interview on their intense interests. The number of words, grammatical categories, and proportion of perceptual/thematic descriptors were computed and compared between groups by variance analyses. The participants comprised 40 autistic adults grouped according to the presence (N = 20) or absence (N = 20) of speech onset delay, as well as 20 non-autistic adults, also with intense interests, matched for non-verbal intelligence using Raven's Progressive Matrices. The overall nature, function, and benefit of intense interests were similar across autistic subgroups, and between autistic and non-autistic groups. However, autistic participants with a history of speech onset delay used more perceptual than thematic descriptors when talking about their interests, whereas the opposite was true for autistic individuals without speech onset delay. This finding remained significant after controlling for linguistic differences observed between the two groups. Verbal reporting, but not the nature or positive function, of intense interests differed between adult autistic individuals depending on their speech acquisition history: oral reporting of intense interests was characterized by perceptual dominance for autistic individuals with delayed speech onset and thematic dominance for those without. This may contribute to the heterogeneous presentation observed among autistic adults of normal intelligence.
Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing
2014-01-01
To investigate how auditory working memory relates to speech perception performance by Mandarin-speaking cochlear implant (CI) users. Auditory working memory and speech perception was measured in Mandarin-speaking CI and normal-hearing (NH) participants. Working memory capacity was measured using forward digit span and backward digit span; working memory efficiency was measured using articulation rate. Speech perception was assessed with: (a) word-in-sentence recognition in quiet, (b) word-in-sentence recognition in speech-shaped steady noise at +5 dB signal-to-noise ratio, (c) Chinese disyllable recognition in quiet, (d) Chinese lexical tone recognition in quiet. Self-reported school rank was also collected regarding performance in schoolwork. There was large inter-subject variability in auditory working memory and speech performance for CI participants. Working memory and speech performance were significantly poorer for CI than for NH participants. All three working memory measures were strongly correlated with each other for both CI and NH participants. Partial correlation analyses were performed on the CI data while controlling for demographic variables. Working memory efficiency was significantly correlated only with sentence recognition in quiet when working memory capacity was partialled out. Working memory capacity was correlated with disyllable recognition and school rank when efficiency was partialled out. There was no correlation between working memory and lexical tone recognition in the present CI participants. Mandarin-speaking CI users experience significant deficits in auditory working memory and speech performance compared with NH listeners. The present data suggest that auditory working memory may contribute to CI users' difficulties in speech understanding. The present pattern of results with Mandarin-speaking CI users is consistent with previous auditory working memory studies with English-speaking CI users, suggesting that the lexical importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.
A Comparative Analysis of Fluent and Cerebral Palsied Speech.
NASA Astrophysics Data System (ADS)
van Doorn, Janis Lee
Several features of the acoustic waveforms of fluent and cerebral palsied speech were compared, using six fluent and seven cerebral palsied subjects, with a major emphasis being placed on an investigation of the trajectories of the first three formants (vocal tract resonances). To provide an overall picture which included other acoustic features, fundamental frequency, intensity, speech timing (speech rate and syllable duration), and prevocalization (vocalization prior to initial stop consonants found in cerebral palsied speech) were also investigated. Measurements were made using repetitions of a test sentence which was chosen because it required large excursions of the speech articulators (lips, tongue and jaw), so that differences in the formant trajectories for the fluent and cerebral palsied speakers would be emphasized. The acoustic features were all extracted from the digitized speech waveform (10 kHz sampling rate): the fundamental frequency contours were derived manually, the intensity contours were measured using the signal covariance, speech rate and syllable durations were measured manually, as were the prevocalization durations, while the formant trajectories were derived from short time spectra which were calculated for each 10 ms of speech using linear prediction analysis. Differences which were found in the acoustic features can be summarized as follows. For cerebral palsied speakers, the fundamental frequency contours generally showed inappropriate exaggerated fluctuations, as did some of the intensity contours; the mean fundamental frequencies were either higher or the same as for the fluent subjects; speech rates were reduced, and syllable durations were longer; prevocalization was consistently present at the beginning of the test sentence; formant trajectories were found to have overall reduced frequency ranges, and to contain anomalous transitional features, but it is noteworthy that for any one cerebral palsied subject, the inappropriate trajectory pattern was generally reproducible. The anomalous transitional features took the form of (a) inappropriate transition patterns, (b) reduced frequency excursions, (c) increased transition durations, and (d) decreased maximum rates of frequency change.
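A rough sketch of deriving formant estimates from 10-ms frame steps via linear prediction analysis, broadly the procedure described in the abstract above but not the dissertation's analysis code. The LPC order, window length, and the random-noise stand-in for digitized speech are assumptions; NumPy and SciPy are assumed.

    # LPC (autocorrelation method) formant estimates every 10 ms.
    import numpy as np
    from scipy.linalg import solve_toeplitz

    fs = 10_000
    frame_len = int(0.025 * fs)                       # 25-ms analysis window
    hop = int(0.010 * fs)                             # advance 10 ms per frame
    order = 10                                        # LPC order for a 10-kHz signal
    speech = np.random.default_rng(6).normal(size=fs) # 1 s placeholder signal

    def lpc_formants(frame):
        frame = frame * np.hamming(len(frame))
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz(r[:order], r[1:order + 1])     # autocorrelation normal equations
        poles = np.roots(np.concatenate(([1.0], -a)))
        freqs = np.angle(poles) * fs / (2 * np.pi)
        return np.sort(freqs[(freqs > 90) & (freqs < fs / 2 - 90)])[:3]   # up to three formants

    trajectories = [lpc_formants(speech[i:i + frame_len])
                    for i in range(0, len(speech) - frame_len, hop)]
    print(len(trajectories), "frames; first-frame formant estimates:", trajectories[0])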
Sub-Audible Speech Recognition Based upon Electromyographic Signals
NASA Technical Reports Server (NTRS)
Jorgensen, Charles C. (Inventor); Agabon, Shane T. (Inventor); Lee, Diana D. (Inventor)
2012-01-01
Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns ("SASPs") for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms ("SPTs") are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.
Lee, S L
2000-05-01
Nurses, therapists and case managers were spending too much time each week on the phone waiting to read patient reports to live transcriptionists who would then type the reports for storage in VNSNY's clinical management mainframe database. A speech recognition system helped solve the problem by providing the staff 24-hour access to an automated transcription service any day of the week. Nurses and case managers no longer wait in long queues to transmit patient reports or to retrieve information from the database. Everything is done automatically within minutes. VNSNY saved both time and money by updating its transcription strategy. Now nurses can spend more time with patients and less time on the phone transcribing notes. It also means fewer staff members are needed on weekends to do manual transcribing.
Technology transfer at NASA - A librarian's view
NASA Technical Reports Server (NTRS)
Buchan, Ronald L.
1991-01-01
The NASA programs, publications, and services promoting the transfer and utilization of aerospace technology developed by and for NASA are briefly surveyed. Topics addressed include the corporate sources of NASA technical information and its interest for corporate users of information services; the IAA and STAR abstract journals; NASA/RECON, NTIS, and the AIAA Aerospace Database; the RECON Space Commercialization file; the Computer Software Management and Information Center file; company information in the RECON database; and services to small businesses. Also discussed are the NASA publications Tech Briefs and Spinoff, the Industrial Applications Centers, NASA continuing bibliographies on management and patent abstracts (indexed using the NASA Thesaurus), the Index to NASA News Releases and Speeches, and the Aerospace Research Information Network (ARIN).
Central Presbycusis: A Review and Evaluation of the Evidence
Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur
2018-01-01
Background The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose This report summarizes the review process and presents its findings. Data Collection and Analysis The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738
A novel speech-processing strategy incorporating tonal information for cochlear implants.
Lan, N; Nie, K B; Gao, S K; Zeng, F G
2004-05-01
Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve performance in cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to the stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variation of the four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
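A minimal, hedged sketch of frame-by-frame F0 extraction by autocorrelation, the kind of fundamental-frequency estimate such a strategy would track to convey tone contours; it is not the authors' extraction algorithm, and the search range, frame sizes, and synthetic falling-tone signal are assumptions. NumPy is assumed.

    # Autocorrelation-peak F0 estimation on a synthetic falling-tone vowel.
    import numpy as np

    fs = 16_000

    def estimate_f0(frame, fmin=80.0, fmax=400.0):
        """Return an F0 estimate (Hz) for one frame via the autocorrelation peak."""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        return fs / lag

    # Synthetic falling tone: F0 glides from 220 Hz down to 150 Hz over 0.4 s.
    t = np.arange(0, 0.4, 1 / fs)
    f0_true = np.linspace(220, 150, t.size)
    signal = np.sin(2 * np.pi * np.cumsum(f0_true) / fs)

    frame_len, hop = int(0.04 * fs), int(0.01 * fs)
    contour = [estimate_f0(signal[i:i + frame_len])
               for i in range(0, len(signal) - frame_len, hop)]
    print("estimated F0 at start/end of the glide:", round(contour[0]), round(contour[-1]))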
High visual resolution matters in audiovisual speech perception, but only for some.
Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G
2016-07-01
The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech in noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.
Samson, F; Zeffiro, T A; Doyon, J; Benali, H; Mottron, L
2015-09-01
A continuum of phenotypes makes up the autism spectrum (AS). In particular, individuals show large differences in language acquisition, ranging from precocious speech to severe speech onset delay. However, the neurological origin of this heterogeneity remains unknown. Here, we sought to determine whether AS individuals differing in speech acquisition show different cortical responses to auditory stimulation and morphometric brain differences. Whole-brain activity following exposure to non-social sounds was investigated. Individuals in the AS were classified according to the presence or absence of Speech Onset Delay (AS-SOD and AS-NoSOD, respectively) and were compared with IQ-matched typically developing individuals (TYP). AS-NoSOD participants displayed greater task-related activity than TYP in the inferior frontal gyrus and peri-auditory middle and superior temporal gyri, which are associated with language processing. Conversely, the AS-SOD group only showed enhanced activity in the vicinity of the auditory cortex. We detected no differences in brain structure between groups. This is the first study to demonstrate the existence of differences in functional brain activity between AS individuals divided according to their pattern of speech development. These findings support the Trigger-threshold-target model and indicate that the occurrence of speech onset delay in AS individuals depends on the location of cortical functional reallocation, which favors perception in AS-SOD and language in AS-NoSOD. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Accuracy of Cochlear Implant Recipients on Speech Reception in Background Music
Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia
2012-01-01
Objectives: This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults. Methods: We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Outcomes: Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience. Results: SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor. Conclusions: Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music. PMID:23342550
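The speech-in-music thresholds were obtained with an adaptive test. The sketch below shows a generic one-down/one-up SNR staircase of the kind such procedures use; the starting SNR, step size, and scoring rule are assumptions rather than this study's exact protocol.

```python
import random

def adaptive_srt(trial_correct, start_snr=10.0, step_db=2.0, n_trials=20):
    """Generic adaptive SNR track: lower the SNR after a correct response,
    raise it after an error; estimate the threshold as the mean of the visited
    SNRs after the first few trials. `trial_correct(snr)` runs one trial."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step_db if trial_correct(snr) else step_db
    return sum(track[4:]) / len(track[4:])   # discard the initial approach

# Toy listener: recognizes the spondee whenever the SNR exceeds a hidden threshold,
# with occasional lucky guesses.
hidden_threshold = 2.0
srt = adaptive_srt(lambda snr: snr > hidden_threshold or random.random() < 0.2)
print(round(srt, 1))
```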
Multi-modal Biomarkers to Discriminate Cognitive State
2015-11-01
Real-time speech gisting for ATC applications
NASA Astrophysics Data System (ADS)
Dunkelberger, Kirk A.
1995-06-01
Command and control within the ATC environment remains primarily voice-based. Hence, automatic real-time, speaker-independent, continuous speech recognition (CSR) has many obvious applications and implied benefits to the ATC community: automated target tagging, aircraft compliance monitoring, controller training, automatic alarm disabling, display management, and many others. However, while current state-of-the-art CSR systems provide upwards of 98% word accuracy in laboratory environments, recent low-intrusion experiments in ATCT environments demonstrated less than 70% word accuracy in spite of significant investments in recognizer tuning. Acoustic channel irregularities and the varieties of controller/pilot grammar impact current CSR algorithms at their weakest points. It will be shown herein, however, that real-time context- and environment-sensitive gisting can provide key command-phrase recognition rates of greater than 95% using the same low-intrusion approach. The combination of real-time inexact syntactic pattern recognition techniques and a tight integration of the CSR, gisting, and ATC database accessor system components is the key to these high phrase recognition rates. A system concept for real-time gisting in the ATC context is presented herein. After establishing an application context, the discussion presents a minimal CSR technology context, then focuses on the gisting mechanism, desirable interfaces into the ATCT database environment, and data and control flow within the prototype system. Results of recent tests for a subset of the functionality are presented, together with suggestions for further research.
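The gisting idea, matching noisy recognizer output against a small inventory of expected command phrases instead of requiring exact word accuracy, can be sketched with a simple similarity matcher. The phrase list and acceptance threshold below are illustrative; the paper's inexact syntactic pattern recognition tied to the ATC database is not reproduced here.

```python
import difflib

# Illustrative command-phrase inventory (not an actual ATC grammar).
COMMANDS = [
    "turn left heading two seven zero",
    "descend and maintain flight level two four zero",
    "contact tower one one eight point seven",
]

def gist(recognized_words, threshold=0.7):
    """Return the best-matching command phrase for a noisy recognized word string,
    or None if nothing matches well enough (threshold is an assumption)."""
    hyp = " ".join(recognized_words)
    best = max(COMMANDS, key=lambda c: difflib.SequenceMatcher(None, hyp, c).ratio())
    score = difflib.SequenceMatcher(None, hyp, best).ratio()
    return (best, score) if score >= threshold else (None, score)

# Recognizer output with a substitution ("too" for "two") still maps to the intended command.
print(gist("turn left heading too seven zero".split()))
```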
Speech-Language Pathology production regarding voice in popular singing.
Drumond, Lorena Badaró; Vieira, Naymme Barbosa; Oliveira, Domingos Sávio Ferreira de
2011-12-01
To present a literature review of Brazilian scientific production in Speech-Language Pathology and Audiology regarding voice in popular singing over the last decade, in terms of number of publications, musical styles studied, focus of the research, and instruments used for data collection. Cross-sectional descriptive study carried out in two stages: a search of databases and publications covering the last decade of research in this area in Brazil, and reading of the material obtained for subsequent categorization. The databases LILACS and SciELO, the Database of Dissertations and Theses organized by CAPES, the online version of Acta ORL, and the online version of OPUS were searched, using the following keywords: voice, professional voice, singing voice, dysphonia, voice disorders, voice training, music, dysodia. Articles published between the years 2000 and 2010 were selected. The studies found were classified and categorized after reading their abstracts and, when necessary, the whole study. Twenty studies within the proposed theme were selected, all of which were descriptive, involving several musical styles. Twelve studies focused on the evaluation of the popular singer's voice, and the most frequently used data collection instrument was auditory-perceptual evaluation. The results of the publications found reflect the objectives proposed by the authors and the different methodologies used. The number of studies published is still limited when compared to the diversity of musical genres and the uniqueness of the popular singer.
Multimodal interfaces with voice and gesture input
DOE Office of Scientific and Technical Information (OSTI.GOV)
Milota, A.D.; Blattner, M.M.
1995-07-20
The modalities of speech and gesture have different strengths and weaknesses, but combined they create a synergy in which each modality corrects the weaknesses of the other. We believe that a multimodal system such as one intertwining speech and gesture must start from a different foundation than systems based solely on pen input. In order to provide a basis for the design of a speech and gesture system, we have examined the research in other disciplines such as anthropology and linguistics. The result of this investigation was a taxonomy that gave us material for the incorporation of gestures whose meanings are largely transparent to the users. This study describes the taxonomy and gives examples of applications to pen input systems.
Non-speech oral motor treatment for children with developmental speech sound disorders.
Lee, Alice S-Y; Gibbon, Fiona E
2015-03-25
Children with developmental speech sound disorders have difficulties in producing the speech sounds of their native language. These speech difficulties could be due to structural, sensory or neurophysiological causes (e.g. hearing impairment), but more often the cause of the problem is unknown. One treatment approach used by speech-language therapists/pathologists is non-speech oral motor treatment (NSOMT). NSOMTs are non-speech activities that aim to stimulate or improve speech production and treat specific speech errors. For example, using exercises such as smiling, pursing, blowing into horns, blowing bubbles, and lip massage to target lip mobility for the production of speech sounds involving the lips, such as /p/, /b/, and /m/. The efficacy of this treatment approach is controversial, and evidence regarding the efficacy of NSOMTs needs to be examined. To assess the efficacy of non-speech oral motor treatment (NSOMT) in treating children with developmental speech sound disorders who have speech errors. In April 2014 we searched the Cochrane Central Register of Controlled Trials (CENTRAL), Ovid MEDLINE (R) and Ovid MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Education Resources Information Center (ERIC), PsycINFO and 11 other databases. We also searched five trial and research registers, checked the reference lists of relevant titles identified by the search and contacted researchers to identify other possible published and unpublished studies. Randomised and quasi-randomised controlled trials that compared (1) NSOMT versus placebo or control; and (2) NSOMT as adjunctive treatment or speech intervention versus speech intervention alone, for children aged three to 16 years with developmental speech sound disorders, as judged by a speech and language therapist. Individuals with an intellectual disability (e.g. Down syndrome) or a physical disability were not excluded. The Trials Search Co-ordinator of the Cochrane Developmental, Psychosocial and Learning Problems Group and one review author ran the searches. Two review authors independently screened titles and abstracts to eliminate irrelevant studies, extracted data from the included studies and assessed risk of bias in each of these studies. In cases of ambiguity or information missing from the paper, we contacted trial authors. This review identified three studies (from four reports) involving a total of 22 children that investigated the efficacy of NSOMT as adjunctive treatment to conventional speech intervention versus conventional speech intervention for children with speech sound disorders. One study, a randomised controlled trial (RCT), included four boys aged seven years one month to nine years six months - all had speech sound disorders, and two had additional conditions (one was diagnosed as "communication impaired" and the other as "multiply disabled"). Of the two quasi-randomised controlled trials, one included 10 children (six boys and four girls), aged five years eight months to six years nine months, with speech sound disorders as a result of tongue thrust, and the other study included eight children (four boys and four girls), aged three to six years, with moderate to severe articulation disorder only. Two studies did not find NSOMT as adjunctive treatment to be more effective than conventional speech intervention alone, as both intervention and control groups made similar improvements in articulation after receiving treatments. 
One study reported a change in postintervention articulation test results but used an inappropriate statistical test and did not report the results clearly. None of the included studies examined the effects of NSOMTs on any other primary outcomes, such as speech intelligibility, speech physiology and adverse effects, or on any of the secondary outcomes such as listener acceptability. The RCT was judged at low risk for selection bias. The two quasi-randomised trials used randomisation but did not report the method for generating the random sequence and were judged as having unclear risk of selection bias. The three included studies were deemed to have high risk of performance bias as, given the nature of the intervention, blinding of participants was not possible. Only one study implemented blinding of outcome assessment and was at low risk for detection bias. One study showed high risk of other bias as the baseline characteristics of participants seemed to be unequal. The sample size of each of the included studies was very small, which means it is highly likely that participants in these studies were not representative of the target population. In the light of these serious limitations in methodology, the overall quality of the evidence provided by the included trials is judged to be low. Therefore, further research is very likely to have an important impact on our confidence in the estimate of treatment effect and is likely to change the estimate. The three included studies were small in scale and had a number of serious methodological limitations. In addition, they covered limited types of NSOMTs for treating children with speech sound disorders of unknown origin with the sounds /s/ and /z/. Hence, we judged the overall applicability of the evidence as limited and incomplete. Results of this review are consistent with those of previous reviews: currently, no strong evidence suggests that NSOMTs are an effective treatment or an effective adjunctive treatment for children with developmental speech sound disorders. Lack of strong evidence regarding the treatment efficacy of NSOMTs has implications for clinicians when they make decisions in relation to treatment plans. Well-designed research is needed to carefully investigate NSOMT as a type of treatment for children with speech sound disorders.
Wiggins, Ian M; Anderson, Carly A; Kitterick, Pádraig T; Hartley, Douglas E H
2016-09-01
Functional near-infrared spectroscopy (fNIRS) is a silent, non-invasive neuroimaging technique that is potentially well suited to auditory research. However, the reliability of auditory-evoked activation measured using fNIRS is largely unknown. The present study investigated the test-retest reliability of speech-evoked fNIRS responses in normally-hearing adults. Seventeen participants underwent fNIRS imaging in two sessions separated by three months. In a block design, participants were presented with auditory speech, visual speech (silent speechreading), and audiovisual speech conditions. Optode arrays were placed bilaterally over the temporal lobes, targeting auditory brain regions. A range of established metrics was used to quantify the reproducibility of cortical activation patterns, as well as the amplitude and time course of the haemodynamic response within predefined regions of interest. The use of a signal processing algorithm designed to reduce the influence of systemic physiological signals was found to be crucial to achieving reliable detection of significant activation at the group level. For auditory speech (with or without visual cues), reliability was good to excellent at the group level, but highly variable among individuals. Temporal-lobe activation in response to visual speech was less reliable, especially in the right hemisphere. Consistent with previous reports, fNIRS reliability was improved by averaging across a small number of channels overlying a cortical region of interest. Overall, the present results confirm that fNIRS can measure speech-evoked auditory responses in adults that are highly reliable at the group level, and indicate that signal processing to reduce physiological noise may substantially improve the reliability of fNIRS measurements. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
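One of the reliability-improving steps reported here, averaging over a few channels above a region of interest before examining the haemodynamic response, can be sketched as below; the channel indices, sampling rate, and block timing are placeholders rather than the study's montage.

```python
import numpy as np

def roi_block_average(hbo, fs, roi_channels, block_onsets_s, window_s=20.0):
    """Average HbO time courses over the channels in one ROI, then epoch and
    average across stimulation blocks to get the ROI's mean haemodynamic response.
    `hbo` is a (channels x samples) array of concentration changes."""
    roi = hbo[roi_channels].mean(axis=0)
    n = int(window_s * fs)
    epochs = [roi[int(t * fs): int(t * fs) + n] for t in block_onsets_s]
    return np.mean(epochs, axis=0)

# Toy data: 16 channels, 10 Hz sampling, blocks every 40 s (all values are placeholders).
hbo = np.random.randn(16, 10 * 400)
resp = roi_block_average(hbo, fs=10, roi_channels=[2, 3, 5],
                         block_onsets_s=range(20, 380, 40))
```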
Jalaei, Bahram; Azmi, Mohd Hafiz Afifi Mohd; Zakaria, Mohd Normani
2018-05-17
Binaurally evoked auditory evoked potentials have good diagnostic values when testing subjects with central auditory deficits. The literature on speech-evoked auditory brainstem response evoked by binaural stimulation is in fact limited. Gender disparities in speech-evoked auditory brainstem response results have been consistently noted but the magnitude of gender difference has not been reported. The present study aimed to compare the magnitude of gender difference in speech-evoked auditory brainstem response results between monaural and binaural stimulations. A total of 34 healthy Asian adults aged 19-30 years participated in this comparative study. Eighteen of them were females (mean age=23.6±2.3 years) and the remaining sixteen were males (mean age=22.0±2.3 years). For each subject, speech-evoked auditory brainstem response was recorded with the synthesized syllable /da/ presented monaurally and binaurally. While latencies were not affected (p>0.05), the binaural stimulation produced statistically higher speech-evoked auditory brainstem response amplitudes than the monaural stimulation (p<0.05). As revealed by large effect sizes (d>0.80), substantive gender differences were noted in most of speech-evoked auditory brainstem response peaks for both stimulation modes. The magnitude of gender difference between the two stimulation modes revealed some distinct patterns. Based on these clinically significant results, gender-specific normative data are highly recommended when using speech-evoked auditory brainstem response for clinical and future applications. The preliminary normative data provided in the present study can serve as the reference for future studies on this test among Asian adults. Copyright © 2018 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
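The gender comparison is reported as effect sizes (d > 0.80). A standard pooled-standard-deviation Cohen's d, of the kind presumably underlying such a comparison, can be computed as in the sketch below; the sample amplitudes are placeholders.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (unbiased variance estimates)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Placeholder peak amplitudes (microvolts) for the female and male groups.
females = [0.31, 0.28, 0.35, 0.30, 0.33]
males = [0.24, 0.22, 0.27, 0.25, 0.23]
print(cohens_d(females, males))   # values above 0.8 count as large effects
```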
Budde, Kristin S; Barron, Daniel S; Fox, Peter T
2014-12-01
Developmental stuttering is a speech disorder most likely due to a heritable form of developmental dysmyelination impairing the function of the speech-motor system. Speech-induced brain-activation patterns in persons who stutter (PWS) are anomalous in various ways; the consistency of these aberrant patterns is a matter of ongoing debate. Here, we present a hierarchical series of coordinate-based meta-analyses addressing this issue. Two tiers of meta-analyses were performed on a 17-paper dataset (202 PWS; 167 fluent controls). Four large-scale (top-tier) meta-analyses were performed, two for each subject group (PWS and controls). These analyses robustly confirmed the regional effects previously postulated as "neural signatures of stuttering" (Brown, Ingham, Ingham, Laird, & Fox, 2005) and extended this designation to additional regions. Two smaller-scale (lower-tier) meta-analyses refined the interpretation of the large-scale analyses: (1) a between-group contrast targeting differences between PWS and controls (stuttering trait); and (2) a within-group contrast (PWS only) of stuttering with induced fluency (stuttering state). Copyright © 2014 Elsevier Inc. All rights reserved.
Functional Characterization of the Human Speech Articulation Network.
Basilakos, Alexandra; Smith, Kimberly G; Fillmore, Paul; Fridriksson, Julius; Fedorenko, Evelina
2018-05-01
A number of brain regions have been implicated in articulation, but their precise computations remain debated. Using functional magnetic resonance imaging, we examine the degree of functional specificity of articulation-responsive brain regions to constrain hypotheses about their contributions to speech production. We find that articulation-responsive regions (1) are sensitive to articulatory complexity, but (2) are largely nonoverlapping with nearby domain-general regions that support diverse goal-directed behaviors. Furthermore, premotor articulation regions show selectivity for speech production over some related tasks (respiration control), but not others (nonspeech oral-motor [NSO] movements). This overlap between speech and nonspeech movements concords with electrocorticographic evidence that these regions encode articulators and their states, and with patient evidence whereby articulatory deficits are often accompanied by oral-motor deficits. In contrast, the superior temporal regions show strong selectivity for articulation relative to nonspeech movements, suggesting that these regions play a specific role in speech planning/production. Finally, articulation-responsive portions of posterior inferior frontal gyrus show some selectivity for articulation, in line with the hypothesis that this region prepares an articulatory code that is passed to the premotor cortex. Taken together, these results inform the architecture of the human articulation system.
NASA Astrophysics Data System (ADS)
Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.
2002-05-01
The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.
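The trade-off quantified here, more radiologist time per report against lower transcription cost and faster turnaround, can be framed as a simple break-even calculation. The sketch below uses placeholder figures, not the hospitals' actual costs.

```python
def net_saving_per_report(radiologist_rate_per_min, extra_min_per_report,
                          transcription_cost, speech_rec_marginal_cost):
    """Saving per report = avoided transcription cost minus the cost of the
    extra radiologist time spent dictating/editing with speech recognition."""
    extra_labor = radiologist_rate_per_min * extra_min_per_report
    return (transcription_cost - speech_rec_marginal_cost) - extra_labor

# Placeholder inputs: $3/min radiologist time, 0.8 extra minutes per report,
# $4.00 transcription cost vs. $0.25 marginal cost with speech recognition.
print(net_saving_per_report(3.0, 0.8, 4.00, 0.25))
```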
Hurtado, Nereyda; Marchman, Virginia A.; Fernald, Anne
2010-01-01
It is well established that variation in caregivers' speech is associated with language outcomes, yet little is known about the learning principles that mediate these effects. This longitudinal study (n = 27) explores whether Spanish-learning children's early experiences with language predict efficiency in real-time comprehension and vocabulary learning. Measures of mothers' speech at 18 months were examined in relation to children's speech processing efficiency and reported vocabulary at 18 and 24 months. Children of mothers who provided more input at 18 months knew more words and were faster in word recognition at 24 months. Moreover, multiple regression analyses indicated that the influences of caregiver speech on speed of word recognition and vocabulary were largely overlapping. This study provides the first evidence that input shapes children's lexical processing efficiency and that vocabulary growth and increasing facility in spoken word comprehension work together to support the uptake of the information that rich input affords the young language learner. PMID:19046145
Recognizing speech under a processing load: dissociating energetic from informational factors.
Mattys, Sven L; Brooks, Joanna; Cooke, Martin
2009-11-01
Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and informational masking, i.e., listening environments leading to depletion of higher-order, domain-general processing resources, independent of signal degradation. We show that severe energetic masking, such as that produced by background speech or noise, curtails reliance on lexical-semantic knowledge and increases relative reliance on salient acoustic detail. In contrast, informational masking, induced by a resource-depleting competing task (divided attention or a memory load), results in the opposite pattern. Based on this clear dissociation, we propose a model of speech recognition that addresses not only the mapping between sensory input and lexical representations, as traditionally advocated, but also the way in which this mapping interfaces with general cognition and non-linguistic processes.
Atherton, Marie; Dung, Nguyễn Thị Ngọc; Nhân, Võ Hoàng
2013-02-01
Wylie, McAllister, Davidson, and Marshall (2013) argue that recommendations made within the World Report on Disability provide an opportunity for speech-language pathologists to consider new ways of developing services for people with communication and swallowing disorders. They propose that current approaches to the delivery of speech-language pathology services are largely embedded within the medical model of impairment, thereby limiting the ability of services to meet the needs of people in a holistic manner. In this paper, the criticality of selecting an appropriate service delivery model is discussed within the context of a recently established post-graduate speech therapy education programme in Viet Nam. Driving forces for the implementation of the program will be explored, as will the factors that determined the choice of service delivery. Opportunities and challenges to the long-term viability of the program and the program's potential to meet the needs of persons with communication and swallowing disorders in Viet Nam will be considered.
The dispersion-focalization theory of sound systems
NASA Astrophysics Data System (ADS)
Schwartz, Jean-Luc; Abry, Christian; Boë, Louis-Jean; Vallée, Nathalie; Ménard, Lucie
2005-04-01
The Dispersion-Focalization Theory states that sound systems in human languages are shaped by two major perceptual constraints: dispersion driving auditory contrast towards maximal or sufficient values [B. Lindblom, J. Phonetics 18, 135-152 (1990)] and focalization driving auditory spectra towards patterns with close neighboring formants. Dispersion is computed from the sum of the inverse squared inter-spectral distances in the (F1, F2, F3, F4) space, using a non-linear process based on the 3.5 Bark critical distance to estimate F2'. Focalization is based on the idea that close neighboring formants produce vowel spectra with marked peaks, easier to process and memorize in the auditory system. Evidence for increased stability of focal vowels in short-term memory was provided in a discrimination experiment on adult French subjects [J. L. Schwartz and P. Escudier, Speech Comm. 8, 235-259 (1989)]. A reanalysis of infant discrimination data shows that focalization could well be responsible for recurrent discrimination asymmetries [J. L. Schwartz et al., Speech Comm. (in press)]. Recent data on children's vowel production indicate that focalization seems to be part of the perceptual templates driving speech development. The Dispersion-Focalization Theory produces valid predictions for both vowel and consonant systems, in relation to available databases of human language inventories.
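The dispersion term described above, a sum of inverse squared inter-spectral distances over the vowels of a system, can be written down directly. The sketch below pairs it with a simplified focalization term that rewards small gaps between neighboring formants; the Bark transformation, perceptual weighting, and F2' estimation used in the theory are omitted, so this is only a schematic of the cost function.

```python
import numpy as np

def dispersion_energy(formants):
    """Sum of inverse squared distances between all vowel pairs in formant space.
    `formants` is an (n_vowels x n_formants) array; lower energy = better dispersed."""
    energy = 0.0
    for i in range(len(formants)):
        for j in range(i + 1, len(formants)):
            d2 = np.sum((formants[i] - formants[j]) ** 2)
            energy += 1.0 / d2
    return energy

def focalization_energy(formants):
    """Simplified focalization term: vowels with close neighboring formants
    (marked spectral peaks) lower the energy. Illustrative only."""
    gaps = np.diff(formants, axis=1)
    return -np.sum(1.0 / gaps ** 2)

# Toy three-vowel system /i a u/ with rough (F1, F2, F3) values in Hz (assumed).
system = np.array([[280, 2250, 2900],
                   [700, 1200, 2600],
                   [300,  800, 2300]], dtype=float)
total_energy = dispersion_energy(system) + focalization_energy(system)
```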
Guo, Ruiling; Bain, Barbara A; Willer, Janene
2008-04-01
The research assesses the information needs of speech-language pathologists (SLPs) and audiologists in Idaho and identifies specific needs for training in evidence-based practice (EBP) principles and searching EBP resources. A survey was developed to assess knowledge and skills in accessing information. Questionnaires were distributed to 217 members of the Idaho Speech-Language-Hearing Association, who were given multiple options to return the assessment survey (web, email, mail). Data were analyzed descriptively and statistically. The total response rate was 38.7% (84/217). Of the respondents, 87.0% (73/84) indicated insufficient knowledge and skills to search PubMed. Further, 47.6% (40/84) indicated limited knowledge of EBP. Of professionals responding, 52.4% (44/84) reported interest in learning more about EBP and 47.6% (40/84) reported interest in learning to search PubMed. SLPs and audiologists who graduated within the last 10 years were more likely to respond online, while those graduating prior to that time preferred to respond via hard copy. Discussion/Conclusion: More effort should be made to ensure that SLPs and audiologists develop skills in locating information to support their practice. Results from this information needs assessment were used to design a training and outreach program on EBP and EBP database searching for SLPs and audiologists in Idaho.
A Browser-Server-Based Tele-audiology System That Supports Multiple Hearing Test Modalities.
Yao, Jianchu Jason; Yao, Daoyuan; Givens, Gregg
2015-09-01
Millions of global citizens suffering from hearing disorders have limited or no access to much needed hearing healthcare. Although tele-audiology presents a solution to alleviate this problem, existing remote hearing diagnosis systems support only pure-tone tests, leaving speech and other test procedures unsolved, due to the lack of software and hardware to enable communication required between audiologists and their remote patients. This article presents a comprehensive remote hearing test system that integrates the two most needed hearing test procedures: a pure-tone audiogram and a speech test. This enhanced system is composed of a Web application server, an embedded smart Internet-Bluetooth(®) (Bluetooth SIG, Kirkland, WA) gateway (or console device), and a Bluetooth-enabled audiometer. Several graphical user interfaces and a relational database are hosted on the application server. The console device has been designed to support the tests and auxiliary communication between the local site and the remote site. The study was conducted at an audiology laboratory. Pure-tone audiogram and speech test results from volunteers tested with this tele-audiology system are comparable with results from the traditional face-to-face approach. This browser-server-based comprehensive tele-audiology offers a flexible platform to expand hearing services to traditionally underserved groups.
Inequality across consonantal contrasts in speech perception: evidence from mismatch negativity.
Cornell, Sonia A; Lahiri, Aditi; Eulitz, Carsten
2013-06-01
The precise structure of speech sound representations is still a matter of debate. In the present neurobiological study, we compared predictions about differential sensitivity to speech contrasts between models that assume full specification of all phonological information in the mental lexicon with those assuming sparse representations (only contrastive or otherwise not predictable information is stored). In a passive oddball paradigm, we studied the contrast sensitivity as reflected in the mismatch negativity (MMN) response to changes in the manner of articulation, as well as place of articulation of consonants in intervocalic positions of nonwords (manner of articulation: [edi ~ eni], [ezi ~ eni]; place of articulation: [edi ~ egi]). Models that assume full specification of all phonological information in the mental lexicon posit equal MMNs within each contrast (symmetric MMNs), that is, changes from standard [edi] to deviant [eni] elicit a similar MMN response as changes from standard [eni] to deviant [edi]. In contrast, models that assume sparse representations predict that only the [ezi] ~ [eni] reversals will evoke symmetric MMNs because of their conflicting fully specified manner features. Asymmetric MMNs are predicted, however, for the reversals of [edi] ~ [eni] and [edi] ~ [egi] because either a manner or place property in each pair is not fully specified in the mental lexicon. Our results show a pattern of symmetric and asymmetric MMNs that is in line with predictions of the featurally underspecified lexicon model that assumes sparse phonological representations. We conclude that the brain refers to underspecified phonological representations during speech perception. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor
2015-01-01
In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
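A reduced sketch of the two-level idea, selecting a subset of base classifiers and then stacking their predictions under a meta-classifier, is given below using scikit-learn. The brute-force subset search scored by cross-validation stands in for the estimation of distribution algorithm used in the paper, and the synthetic features and classifier choices are placeholders.

```python
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data standing in for spectral/quality/prosodic emotion features.
X, y = make_classification(n_samples=400, n_features=30, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

base = [("svm", SVC(probability=True)), ("rf", RandomForestClassifier()),
        ("knn", KNeighborsClassifier()), ("nb", GaussianNB())]

best_score, best_subset = -1.0, None
for k in range(2, len(base) + 1):
    for subset in combinations(base, k):        # brute force in place of the EDA
        model = StackingClassifier(estimators=list(subset),
                                   final_estimator=LogisticRegression(max_iter=1000))
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, [name for name, _ in subset]

print(best_subset, round(best_score, 3))
```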
Analyzing crowdsourced ratings of speech-based take-over requests for automated driving.
Bazilinskyy, P; de Winter, J C F
2017-10-01
Take-over requests in automated driving should fit the urgency of the traffic situation. The robustness of various published research findings on the valuations of speech-based warning messages is unclear. This research aimed to establish how people value speech-based take-over requests as a function of speech rate, background noise, spoken phrase, and speaker's gender and emotional tone. By means of crowdsourcing, 2669 participants from 95 countries listened to a random 10 out of 140 take-over requests, and rated each take-over request on urgency, commandingness, pleasantness, and ease of understanding. Our results replicate several published findings, in particular that an increase in speech rate results in a monotonic increase of perceived urgency. The female voice was easier to understand than a male voice when there was a high level of background noise, a finding that contradicts the literature. Moreover, a take-over request spoken with Indian accent was found to be easier to understand by participants from India than by participants from other countries. Our results replicate effects in the literature regarding speech-based warnings, and shed new light on effects of background noise, gender, and nationality. The results may have implications for the selection of appropriate take-over requests in automated driving. Additionally, our study demonstrates the promise of crowdsourcing for testing human factors and ergonomics theories with large sample sizes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility.
Biberger, Thomas; Ewert, Stephan D
2016-08-01
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123-177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181-1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
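Both model families reduce to signal-to-noise ratios computed from (envelope) power. A minimal envelope-power SNR for one audio-frequency channel and one modulation cutoff is sketched below; the filter choices and normalization are simplified assumptions, not the published model stages.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def envelope_power_snr(mixture, noise, fs, band=(1000, 2000), mod_cut=8.0):
    """Crude envelope-power SNR: band-pass both signals, take Hilbert envelopes,
    low-pass the envelopes at a modulation cutoff, and compare fluctuation powers."""
    sos_band = butter(4, band, btype="bandpass", fs=fs, output="sos")
    sos_mod = butter(2, mod_cut, btype="lowpass", fs=fs, output="sos")

    def fluctuation_power(x):
        env = np.abs(hilbert(sosfilt(sos_band, x)))
        env = sosfilt(sos_mod, env)
        return np.var(env / np.mean(env))      # normalized envelope fluctuation power

    p_mix, p_noise = fluctuation_power(mixture), fluctuation_power(noise)
    return 10 * np.log10(max(p_mix - p_noise, 1e-10) / p_noise)

# Toy signals: a 4 Hz amplitude-modulated tone (speech stand-in) in stationary noise.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
noise = np.random.randn(len(t))
speech = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1500 * t)
print(envelope_power_snr(speech + noise, noise, fs))
```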
Simonyan, Kristina; Herscovitch, Peter; Horwitz, Barry
2013-01-01
Considerable progress has recently been made in understanding the brain mechanisms underlying speech and language control. However, the neurochemical underpinnings of normal speech production remain largely unknown. We investigated the extent of striatal endogenous dopamine release and its influences on the organization of functional striatal speech networks during production of meaningful English sentences using a combination of positron emission tomography (PET) with the dopamine D2/D3 receptor radioligand [11C]raclopride and functional MRI (fMRI). In addition, we used diffusion tensor tractography (DTI) to examine the extent of dopaminergic modulatory influences on striatal structural network organization. We found that, during sentence production, endogenous dopamine was released in the ventromedial portion of the dorsal striatum, in both its associative and sensorimotor functional divisions. In the associative striatum, speech-induced dopamine release established a significant relationship with neural activity and influenced the left-hemispheric lateralization of striatal functional networks. In contrast, there were no significant effects of endogenous dopamine release on the lateralization of striatal structural networks. Our data provide the first evidence for endogenous dopamine release in the dorsal striatum during normal speaking and point to the possible mechanisms behind the modulatory influences of dopamine on the organization of functional brain circuits controlling normal human speech. PMID:23277111
Quantitative application of the primary progressive aphasia consensus criteria
Wicklund, Meredith R.; Duffy, Joseph R.; Strand, Edythe A.; Machulda, Mary M.; Whitwell, Jennifer L.
2014-01-01
Objective: To determine how well the consensus criteria could classify subjects with primary progressive aphasia (PPA) using a quantitative speech and language battery that matches the test descriptions provided by the consensus criteria. Methods: A total of 105 participants with a neurodegenerative speech and language disorder were prospectively recruited and underwent neurologic, neuropsychological, and speech and language testing and MRI in this case-control study. Twenty-one participants with apraxia of speech without aphasia served as controls. Select tests from the speech and language battery were chosen for application of consensus criteria and cutoffs were employed to determine syndromic classification. Hierarchical cluster analysis was used to examine participants who could not be classified. Results: Of the 84 participants, 58 (69%) could be classified as agrammatic (27%), semantic (7%), or logopenic (35%) variants of PPA. The remaining 31% of participants could not be classified. Of the unclassifiable participants, 2 clusters were identified. The speech and language profile of the first cluster resembled mild logopenic PPA and the second cluster semantic PPA. Gray matter patterns of loss of these 2 clusters of unclassified participants also resembled mild logopenic and semantic variants. Conclusions: Quantitative application of consensus PPA criteria yields the 3 syndromic variants but leaves a large proportion unclassified. Therefore, the current consensus criteria need to be modified in order to improve sensitivity. PMID:24598709
Schmidt-Naylor, Anna C.; Brady, Nancy C.
2017-01-01
Purpose We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Method Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Results Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. Conclusions This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy. PMID:28474087
Primate vocal communication: a useful tool for understanding human speech and language evolution?
Fedurek, Pawel; Slocombe, Katie E
2011-04-01
Language is a uniquely human trait, and questions of how and why it evolved have been intriguing scientists for years. Nonhuman primates (primates) are our closest living relatives, and their behavior can be used to estimate the capacities of our extinct ancestors. As humans and many primate species rely on vocalizations as their primary mode of communication, the vocal behavior of primates has been an obvious target for studies investigating the evolutionary roots of human speech and language. By studying the similarities and differences between human and primate vocalizations, comparative research has the potential to clarify the evolutionary processes that shaped human speech and language. This review examines some of the seminal and recent studies that contribute to our knowledge regarding the link between primate calls and human language and speech. We focus on three main aspects of primate vocal behavior: functional reference, call combinations, and vocal learning. Studies in these areas indicate that despite important differences, primate vocal communication exhibits some key features characterizing human language. They also indicate, however, that some critical aspects of speech, such as vocal plasticity, are not shared with our primate cousins. We conclude that comparative research on primate vocal behavior is a very promising tool for deepening our understanding of the evolution of human speech and language, but much is still to be done as many aspects of monkey and ape vocalizations remain largely unexplored.
Music and speech prosody: a common rhythm.
Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo
2013-01-01
Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).
Data-driven analysis of functional brain interactions during free listening to music and speech.
Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming
2015-06-01
Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired when participants were watching video streams or listening to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems including attention, memory, auditory/language, emotion, and action networks are among the most relevant brain systems involved in differentiating classical music, pop music and speech. Our study provides an alternative approach to investigating the human brain's mechanisms in the comprehension of complex natural music and speech.
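At its core, the analysis turns each scan into a vector of functional interactions (pairwise correlations between network or ROI time series) and asks a classifier which interactions distinguish music from speech listening. A reduced sketch with synthetic time series is given below; the network definitions, structural correspondence step, and real data are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def interaction_vector(ts):
    """Pairwise Pearson correlations between ROI time series (ts: ROIs x time),
    flattened to the upper triangle as a feature vector."""
    corr = np.corrcoef(ts)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

# Synthetic stand-in: 40 'scans' (20 music, 20 speech), 10 ROIs, 200 time points.
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        ts = rng.standard_normal((10, 200))
        if label == 1:
            ts[1] = 0.6 * ts[0] + 0.4 * ts[1]   # inject one condition-specific coupling
        X.append(interaction_vector(ts))
        y.append(label)

print(cross_val_score(LogisticRegression(max_iter=1000),
                      np.array(X), np.array(y), cv=5).mean())
```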
McCreery, Ryan W; Walker, Elizabeth A; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia
2015-01-01
Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HAs) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children's auditory experience on parent-reported auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Parent ratings on auditory development questionnaires and children's speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children rating scale, and an adaptation of the Speech, Spatial, and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open- and Closed-Set Test, Early Speech Perception test, Lexical Neighborhood Test, and Phonetically Balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared with peers with normal hearing matched for age, maternal educational level, and nonverbal intelligence. The effects of aided audibility, HA use, and language ability on parent responses to auditory development questionnaires and on children's speech recognition were also examined. Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use, and better language abilities generally had higher parent ratings of auditory skills and better speech-recognition abilities in quiet and in noise than peers with less audibility, more limited HA use, or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Children who are hard of hearing continue to experience delays in auditory skill development and speech-recognition abilities compared with peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported before the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech-recognition abilities and also may enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children's speech recognition.
Rosemann, Stephanie; Thiel, Christiane M
2018-07-15
Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss, leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input affects audio-visual speech perception in hearing-impaired subjects is largely unknown. Here, we investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that even mild to moderate hearing loss impacts audio-visual speech processing, accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss. Copyright © 2018 Elsevier Inc. All rights reserved.
Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.
Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe
2015-01-01
The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior to NH subjects despite a positive signal-to-noise ratio shift for both noise conditions, while demonstrating the synergistic effect for cutoff frequencies ≥300 Hz. One SDT model largely predicted the combined-stimulation results in continuous noise, while falling short of predicting performance observed in modulated noise. The presented simulation was able to demonstrate the combined-stimulation advantage for NH subjects as observed in EAS users. Only NH subjects tested with EAS simulations were able to take advantage of the gap listening effect, while CI and EAS user performance was consistently degraded in modulated noise compared with performance in continuous noise. The application of ASR systems seems feasible to assess the impact of different signal processing strategies on speech perception with CI and EAS simulations. In continuous noise, SDT models were largely able to predict the performance gain without assuming any synergistic effects, but model amendments are required to explain the gap listening effect in modulated noise.
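The residual-hearing part of the simulation, low-pass filtering speech at 200-500 Hz, together with a vocoder stand-in for the electric channel, can be sketched as follows. The noise vocoder below is a generic substitute for the part-tone-time-pattern processing used in the study, and the band edges are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def lowpass_acoustic(x, fs, cutoff=500.0):
    """Simulated residual acoustic hearing: keep only frequencies below `cutoff`."""
    sos = butter(6, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfilt(sos, x)

def noise_vocode(x, fs, edges):
    """Generic noise vocoder standing in for CI processing: per band, extract the
    envelope and use it to modulate band-limited noise, then sum the channels."""
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfilt(sos, x)))
        carrier = sosfilt(sos, np.random.randn(len(x)))
        out += env * carrier
    return out

# Toy usage on a noise placeholder standing in for a recorded sentence.
fs = 16000
speech = np.random.randn(fs)
edges = np.geomspace(300, 7000, 13)             # 12 channels, illustrative spacing
eas_sim = lowpass_acoustic(speech, fs, 500.0) + noise_vocode(speech, fs, edges)
```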
Overlapping networks engaged during spoken language production and its cognitive control.
Geranmayeh, Fatemeh; Wise, Richard J S; Mehta, Amrish; Leech, Robert
2014-06-25
Spoken language production is a complex brain function that relies on large-scale networks. These include domain-specific networks that mediate language-specific processes, as well as domain-general networks mediating top-down and bottom-up attentional control. Language control is thought to involve a left-lateralized fronto-temporal-parietal (FTP) system. However, these regions do not always activate for language tasks and similar regions have been implicated in nonlinguistic cognitive processes. These inconsistent findings suggest that either the left FTP is involved in multidomain cognitive control or that there are multiple spatially overlapping FTP systems. We present evidence from an fMRI study using multivariate analysis to identify spatiotemporal networks involved in spoken language production in humans. We compared spoken language production (Speech) with multiple baselines, counting (Count), nonverbal decision (Decision), and "rest," to pull apart the multiple partially overlapping networks that are involved in speech production. A left-lateralized FTP network was activated during Speech and deactivated during Count and nonverbal Decision trials, implicating it in cognitive control specific to sentential spoken language production. A mirror right-lateralized FTP network was activated in the Count and Decision trials, but not Speech. Importantly, a second overlapping left FTP network showed relative deactivation in Speech. These three networks, with distinct time courses, overlapped in the left parietal lobe. Contrary to the standard model of the left FTP as being dominant for speech, we revealed a more complex pattern within the left FTP, including at least two left FTP networks with competing functional roles, only one of which was activated in speech production. Copyright © 2014 Geranmayeh et al.
Speech Recognition: Proceedings of a Workshop Held in Palo Alto, California on 19-20 February 1986
1986-02-01
always make the system fail - sometimes even when not trying to do so. Finally, Dr. Jelinek of IBM warned that the not-invented-here syndrome is hard to... Evaluation of its performance on 365 sentences indicates that 70% of the nasals are correctly located, with one impostor accepted for every nasal. Most... for each feature. Evaluation of the classifier performance on the same database indicates that 80% of the nasals and impostors are correctly id...
Symposium on electron linear accelerators in honor of Richard B. Neal's 80th birthday: Proceedings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Siemann, R.H.
The papers presented at the conference are: (1) the construction of SLAC and the role of R.B. Neal; (2) symposium speech; (3) lessons learned from the SLC; (4) alternate approaches to future electron-positron linear colliders; (5) the NLC technical program; (6) advanced electron linacs; (7) medical uses of linear accelerators; (8) linac-based, intense, coherent X-ray source using self-amplified spontaneous emission. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.
Human neuromagnetic steady-state responses to amplitude-modulated tones, speech, and music.
Lamminmäki, Satu; Parkkonen, Lauri; Hari, Riitta
2014-01-01
Auditory steady-state responses that can be elicited by various periodic sounds inform about subcortical and early cortical auditory processing. Steady-state responses to amplitude-modulated pure tones have been used to scrutinize binaural interaction by frequency-tagging the two ears' inputs at different frequencies. Unlike pure tones, speech and music are physically very complex, as they include many frequency components, pauses, and large temporal variations. To examine the utility of magnetoencephalographic (MEG) steady-state fields (SSFs) in the study of early cortical processing of complex natural sounds, the authors tested the extent to which amplitude-modulated speech and music can elicit reliable SSFs. MEG responses were recorded to 90-s-long binaural tones, speech, and music, amplitude-modulated at 41.1 Hz at four different depths (25, 50, 75, and 100%). The subjects were 11 healthy, normal-hearing adults. MEG signals were averaged in phase with the modulation frequency, and the sources of the resulting SSFs were modeled by current dipoles. After the MEG recording, intelligibility of the speech, musical quality of the music stimuli, naturalness of music and speech stimuli, and the perceived deterioration caused by the modulation were evaluated on visual analog scales. The perceived quality of the stimuli decreased as a function of increasing modulation depth, more strongly for music than speech; yet, all subjects considered the speech intelligible even at the 100% modulation. SSFs were the strongest to tones and the weakest to speech stimuli; the amplitudes increased with increasing modulation depth for all stimuli. SSFs to tones were reliably detectable at all modulation depths (in all subjects in the right hemisphere, in 9 subjects in the left hemisphere) and to music stimuli at 50 to 100% depths, whereas speech usually elicited clear SSFs only at 100% depth. The hemispheric balance of SSFs was toward the right hemisphere for tones and speech, whereas SSFs to music showed no lateralization. In addition, the right lateralization of SSFs to the speech stimuli decreased with decreasing modulation depth. The results showed that SSFs can be reliably measured to amplitude-modulated natural sounds, with slightly different hemispheric lateralization for different carrier sounds. With speech stimuli, modulation at 100% depth is required, whereas for music the 75% or even 50% modulation depths provide a reasonable compromise between the signal-to-noise ratio of SSFs and sound quality or perceptual requirements. SSF recordings thus seem feasible for assessing the early cortical processing of natural sounds.
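The phase-locked averaging step described above can be sketched as follows. This is an assumed, simplified analysis that reads the response amplitude at the 41.1 Hz tag from an FFT of the averaged epochs; it is not the authors' dipole-modeling pipeline.

```python
# Minimal sketch (assumed analysis): average epochs in phase with the 41.1 Hz
# modulation and read the amplitude at the tag frequency from the FFT bin.
import numpy as np

def ssf_amplitude(signal, fs, f_mod=41.1, n_cycles_per_epoch=41):
    # Epoch length spans an integer number of modulation cycles so that the
    # tagged frequency falls (nearly) on an FFT bin.
    epoch_len = int(round(fs * n_cycles_per_epoch / f_mod))
    n_epochs = len(signal) // epoch_len
    epochs = signal[:n_epochs * epoch_len].reshape(n_epochs, epoch_len)
    avg = epochs.mean(axis=0)                      # phase-locked averaging
    spectrum = np.fft.rfft(avg) / epoch_len
    freqs = np.fft.rfftfreq(epoch_len, d=1.0 / fs)
    bin_idx = np.argmin(np.abs(freqs - f_mod))
    return 2 * np.abs(spectrum[bin_idx])           # amplitude at the tag frequency
```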
Amplify scientific discovery with artificial intelligence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gil, Yolanda; Greaves, Mark T.; Hendler, James
Computing innovations have fundamentally changed many aspects of scientific inquiry. For example, advances in robotics, high-end computing, networking, and databases now underlie much of what we do in science such as gene sequencing, general number crunching, sharing information between scientists, and analyzing large amounts of data. As computing has evolved at a rapid pace, so too has its impact in science, with the most recent computing innovations repeatedly being brought to bear to facilitate new forms of inquiry. Recently, advances in Artificial Intelligence (AI) have deeply penetrated many consumer sectors, including for example Apple’s Siri™ speech recognition system, real-time automated language translation services, and a new generation of self-driving cars and self-navigating drones. However, AI has yet to achieve comparable levels of penetration in scientific inquiry, despite its tremendous potential in aiding computers to help scientists tackle tasks that require scientific reasoning. We contend that advances in AI will transform the practice of science as we are increasingly able to effectively and jointly harness human and machine intelligence in the pursuit of major scientific challenges.
Practice variations in voice treatment selection following vocal fold mucosal resection.
Moore, Jaime E; Rathouz, Paul J; Havlena, Jeffrey A; Zhao, Qianqian; Dailey, Seth H; Smith, Maureen A; Greenberg, Caprice C; Welham, Nathan V
2016-11-01
To characterize initial voice treatment selection following vocal fold mucosal resection in a Medicare population. Retrospective analysis of a large, nationally representative Medicare claims database. Patients with > 12 months of continuous Medicare coverage who underwent a leukoplakia- or cancer-related vocal fold mucosal resection (index) procedure during calendar years 2004 to 2009 were studied. The primary outcome of interest was receipt of initial voice treatment (thyroplasty, vocal fold injection, or speech therapy) following the index procedure. We evaluated the cumulative incidence of each postindex treatment type, treating the other treatment types as competing risks, and further evaluated postindex treatment utilization using the proportional hazards model for the subdistribution of a competing risk. Patient age, sex, and Medicaid eligibility were used as predictors. A total of 2,041 patients underwent 2,427 index procedures during the study period. In 14% of cases, an initial voice treatment event was identified. Women were significantly less likely to receive surgical or behavioral treatment compared to men. From age 65 to 75 years, the likelihood of undergoing surgical treatment increased significantly with each 5-year age increase; after age 75 years, the likelihood of undergoing either surgical or behavioral treatment decreased significantly every 5 years. Patients with low socioeconomic status were significantly less likely to undergo speech therapy. The majority of Medicare patients do not undergo voice treatment following vocal fold mucosal resection. Further, the treatments analyzed here appear disproportionally utilized based on patient sex, age, and socioeconomic status. Additional research is needed to determine whether these observations reflect clinically explainable differences or disparities in care. 2c. Laryngoscope, 126:2505-2512, 2016. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
Practice variations in voice treatment selection following vocal fold mucosal resection
Moore, Jaime E.; Rathouz, Paul J.; Havlena, Jeffrey A.; Zhao, Qianqian; Dailey, Seth H.; Smith, Maureen A.; Greenberg, Caprice C.; Welham, Nathan V.
2016-01-01
Objective To characterize initial voice treatment selection following vocal fold mucosal resection in a Medicare population. Study Design Retrospective analysis of a large, nationally-representative Medicare claims database. Methods Patients with >12 months of continuous Medicare coverage who underwent a leukoplakia- or cancer-related vocal fold mucosal resection (index) procedure during calendar years 2004–2009 were studied. The primary outcome of interest was receipt of initial voice treatment (thyroplasty, vocal fold injection, or speech therapy) following the index procedure. We evaluated the cumulative incidence of each post-index treatment type treating the other treatment types as competing risks, and further evaluated post-index treatment utilization using the proportional hazards model for the subdistribution of a competing risk. Patient age, sex and Medicaid eligibility were used as predictors. Results 2041 patients underwent 2427 index procedures during the study period. An initial voice treatment event was identified in 14% of cases. Women were significantly less likely to receive surgical or behavioral treatment compared to men. From age 65–75 years, the likelihood of undergoing surgical treatment increased significantly with each 5-year age increase; after age 75 years, the likelihood of undergoing either surgical or behavioral treatment decreased significantly every 5 years. Patients with low socioeconomic status were significantly less likely to undergo speech therapy. Conclusions The majority of Medicare patients do not undergo voice treatment following vocal fold mucosal resection. Further, the treatments analyzed here appear disproportionally utilized based on patient sex, age and socioeconomic status. Additional research is needed to determine whether these observations reflect clinically explainable differences or disparities in care. Level of Evidence 2c PMID:26972900
Speech and Language: Translating the Genome.
Deriziotis, Pelagia; Fisher, Simon E
2017-09-01
Investigation of the biological basis of human speech and language is being transformed by developments in molecular technologies, including high-throughput genotyping and next-generation sequencing of whole genomes. These advances are shedding new light on the genetic architecture underlying language-related disorders (speech apraxia, specific language impairment, developmental dyslexia) as well as that contributing to variation in relevant skills in the general population. We discuss how state-of-the-art methods are uncovering a range of genetic mechanisms, from rare mutations of large effect to common polymorphisms that increase risk in a subtle way, while converging on neurogenetic pathways that are shared between distinct disorders. We consider the future of the field, highlighting the unusual challenges and opportunities associated with studying genomics of language-related traits. Copyright © 2017 Elsevier Ltd. All rights reserved.
Khwaileh, Tariq; Body, Richard; Herbert, Ruth
2014-12-01
Research into lexical retrieval requires pictorial stimuli standardised for key psycholinguistic variables. Such databases exist in a number of languages but not in Arabic. In addition, there are few studies of the effects of psycholinguistic and morpho-syntactic variables on Arabic lexical retrieval. The current study identified a set of culturally and linguistically appropriate concept labels and corresponding photographic representations for Levantine Arabic. The set included masculine and feminine nouns, nouns from both types of plural formation (sound and broken), and both rational and irrational nouns. Levantine Arabic speakers provided norms for visual complexity, imageability, age of acquisition, naming latency and name agreement. This delivered a normative database for a set of 186 Arabic nouns. The effects of the morpho-syntactic and the psycholinguistic variables on lexical retrieval were explored using the database. Imageability and age of acquisition were the only significant determinants of successful lexical retrieval in Arabic. None of the other variables, including all the linguistic variables, had any effect on production time. The normative database is available for the use of clinicians and researchers in the Arab world in the domains of speech and language pathology, neurolinguistics and psycholinguistics. The database and the photographic representations will soon be available for free download from the first author's personal webpage or via email.
Audio stream classification for multimedia database search
NASA Astrophysics Data System (ADS)
Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.
2013-03-01
Search and retrieval of huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in unconstrained environments; and it is difficult for a non-expert human user to create the ground-truth labels. In our experiments, half of all the available audio files were randomly extracted and used as the training set. The remaining files were used as the test set. The classifier was trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled by domain experts into the three classes defined above.
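A minimal sketch of such a CART-based classifier, using scikit-learn's DecisionTreeClassifier on precomputed per-file feature vectors, is given below. The feature file names and tree depth are hypothetical, and the 50/50 split mirrors the training/test division described in the abstract.

```python
# Minimal sketch of a CART-style classifier for speech / music / song, assuming
# per-file feature vectors (e.g., zero-crossing rate, spectral centroid, energy)
# have already been extracted. File names and parameters are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

X = np.load("aess_features.npy")   # hypothetical (n_files, n_features) array
y = np.load("aess_labels.npy")     # hypothetical labels: "speech" / "music" / "song"

# Half of the files as training set, the rest as test set, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = DecisionTreeClassifier(max_depth=5, random_state=0)  # CART with simple thresholds
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```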
Very Large Scale Integration (VLSI).
ERIC Educational Resources Information Center
Yeaman, Andrew R. J.
Very Large Scale Integration (VLSI), the state-of-the-art production techniques for computer chips, promises such powerful, inexpensive computing that, in the future, people will be able to communicate with computer devices in natural language or even speech. However, before full-scale VLSI implementation can occur, certain salient factors must be…
McCreery, Ryan W.; Walker, Elizabeth A.; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia
2015-01-01
Objectives Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HA) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children’s auditory experience on parent-report auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Design Parent ratings on auditory development questionnaires and children’s speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years of age. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children Rating Scale, and an adaptation of the Speech, Spatial and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open and Closed set task, Early Speech Perception Test, Lexical Neighborhood Test, and Phonetically-balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared to peers with normal hearing matched for age, maternal educational level and nonverbal intelligence. The effects of aided audibility, HA use and language ability on parent responses to auditory development questionnaires and on children’s speech recognition were also examined. Results Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use and better language abilities generally had higher parent ratings of auditory skills and better speech recognition abilities in quiet and in noise than peers with less audibility, more limited HA use or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Conclusions Children who are hard of hearing continue to experience delays in auditory skill development and speech recognition abilities compared to peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported prior to the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech recognition abilities, and may also enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children’s speech recognition. PMID:26731160
Kreft, Heather A.
2014-01-01
Under normal conditions, human speech is remarkably robust to degradation by noise and other distortions. However, people with hearing loss, including those with cochlear implants, often experience great difficulty in understanding speech in noisy environments. Recent work with normal-hearing listeners has shown that the amplitude fluctuations inherent in noise contribute strongly to the masking of speech. In contrast, this study shows that speech perception via a cochlear implant is unaffected by the inherent temporal fluctuations of noise. This qualitative difference between acoustic and electric auditory perception does not seem to be due to differences in underlying temporal acuity but can instead be explained by the poorer spectral resolution of cochlear implants, relative to the normally functioning ear, which leads to an effective smoothing of the inherent temporal-envelope fluctuations of noise. The outcome suggests an unexpected trade-off between the detrimental effects of poorer spectral resolution and the beneficial effects of a smoother noise temporal envelope. This trade-off provides an explanation for the long-standing puzzle of why strong correlations between speech understanding and spectral resolution have remained elusive. The results also provide a potential explanation for why cochlear-implant users and hearing-impaired listeners exhibit reduced or absent masking release when large and relatively slow temporal fluctuations are introduced in noise maskers. The multitone maskers used here may provide an effective new diagnostic tool for assessing functional hearing loss and reduced spectral resolution. PMID:25315376
Alternating motion rate as an index of speech motor disorder in traumatic brain injury.
Wang, Yu-Tsai; Kent, Ray D; Duffy, Joseph R; Thomas, Jack E; Weismer, Gary
2004-01-01
The task of syllable alternating motion rate (AMR) (also called diadochokinesis) is suitable for examining speech disorders of varying degrees of severity and in individuals with varying levels of linguistic and cognitive ability. However, very limited information on this task has been published for subjects with traumatic brain injury (TBI). This study is a quantitative and qualitative acoustic analysis of AMR in seven subjects with TBI. The primary goal was to use acoustic analyses to assess speech motor control disturbances for the group as a whole and for individual patients. Quantitative analyses included measures of syllable rate, syllable and intersyllable gap durations, energy maxima, and voice onset time (VOT). Qualitative analyses included classification of features evident in spectrograms and waveforms to provide a more detailed description. The TBI group had (1) a slowed syllable rate due mostly to lengthened syllables and, to a lesser degree, lengthened intersyllable gaps, (2) highly correlated syllable rates between AMR and conversation, (3) temporal and energy maxima irregularities within repetition sequences, (4) normal median VOT values but with large variation, and (5) a number of speech production abnormalities revealed by qualitative analysis, including explosive speech quality, breathy voice quality, phonatory instability, multiple or missing stop bursts, continuous voicing, and spirantization. The relationships between these findings and TBI speakers' neurological status and dysarthria types are also discussed. It was concluded that acoustic analyses of the AMR task provide specific information on motor speech limitations in individuals with TBI.
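The quantitative AMR measures (syllable rate, syllable and intersyllable gap durations) can be sketched as below, assuming the syllable intervals have already been segmented from the recording; the interval values in the example are illustrative.

```python
# Minimal sketch of the quantitative AMR measures described above, assuming
# syllable onsets/offsets (in seconds) have already been segmented.
def amr_measures(intervals):
    """intervals: list of (onset, offset) tuples for successive syllables."""
    syll_durs = [off - on for on, off in intervals]
    gaps = [intervals[i + 1][0] - intervals[i][1] for i in range(len(intervals) - 1)]
    total_time = intervals[-1][1] - intervals[0][0]
    return {"syllable_rate": len(intervals) / total_time,      # syllables per second
            "mean_syllable_dur": sum(syll_durs) / len(syll_durs),
            "mean_gap_dur": sum(gaps) / len(gaps) if gaps else 0.0}

# Illustrative intervals for a /p^/ repetition sequence.
print(amr_measures([(0.00, 0.18), (0.25, 0.43), (0.50, 0.69), (0.76, 0.95)]))
```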
Deep bottleneck features for spoken language identification.
Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong
2014-01-01
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
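A minimal sketch of the bottleneck idea follows: a feed-forward network with a narrow linear layer whose activations are taken as the deep bottleneck features once the network has been trained as a frame classifier. The layer sizes and target inventory are assumptions, not the paper's exact topology, and the sketch uses PyTorch rather than whatever toolkit the authors used.

```python
# Minimal sketch (assumed dimensions): a feed-forward network with a narrow
# bottleneck layer; after training the network as a frame classifier, the
# bottleneck activations serve as deep bottleneck features (DBFs) per frame.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, n_in=39 * 11, n_hidden=1024, n_bottleneck=40, n_targets=3000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        self.bottleneck = nn.Linear(n_hidden, n_bottleneck)   # DBF layer
        self.back = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_targets),                   # e.g., senone targets
        )

    def forward(self, x):
        return self.back(self.bottleneck(self.front(x)))

    def extract_dbf(self, x):
        # Linear bottleneck outputs used as the frame-level feature vector.
        return self.bottleneck(self.front(x))

model = BottleneckDNN()
frames = torch.randn(8, 39 * 11)       # a batch of spliced acoustic frames
dbf = model.extract_dbf(frames)        # shape: (8, 40)
```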
A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.
Mi, Jing; Colburn, H Steven
2016-10-03
Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.
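A simplified sketch of the EC-based mask estimation follows. It assumes the target is at 0 degrees azimuth, so cancelling the target reduces to subtracting the two ear signals in each time-frequency unit; the energy drop after cancellation is thresholded into a binary mask. The threshold and STFT settings are illustrative, and internal noise is omitted, as in the initial model version described above.

```python
# Minimal sketch (simplified assumptions, not the published model): estimate a
# time-frequency binary mask by Equalization-Cancellation. With the target at
# 0 degrees azimuth, cancelling the target amounts to subtracting the two ear
# signals; a large energy drop marks a target-dominated unit.
import numpy as np
from scipy.signal import stft

def ec_binary_mask(left, right, fs, threshold_db=3.0, nperseg=512):
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    in_energy = (np.abs(L) ** 2 + np.abs(R) ** 2) / 2.0   # EC input energy
    out_energy = np.abs(L - R) ** 2                       # after cancelling the 0-deg target
    eps = 1e-12
    drop_db = 10.0 * np.log10((in_energy + eps) / (out_energy + eps))
    return drop_db > threshold_db                         # True = keep (target-dominated) unit
```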
Perception and the temporal properties of speech
NASA Astrophysics Data System (ADS)
Gordon, Peter C.
1991-11-01
Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
Guidi, Andrea; Salvi, Sergio; Ottaviano, Manuel; Gentili, Claudio; Bertschy, Gilles; de Rossi, Danilo; Scilingo, Enzo Pasquale; Vanello, Nicola
2015-11-06
Bipolar disorder is one of the most common mood disorders characterized by large and invalidating mood swings. Several projects focus on the development of decision support systems that monitor and advise patients, as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech using a smartphone device. The application can record audio samples and estimate speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, with some advantages with respect to remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server and further processed. The quality of the audio recordings, algorithm reliability and performance of the overall system were evaluated in terms of voiced segment detection and features estimation. The results demonstrate that mean F0 from each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. Instead, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
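A minimal sketch of frame-wise F0 estimation with a crude voicing decision is shown below; the autocorrelation method, frame sizes, and thresholds are assumptions for illustration and are not necessarily the algorithm used in the Android application.

```python
# Minimal sketch (assumed parameters): frame-wise autocorrelation F0 estimation
# with a simple voiced/unvoiced decision, yielding mean F0 over voiced frames.
import numpy as np

def frame_f0(frame, fs, fmin=60.0, fmax=400.0, voicing_thresh=0.3):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return None                              # silent frame
    ac = ac / ac[0]                              # normalised autocorrelation
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag if ac[lag] > voicing_thresh else None   # None = unvoiced

def mean_f0(signal, fs, frame_len=0.04, hop=0.01):
    n, h = int(frame_len * fs), int(hop * fs)
    f0s = [frame_f0(signal[i:i + n], fs) for i in range(0, len(signal) - n, h)]
    voiced = [f for f in f0s if f is not None]
    return sum(voiced) / len(voiced) if voiced else None
```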
Johnson, Erin Phinney; Pennington, Bruce F.; Lowenstein, Joanna H.; Nittrouer, Susan
2011-01-01
Purpose Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Method Ten- to 11-year-olds with SSD (n = 17), RD (n = 16), SSD+RD (n = 17), and Controls (n = 16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Results Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Conclusion Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. PMID:21329941
Phase effects in masking by harmonic complexes: speech recognition.
Deroche, Mickael L D; Culling, John F; Chatterjee, Monita
2013-12-01
Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
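The two masker types can be sketched as below: a harmonic complex whose partials are summed either all in sine phase (yielding a highly modulated temporal envelope) or in random phase (yielding a flatter envelope). The F0 and upper frequency limit in the example are illustrative.

```python
# Minimal sketch of sine-phase (SP) versus random-phase (RP) harmonic complexes.
import numpy as np

def harmonic_complex(f0, fs, dur, fmax=5000.0, phase="sine", rng=None):
    rng = rng or np.random.default_rng(0)
    t = np.arange(int(dur * fs)) / fs
    n_partials = int(fmax // f0)
    x = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        phi = 0.0 if phase == "sine" else rng.uniform(0, 2 * np.pi)
        x += np.sin(2 * np.pi * k * f0 * t + phi)
    return x / np.max(np.abs(x))                 # normalise peak amplitude

sp_masker = harmonic_complex(50.0, 44100, 1.0, phase="sine")    # modulated envelope
rp_masker = harmonic_complex(50.0, 44100, 1.0, phase="random")  # flatter envelope
```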
Abnormal Brain Dynamics Underlie Speech Production in Children with Autism Spectrum Disorder.
Pang, Elizabeth W; Valica, Tatiana; MacDonald, Matt J; Taylor, Margot J; Brian, Jessica; Lerch, Jason P; Anagnostou, Evdokia
2016-02-01
A large proportion of children with autism spectrum disorder (ASD) have speech and/or language difficulties. While a number of structural and functional neuroimaging methods have been used to explore the brain differences in ASD with regards to speech and language comprehension and production, the neurobiology of basic speech function in ASD has not been examined. Magnetoencephalography (MEG) is a neuroimaging modality with high spatial and temporal resolution that can be applied to the examination of brain dynamics underlying speech as it can capture the fast responses fundamental to this function. We acquired MEG from 21 children with high-functioning autism (mean age: 11.43 years) and 21 age- and sex-matched controls as they performed a simple oromotor task, a phoneme production task and a phonemic sequencing task. Results showed significant differences in activation magnitude and peak latencies in primary motor cortex (Brodmann Area 4), motor planning areas (BA 6), temporal sequencing and sensorimotor integration areas (BA 22/13) and executive control areas (BA 9). Our findings of significant functional brain differences between these two groups on these simple oromotor and phonemic tasks suggest that these deficits may be foundational and could underlie the language deficits seen in ASD. © 2015 The Authors Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research.
Common cues to emotion in the dynamic facial expressions of speech and song
Livingstone, Steven R.; Thompson, William F.; Wanderley, Marcelo M.; Palmer, Caroline
2015-01-01
Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech–song differences. Vocalists’ jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech–song. Vocalists’ emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists’ facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions were identified poorly from voice-only singing, yet accurately in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet reveal differences in perception and acoustic-motor production. PMID:25424388
The Role of Corticostriatal Systems in Speech Category Learning.
Yi, Han-Gyol; Maddox, W Todd; Mumford, Jeanette A; Chandrasekaran, Bharath
2016-04-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
You 'have' to hear this: Using tone of voice to motivate others.
Weinstein, Netta; Zougkou, Konstantina; Paulmann, Silke
2018-06-01
The present studies explored the role of prosody in motivating others, and applied self-determination theory (Ryan & Deci, 2000) to do so. Initial studies describe patterns of prosody that discriminate motivational speech. Autonomy support was expressed with lower intensity, slower speech rate and less voice energy in both motivationally laden and neutral (but motivationally primed) sentences. In a follow-up study, participants were able to recognize motivational prosody in semantically neutral sentences, suggesting prosody alone may carry motivational content. Findings from subsequent studies also showed that an autonomy-supportive as compared with a controlling tone facilitated positive personal (perceived choice and lower perceived pressure, well-being) and interpersonal (closeness to others and prosocial behaviors) outcomes commonly linked to this type of motivation. Results inform both the social psychology (in particular motivation) and psycho-linguistic (in particular prosody) literatures and offer a first description of how motivational tone alone can shape listeners' experiences. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Rehabilitation of language in expressive aphasias: a literature review
da Fontoura, Denise Ren; Rodrigues, Jaqueline de Carvalho; Carneiro, Luciana Behs de Sá; Monção, Ana Maria; de Salles, Jerusa Fumagalli
2012-01-01
Objective This paper reviews the methodological characteristics of studies on rehabilitation of expressive aphasia, describing the techniques of rehabilitation used. Methods The databases Medline, Science Direct and PubMed were searched for relevant articles (January 1999 to December 2011) using the keywords Expressive / Broca / Nonfluent Aphasia, combined with Language or Speech Rehabilitation / Therapy / Intervention. Results A total of 56 articles were retrieved describing rehabilitation techniques, including 22 with a focus on lexical processing, 18 on syntax stimulation, seven with the aim of developing speech and nine with multiple foci. Conclusion A variety of techniques and theoretical approaches are available, highlighting the heterogeneity of research in this area. This diversity can be justified by the uniqueness of patients' language deficits, making it difficult to generalize. In addition, there is a need to combine the formal measures of tests with measures of pragmatic and social skills of communication to determine the effect of rehabilitation on the patient's daily life. PMID:29213802
Speech sound articulation abilities of preschool-age children who stutter.
Clark, Chagit E; Conture, Edward G; Walden, Tedra A; Lambert, Warren E
2013-12-01
The purpose of this study was to assess the association between speech sound articulation and childhood stuttering in a relatively large sample of preschool-age children who do and do not stutter, using the Goldman-Fristoe Test of Articulation-2 (GFTA-2; Goldman & Fristoe, 2000). Participants included 277 preschool-age children who do (CWS; n=128, 101 males) and do not stutter (CWNS; n=149, 76 males). Generalized estimating equations (GEE) were performed to assess between-group (CWS versus CWNS) differences on the GFTA-2. Additionally, within-group correlations were performed to explore the relation between CWS' speech sound articulation abilities and their stuttering frequency and severity, as well as their sound prolongation index (SPI; Schwartz & Conture, 1988). No significant differences were found between the articulation scores of preschool-age CWS and CWNS. However, there was a small gender effect for the 5-year-old age group, with girls generally exhibiting better articulation scores than boys. Additional findings indicated no relation between CWS' speech sound articulation abilities and their stuttering frequency, severity, or SPI. Findings suggest no apparent association between speech sound articulation-as measured by one standardized assessment (GFTA-2)-and childhood stuttering for this sample of preschool-age children (N=277). After reading this article, the reader will be able to: (1) discuss salient issues in the articulation literature relative to children who stutter; (2) compare/contrast the present study's methodologies and main findings to those of previous studies that investigated the association between childhood stuttering and speech sound articulation; (3) identify future research needs relative to the association between childhood stuttering and speech sound development; (4) replicate the present study's methodology to expand this body of knowledge. Copyright © 2013 Elsevier Inc. All rights reserved.
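A hedged sketch of a GEE comparison of GFTA-2 scores between groups is given below using statsmodels; the column names, covariates, clustering variable, and data file are hypothetical and do not reproduce the authors' exact model specification.

```python
# Minimal sketch (hypothetical column names, not the authors' exact model): a
# generalized estimating equation comparing GFTA-2 scores between CWS and CWNS
# with an exchangeable working correlation over an assumed clustering variable.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("gfta2_scores.csv")   # hypothetical file: gfta2, group, age, sex, site
model = smf.gee("gfta2 ~ group + age + sex",
                groups="site",          # assumed clustering unit
                data=df,
                family=sm.families.Gaussian(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```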
Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition
Rigoulot, Simon; Wassiliwizky, Eugen; Pell, Marc D.
2013-01-01
Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech. PMID:23805115
Stop consonant voicing in young children's speech: Evidence from a cross-sectional study
NASA Astrophysics Data System (ADS)
Ganser, Emily
There are intuitive reasons to believe that speech-sound acquisition and language acquisition should be related in development. Surprisingly, only recently has research begun to parse just how the two might be related. This study investigated possible correlations between speech-sound acquisition and language acquisition, as part of a large-scale, longitudinal study of the relationship between different types of phonological development and vocabulary growth in the preschool years. Productions of voiced and voiceless stop-initial words were recorded from 96 children aged 28-39 months. Voice Onset Time (VOT, in ms) for each token context was calculated. A mixed-model logistic regression was calculated which predicted whether the sound was intended to be voiced or voiceless based on its VOT. This model estimated the slopes of the logistic function for each child. This slope was referred to as Robustness of Contrast (based on Holliday, Reidy, Beckman, and Edwards, 2015), defined as being the degree of categorical differentiation between the production of two speech sounds or classes of sounds, in this case, voiced and voiceless stops. Results showed a wide range of slopes for individual children, suggesting that slope-derived Robustness of Contrast could be a viable means of measuring a child's acquisition of the voicing contrast. Robustness of Contrast was then compared to traditional measures of speech and language skills to investigate whether there was any correlation between the production of stop voicing and broader measures of speech and language development. The Robustness of Contrast measure was found to correlate with all individual measures of speech and language, suggesting that it might indeed be predictive of later language skills.
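The slope-based measure can be sketched as below with a per-child logistic regression, which is a simplification of the mixed-model analysis described above; the column names and data file are hypothetical.

```python
# Minimal sketch (a per-child simplification of the mixed-model analysis, with
# hypothetical column names): fit a logistic regression of intended voicing
# category on VOT for each child and take the fitted slope as the
# Robustness-of-Contrast measure. A steeper (larger-magnitude) slope indicates
# a more categorical voiced/voiceless distinction.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vot_tokens.csv")     # hypothetical columns: child, vot_ms, voiced (0/1)

slopes = {}
for child, tokens in df.groupby("child"):
    fit = smf.logit("voiced ~ vot_ms", data=tokens).fit(disp=False)
    slopes[child] = fit.params["vot_ms"]

robustness = pd.Series(slopes, name="robustness_of_contrast")
print(robustness.describe())
```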
Lawler, Marshall; Yu, Jeffrey; Aronoff, Justin M
Although speech perception is the gold standard for measuring cochlear implant (CI) users' performance, speech perception tests often require extensive adaptation to obtain accurate results, particularly after large changes in maps. Spectral ripple tests, which measure spectral resolution, are an alternate measure that has been shown to correlate with speech perception. A modified spectral ripple test, the spectral-temporally modulated ripple test (SMRT) has recently been developed, and the objective of this study was to compare speech perception and performance on the SMRT for a heterogeneous population of unilateral CI users, bilateral CI users, and bimodal users. Twenty-five CI users (eight using unilateral CIs, nine using bilateral CIs, and eight using a CI and a hearing aid) were tested on the Arizona Biomedical Institute Sentence Test (AzBio) with a +8 dB signal to noise ratio, and on the SMRT. All participants were tested with their clinical programs. There was a significant correlation between SMRT and AzBio performance. After a practice block, an improvement of one ripple per octave for SMRT corresponded to an improvement of 12.1% for AzBio. Additionally, there was no significant difference in slope or intercept between any of the CI populations. The results indicate that performance on the SMRT correlates with speech recognition in noise when measured across unilateral, bilateral, and bimodal CI populations. These results suggest that SMRT scores are strongly associated with speech recognition in noise ability in experienced CI users. Further studies should focus on increasing both the size and diversity of the tested participants, and on determining whether the SMRT technique can be used for early predictions of long-term speech scores, or for evaluating differences among different stimulation strategies or parameter settings.
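The reported SMRT-AzBio relation is essentially a correlation plus a linear fit, which can be sketched as follows; the score arrays are illustrative placeholders, not the study's data.

```python
# Minimal sketch (illustrative values, not the reported data): summarise the
# SMRT-AzBio relation with a Pearson correlation and a linear fit whose slope
# gives the percent-correct gain per ripple-per-octave improvement.
import numpy as np
from scipy import stats

smrt = np.array([1.2, 1.8, 2.5, 3.1, 3.9])        # ripples per octave (illustrative)
azbio = np.array([22.0, 35.0, 41.0, 55.0, 63.0])  # % correct in noise (illustrative)

r, p = stats.pearsonr(smrt, azbio)
fit = stats.linregress(smrt, azbio)
print(f"r = {r:.2f}, p = {p:.3f}, slope = {fit.slope:.1f} % per ripple/octave")
```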
Effects of Compression on Speech Acoustics, Intelligibility, and Sound Quality
Souza, Pamela E.
2002-01-01
The topic of compression has been discussed quite extensively in the last 20 years (eg, Braida et al., 1982; Dillon, 1996, 2000; Dreschler, 1992; Hickson, 1994; Kuk, 2000 and 2002; Kuk and Ludvigsen, 1999; Moore, 1990; Van Tasell, 1993; Venema, 2000; Verschuure et al., 1996; Walker and Dillon, 1982). However, the latest comprehensive update by this journal was published in 1996 (Kuk, 1996). Since that time, use of compression hearing aids has increased dramatically, from half of hearing aids dispensed only 5 years ago to four out of five hearing aids dispensed today (Strom, 2002b). Most of today's digital and digitally programmable hearing aids are compression devices (Strom, 2002a). It is probable that within a few years, very few patients will be fit with linear hearing aids. Furthermore, compression has increased in complexity, with greater numbers of parameters under the clinician's control. Ideally, these changes will translate to greater flexibility and precision in fitting and selection. However, they also increase the need for information about the effects of compression amplification on speech perception and speech quality. As evidenced by the large number of sessions at professional conferences on fitting compression hearing aids, clinicians continue to have questions about compression technology and when and how it should be used. How does compression work? Who are the best candidates for this technology? How should adjustable parameters be set to provide optimal speech recognition? What effect will compression have on speech quality? These and other questions continue to drive our interest in this technology. This article reviews the effects of compression on the speech signal and the implications for speech intelligibility, quality, and design of clinical procedures. PMID:25425919
Swallowing sounds in speech therapy practice: a critical analysis of the literature
Ferrucci, Juliana Lopes; Mangilli, Laura Davison; Sassi, Fernanda Chiarion; Limongi, Suelly Cecilia Olivan; de Andrade, Claudia Regina Furquim
2013-01-01
ABSTRACT This study aimed to investigate international scientific papers published on the subject of cervical auscultation and its use in speech therapy. The study involved a qualitative review of the literature spanning the last 10 years. Articles were selected from the PubMed database using the following keywords: cervical auscultation, swallowing and swallowing disorders. Research was included that was conducted on adult humans (over 18 years of age) and was written in English. Each citation retrieved from the database was analyzed independently by each of the study researchers to ascertain its relevance for inclusion in the study. The methodology involved formulating the research question, locating and selecting studies and critically evaluating the articles according to the precepts of the Cochrane Handbook. As a result, 35 studies were identified; 13 articles were analyzed because they allowed access to the full text and were related directly to the subject. We found that the studies were performed with groups of healthy subjects and subjects with different types of base pathology. Some studies compared the patterns found in the different groups. Some of the research sought to study the pattern of swallowing sounds with different factors - evaluator experience, the specificity and sensitivity of the method and how to improve the technique of cervical auscultation through the use of instruments other than the stethoscope. The conclusion of this critical analysis is that cervical auscultation is an important tool to be used in conjunction with other assessment methods in the routine clinical evaluation of swallowing. PMID:24488399
Al Otaiba, Stephanie; Puranik, Cynthia; Zilkowski, Robin; Curran, Tricia
2009-01-01
This article reviews research examining the efficacy of early phonological interventions for young students identified with Speech or Language impairments. Eighteen studies are included, providing results for nearly 500 students in preschool through third grade. Although findings were generally positive, there were large individual differences in response to intervention. Further, there was little evidence that interventions enabled students to catch up in phonological or reading skills to typically developing peers. Methodological issues are described and implications for practice and future research are discussed. PMID:20161557
Brabenec, L; Mekyska, J; Galaz, Z; Rektorova, Irena
2017-03-01
Hypokinetic dysarthria (HD) occurs in 90% of Parkinson's disease (PD) patients. It manifests specifically in the areas of articulation, phonation, prosody, speech fluency, and faciokinesis. We aimed to systematically review papers on HD in PD with a special focus on (1) early PD diagnosis and monitoring of disease progression using acoustic voice and speech analysis, (2) functional imaging studies exploring neural correlates of HD in PD, and (3) clinical studies using acoustic analysis to evaluate effects of dopaminergic medication and brain stimulation. A systematic literature search of articles written in English before March 2016 was conducted in the Web of Science, PubMed, SpringerLink, and IEEE Xplore databases using combinations of specific relevant keywords. Articles were categorized into three groups: (1) articles focused on neural correlates of HD in PD using functional imaging (n = 13); (2) articles dealing with the acoustic analysis of HD in PD (n = 52); and (3) articles concerning specifically dopaminergic and brain stimulation-related effects as assessed by acoustic analysis (n = 31); the groups were then reviewed. We identified 14 combinations of speech tasks and acoustic features that can be recommended for use in describing the main features of HD in PD. While only a few acoustic parameters correlate with limb motor symptoms and respond partially to dopaminergic medication, HD in PD seems to be mainly related to non-dopaminergic deficits and associated particularly with non-motor symptoms. Future studies should combine non-invasive brain stimulation with voice behavior approaches to achieve the best treatment effects by enhancing auditory-motor integration.
Right Ear Advantage of Speech Audiometry in Single-sided Deafness.
Wettstein, Vincent G; Probst, Rudolf
2018-04-01
Postlingual single-sided deafness (SSD) is defined as normal hearing in one ear and severely impaired hearing in the other ear. A right ear advantage and dominance of the left hemisphere are well established findings in individuals with normal hearing and speech processing. Therefore, it seems plausible that a right ear advantage would exist in patients with SSD. The audiometric database was searched to identify patients with SSD. Results from the German monosyllabic Freiburg word test and four-syllabic number test in quiet were evaluated. Results of right-sided SSD were compared with left-sided SSD. Statistical calculations were done with the Mann-Whitney U test. Four hundred and six patients with SSD were identified, 182 with right-sided and 224 with left-sided SSD. The two groups had similar pure-tone thresholds without significant differences. All test parameters of speech audiometry had better values for right ears (SSD left) when compared with left ears (SSD right). Statistically significant results (p < 0.05) were found for a weighted score (social index, 98.2 ± 4% right and 97.5 ± 4.7% left, p < 0.026), for word understanding at 60 dB SPL (95.2 ± 8.7% right and 93.9 ± 9.1% left, p < 0.035), and for the level at which 100% understanding was reached (61.5 ± 10.1 dB SPL right and 63.8 ± 11.1 dB SPL left, p < 0.022) on a performance-level function. A right ear advantage of speech audiometry was found in patients with SSD in this retrospective study of audiometric test results.
Relation between speech-in-noise threshold, hearing loss and cognition from 40-69 years of age.
Moore, David R; Edmondson-Jones, Mark; Dawes, Piers; Fortnum, Heather; McCormack, Abby; Pierzycki, Robert H; Munro, Kevin J
2014-01-01
Healthy hearing depends on sensitive ears and adequate brain processing. Essential aspects of both hearing and cognition decline with advancing age, but it is largely unknown how one influences the other. The current standard measure of hearing, the pure-tone audiogram, is not very cognitively demanding and does not predict well the most important yet challenging use of hearing, listening to speech in noisy environments. We analysed data from UK Biobank, which asked 40-69 year olds about their hearing and assessed their ability on tests of speech-in-noise hearing and cognition. About half a million volunteers were recruited through NHS registers. Respondents completed 'whole-body' testing in purpose-designed, community-based test centres across the UK. Objective hearing (spoken digit recognition in noise) and cognitive (reasoning, memory, processing speed) data were analysed using logistic and multiple regression methods. Speech hearing in noise declined exponentially with age for both sexes from about 50 years, differing from previous audiogram data that showed a more linear decline from <40 years for men, and consistently less hearing loss for women. The decline in speech-in-noise hearing was especially dramatic among those with lower cognitive scores. Decreasing cognitive ability and increasing age were both independently associated with decreasing ability to hear speech in noise (0.70 and 0.89 dB, respectively) among the population studied. Men subjectively reported up to 60% higher rates of difficulty hearing than women. Workplace noise history was associated with difficulty in both subjective hearing and objective speech hearing in noise. Leisure noise history was associated with subjective, but not with objective, difficulty hearing. Older people have declining cognitive processing ability associated with reduced ability to hear speech in noise, as measured by recognition of recorded spoken digits. Subjective reports of hearing difficulty generally show a higher prevalence than objective measures, suggesting that current objective methods could be extended further.
Perceptual sensitivity to spectral properties of earlier sounds during speech categorization.
Stilp, Christian E; Assgari, Ashley A
2018-02-28
Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
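For readers who want a concrete picture of this kind of manipulation, the sketch below applies a small band-limited boost to a stand-in context signal; the band edges, filter order, and noise carrier are assumptions for illustration, not the authors' stimuli.

```python
# Illustrative sketch: apply a small (+3 dB) spectral emphasis to a "context"
# signal in an assumed low-F1 band. The band edges, filter order, and the noise
# stand-in for a context sentence are assumptions, not the study's stimuli.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000
context = np.random.randn(fs)                    # 1 s of noise as a stand-in for a sentence

low_hz, high_hz, boost_db = 100.0, 400.0, 3.0    # hypothetical low-F1 region and boost
sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
band = sosfiltfilt(sos, context)                 # isolate the band to be emphasized

extra = 10 ** (boost_db / 20) - 1.0              # scale so the in-band level rises ~3 dB
emphasized = context + extra * band
```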
Krüger, H P
1989-02-01
The term "speech chronemics" is introduced to characterize a research strategy which extracts from the physical qualities of the speech signal only the pattern of ons ("speaking") and offs ("pausing"). The research in this field can be structured into the methodological dimension "unit of time", "number of speakers", and "quality of the prosodic measures". It is shown that a researcher's actual decision for one method largely determines the outcome of his study. Then, with the Logoport a new portable measurement device is presented. It enables the researcher to study speaking behavior over long periods of time (up to 24 hours) in the normal environment of his subjects. Two experiments are reported. The first shows the validity of articulation pauses for variations in the physiological state of the organism. The second study proves a new betablocking agent to have sociotropic effects: in a long-term trial socially high-strung subjects showed an improved interaction behavior (compared to placebo and socially easy-going persons) in their everyday life. Finally, the need for a comprehensive theoretical foundation and for standardization of measurement situations and methods is emphasized.
Guo, Ruiling; Bain, Barbara A.; Willer, Janene
2008-01-01
Objectives: The research assesses the information needs of speech-language pathologists (SLPs) and audiologists in Idaho and identifies specific needs for training in evidence-based practice (EBP) principles and searching EBP resources. Methods: A survey was developed to assess knowledge and skills in accessing information. Questionnaires were distributed to 217 members of the Idaho Speech-Language-Hearing Association, who were given multiple options to return the assessment survey (web, email, mail). Data were analyzed descriptively and statistically. Results: The total response rate was 38.7% (84/217). Of the respondents, 87.0% (73/84) indicated insufficient knowledge and skills to search PubMed. Further, 47.6% (40/84) indicated limited knowledge of EBP. Of professionals responding, 52.4% (44/84) reported interest in learning more about EBP and 47.6% (40/84) reported interest in learning to search PubMed. SLPs and audiologists who graduated within the last 10 years were more likely to respond online, while those graduating prior to that time preferred to respond via hard copy. Discussions/Conclusion: More effort should be made to ensure that SLPs and audiologists develop skills in locating information to support their practice. Results from this information needs assessment were used to design a training and outreach program on EBP and EBP database searching for SLPs and audiologists in Idaho. PMID:18379669
Testing the influence of external and internal cues on smoking motivation using a community sample.
Litvin, Erika B; Brandon, Thomas H
2010-02-01
Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
Demir, Özlem Ece; Fisher, Joan A; Goldin-Meadow, Susan; Levine, Susan C
2014-03-01
Narrative skill in kindergarteners has been shown to be a reliable predictor of later reading comprehension and school achievement. However, we know little about how to scaffold children's narrative skill. Here we examine whether the quality of kindergarten children's narrative retellings depends on the kind of narrative elicitation they are given. We asked this question with respect to typically developing (TD) kindergarten children and children with pre- or perinatal unilateral brain injury (PL), a group that has been shown to have difficulty with narrative production. We compared children's skill in retelling stories originally presented to them in 4 different elicitation formats: (a) wordless cartoons, (b) stories told by a narrator through the auditory modality, (c) stories told by a narrator through the audiovisual modality without co-speech gestures, and (d) stories told by a narrator in the audiovisual modality with co-speech gestures. We found that children told better structured narratives in response to the audiovisual + gesture elicitation format than in response to the other 3 elicitation formats, consistent with findings that co-speech gestures can scaffold other aspects of language and memory. The audiovisual + gesture elicitation format was particularly beneficial for children who had the most difficulty telling a well-structured narrative, a group that included children with larger lesions associated with cerebrovascular infarcts. PsycINFO Database Record (c) 2014 APA, all rights reserved.
New Ideas for Speech Recognition and Related Technologies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, J F
The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan and Larry Ng ensued and have led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL, researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.
A speech processing study using an acoustic model of a multiple-channel cochlear implant
NASA Astrophysics Data System (ADS)
Xu, Ying
1998-10-01
A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel numbers were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.
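One common way to build such an acoustic model for normal-hearing listeners is a noise-band vocoder; the sketch below assumes log-spaced channels and simple Hilbert-envelope extraction, and differs in detail from the SPEAK-based model used in the study.

```python
# Sketch of a noise-band vocoder, a common acoustic model of a multi-channel
# cochlear implant for normal-hearing listeners. Channel count, band edges, and
# filter order are illustrative; the study's SPEAK-based model differs in detail.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(speech, fs, n_channels=8, f_lo=200.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)      # log-spaced analysis bands
    out = np.zeros(speech.size, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        envelope = np.abs(hilbert(band))                   # channel envelope
        carrier = sosfiltfilt(sos, np.random.randn(speech.size))
        out += envelope * carrier                          # envelope-modulated noise band
    return out / np.max(np.abs(out))
```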
Afrashtehfar, Kelvin I
2016-06-01
Data sources: Medline, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, Cochrane Central Register of Controlled Trials, Virtual Health Library and Web of Science were systematically searched up to July 2015 without limitations. Scopus, Google Scholar, ClinicalTrials.gov, the ISRCTN registry as well as reference lists of the trials included and relevant reviews were manually searched. Study selection: Randomised (RCTs) and prospective non-randomised clinical trials (non-RCTs) on human patients that compared therapeutic and adverse effects of lingual and labial appliances were considered. One reviewer initially screened titles and subsequently two reviewers independently screened the selected abstracts and full texts. Data extraction and synthesis: The data were extracted independently by the reviewers. Missing or unclear information, ongoing trials and raw data from split-mouth trials were requested from the authors of the trials. The quality of the included trials and potential bias across studies were assessed using Cochrane's risk of bias tool and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. For parallel trials, the mean difference (MD) and the relative risk (RR) were used for continuous (objective speech performance, subjective speech performance, intercanine width, intermolar width and sagittal anchorage loss) and binary outcomes (eating difficulty), respectively. The standardised mean difference (SMD) was chosen to pool, after conversion, the outcome (oral discomfort) that was assessed with both binary and continuous measures. Random-effects meta-analyses were conducted, followed by subgroup and sensitivity analyses. Results: Thirteen papers pertaining to 11 clinical trials (three parallel RCTs, one split-mouth RCT and seven parallel prospective non-RCTs) were included with a total of 407 (34% male/66% female) patients. All trials had at least one bias domain at high risk of bias. Compared with labial appliances, lingual appliances were associated with increased overall oral discomfort, increased speech impediment (measured using auditory analysis), worse speech performance assessed by laypersons, increased eating difficulty and decreased intermolar width. On the other hand, lingual appliances were associated with increased intercanine width and significantly decreased anchorage loss of the maxillary first molar during space closure. However, the quality of all analyses included was judged as very low because of the high risk of bias of the included trials, inconsistency and imprecision. Conclusions: Based on existing trials there is insufficient evidence to make robust recommendations for lingual fixed orthodontic appliances regarding their therapeutic or adverse effects, as the quality of evidence was low.
Local television news coverage of President Clinton's introduction of the Health Security Act.
Dorfman, L; Schauffler, H H; Wilkerson, J; Feinson, J
1996-04-17
The objective was to investigate how local television news reported on health system reform during the week President Clinton presented his health system reform bill. The design was a retrospective content analysis of the 1342-page Health Security Act of 1993, the printed text of President Clinton's speech before Congress on September 22, 1993, and a sample of local television news stories on health system reform broadcast during the week of September 19 through 25, 1993, in the state of California. During the week, 316 television news stories on health system reform were aired during the 166 local news broadcasts sampled. Health system reform was the second most frequently reported topic, second to stories on violent crime. News stories on health system reform averaged 1 minute 38 seconds in length, compared with 57 seconds for violent crime. Fifty-seven percent of the local news stories focused on interest group politics. Compared with the content of the Health Security Act, local news broadcasts devoted a significantly greater portion of their stories to financing, eligibility, and preventive services. Local news stories gave significantly less attention to cost-saving mechanisms, long-term care benefits, and changes in Medicare and Medicaid, and less than 2% of stories mentioned quality assurance mechanisms, malpractice reform, or new public health initiatives. Of the 316 televised news stories, 53 reported on the president's speech, covering many of the same topics emphasized in the speech (financing, organization and administration, and eligibility) and de-emphasizing many of the same topics (Medicare and Medicaid, quality assurance, and malpractice reform). Two percent of the president's speech covered partisan politics; 45% of the local news stories on the speech featured challenges from partisan politicians. Although health system reform was the focus of a large number of local television news stories during the week, in-depth explanation was scarce. In general, the news stories provided superficial coverage framed largely in terms of the risks and costs of reform to specific stakeholders.
[On factors that affect the variability of central mechanisms of bilingualism].
Kruchinina, O V; Gal'perina, E I; Kats, E É; Shepoval'nikov, A N
2012-01-01
The article discusses the probable role of the many factors that determine the individual variability of the neurophysiological mechanisms that make it possible to learn and freely use two or more languages. The formation of speech functions is affected both by general factors common to bilinguals and monolinguals and by characteristics specific to the bilingual situation. The general factors include genetic and environmental influences that explain the diversity of individual developmental trajectories in the morphofunctional organization of speech functions. Bilinguals evidently show an even wider variance in the central organization of speech activity, owing to the combination of conditions that shape the language environment: the age of second-language acquisition, language proficiency, the linguistic closeness of the languages, the method of acquisition, and the intensity and scope of use of each language. The influence of these factors can be mediated in different ways by the individual characteristics of the bilingual brain. A child exposed to two languages from the first days of life draws, in developing speech skills, on unique features of the brain that are available only in the initial stages of postnatal ontogenesis. Mastering a second language at an older age requires much more effort: with maturation the brain acquires new capabilities but permanently loses the special "bonus" that nature grants a small child only in the first months of life. The large individual variability of cortical activation patterns during verbal activity in "late" bilinguals, compared with "early" bilinguals, suggests that the brain of a late bilingual mastering a new language is forced to recruit a large number of backup mechanisms, and this is reflected in increased variability of the cerebral processes responsible for speech functions. In addition, there is good reason to believe that learning a second language expands the functional capabilities of the brain and creates the basis for successful cognitive activity.
NASA Astrophysics Data System (ADS)
Hassanat, Ahmad B. A.; Jassim, Sabah
2010-04-01
In this paper, the automatic lip reading problem is investigated, and an innovative approach to providing solutions to this problem is proposed. This new VSR approach depends on the signature of the word itself, which is obtained from a hybrid feature extraction method based on geometric, appearance, and image transform features. The proposed VSR approach is termed "visual words". The visual words approach consists of two main parts: 1) feature extraction/selection, and 2) visual speech feature recognition. After localizing the face and lips, several visual features of the lips were extracted, such as the height and width of the mouth, the mutual information and quality measure between the DWT of the current ROI and the DWT of the previous ROI, the ratio of vertical to horizontal features taken from the DWT of the ROI, the ratio of vertical edges to horizontal edges of the ROI, the appearance of the tongue, and the appearance of teeth. Each spoken word is represented by 8 signals, one for each feature. These signals preserve the dynamics of the spoken word, which carry a good portion of the information. The system is then trained on these features using KNN and DTW. This approach has been evaluated using a large database of different people and large experiment sets. The evaluation has demonstrated the efficiency of the visual words approach and shown that VSR is a speaker-dependent problem.
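To make the recognition stage concrete, the sketch below pairs dynamic time warping with a nearest-neighbour decision over per-frame feature signals; the feature extraction itself and the exact classifier settings from the paper are not reproduced, and the input shapes are assumptions.

```python
# Sketch of classifying a spoken word from its per-frame visual feature signals
# using dynamic time warping (DTW) plus a 1-nearest-neighbour rule. Inputs are
# assumed to be arrays of shape (n_frames, n_features), e.g. 8 lip features per frame.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])        # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify_1nn(query, templates):
    # templates: list of (word_label, feature_signal) pairs from training data
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```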
Clinical expression of developmental coordination disorder in a large Canadian family
Gaines, Robin; Collins, David; Boycott, Kym; Missiuna, Cheryl; DeLaat, Denise; Soucie, Helen
2008-01-01
Previous studies of the phenotype of developmental coordination disorder (DCD) have largely concentrated on population-based samples. The present study reports on an in-depth examination of a large Canadian family with eight children, after three children who were suspected to have DCD were referred for evaluation. Subsequently, five of the six children whose motor impairments could be measured, and the mother, met the diagnostic criteria for DCD as per the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders – fourth edition. The family members diagnosed with DCD showed remarkably similar profiles of motor difficulties. Additionally, the five children diagnosed with DCD had current speech articulation difficulties, with four of them having visited speech/language pathologists; the mother had a lateral lisp. More in-depth testing for three children revealed intact intellectual, academic and language comprehension skills. Three of the children diagnosed with DCD were obese. The present report highlights familial clustering of DCD and the presence of comorbid conditions in the affected children. PMID:19436536
Developing an automated speech-recognition telephone diabetes intervention.
Goldman, Roberta E; Sanchez-Hernandez, Maya; Ross-Degnan, Dennis; Piette, John D; Trinacty, Connie Mah; Simon, Steven R
2008-08-01
Many patients do not receive guideline-recommended care for diabetes and other chronic conditions. Automated speech-recognition telephone outreach to supplement in-person physician-patient communication may enhance patient care for chronic illness. We conducted this study to inform the development of an automated telephone outreach intervention for improving diabetes care among members of a large, not-for-profit health plan. The study design comprised in-depth telephone interviews with qualitative analysis. Participants were individuals with diabetes (n=36) enrolled in a large regional health plan in the USA. The main outcome measure was patients' opinions about automated speech-recognition telephone technology. Patients who were recently diagnosed with diabetes and some with diabetes for a decade or more expressed basic informational needs. While most would prefer to speak with a live person rather than a computer-recorded voice, many felt that the automated system could successfully supplement the information they receive from their physicians and could serve as an integral part of their care. Patients suggested that such a system could provide specific dietary advice, information about diabetes and its self-care, a call-in menu of information topics, reminders about laboratory test results and appointments, tracking of personal laboratory results and feedback about their self-monitoring. While some patients expressed negative attitudes toward automated speech recognition telephone systems generally, most felt that a variety of functions of such a system could be beneficial to their diabetes care. In-depth interviews resulted in substantive input from health plan members for the design of an automated telephone outreach system to supplement in-person physician-patient communication in this population.
Very large database of lipids: rationale and design.
Martin, Seth S; Blaha, Michael J; Toth, Peter P; Joshi, Parag H; McEvoy, John W; Ahmed, Haitham M; Elshazly, Mohamed B; Swiger, Kristopher J; Michos, Erin D; Kwiterovich, Peter O; Kulkarni, Krishnaji R; Chimera, Joseph; Cannon, Christopher P; Blumenthal, Roger S; Jones, Steven R
2013-11-01
Blood lipids have major cardiovascular and public health implications. Lipid-lowering drugs are prescribed based in part on categorization of patients into normal or abnormal lipid metabolism, yet relatively little emphasis has been placed on: (1) the accuracy of current lipid measures used in clinical practice, (2) the reliability of current categorizations of dyslipidemia states, and (3) the relationship of advanced lipid characterization to other cardiovascular disease biomarkers. To these ends, we developed the Very Large Database of Lipids (NCT01698489), an ongoing database protocol that harnesses deidentified data from the daily operations of a commercial lipid laboratory. The database includes individuals who were referred for clinical purposes for a Vertical Auto Profile (Atherotech Inc., Birmingham, AL), which directly measures cholesterol concentrations of low-density lipoprotein, very low-density lipoprotein, intermediate-density lipoprotein, high-density lipoprotein, their subclasses, and lipoprotein(a). Individual Very Large Database of Lipids studies, ranging from studies of measurement accuracy, to dyslipidemia categorization, to biomarker associations, to characterization of rare lipid disorders, are investigator-initiated and utilize peer-reviewed statistical analysis plans to address a priori hypotheses/aims. In the first database harvest (Very Large Database of Lipids 1.0) from 2009 to 2011, there were 1 340 614 adult and 10 294 pediatric patients; the adult sample had a median age of 59 years (interquartile range, 49-70 years) with even representation by sex. Lipid distributions closely matched those from the population-representative National Health and Nutrition Examination Survey. The second harvest of the database (Very Large Database of Lipids 2.0) is underway. Overall, the Very Large Database of Lipids database provides an opportunity for collaboration and new knowledge generation through careful examination of granular lipid data on a large scale. © 2013 Wiley Periodicals, Inc.
Standard-Chinese Lexical Neighborhood Test in normal-hearing young children.
Liu, Chang; Liu, Sha; Zhang, Ning; Yang, Yilin; Kong, Ying; Zhang, Luo
2011-06-01
The purposes of the present study were to establish the Standard-Chinese version of the Lexical Neighborhood Test (LNT) and to examine the lexical and age effects on spoken-word recognition in normal-hearing children. Six lists of monosyllabic and six lists of disyllabic words (20 words/list) were selected from the database of daily speech materials for normal-hearing (NH) children of ages 3-5 years. The lists were further divided into "easy" and "hard" halves according to the word frequency and neighborhood density in the database, based on the theory of the Neighborhood Activation Model (NAM). Ninety-six NH children (ages ranging between 4.0 and 7.0 years) were divided into three different age groups at 1-year intervals. Speech-perception tests were conducted using the Standard-Chinese monosyllabic and disyllabic LNT. The inter-list performance was found to be equivalent, and inter-rater reliability was high, with 92.5-95% consistency. Results of word-recognition scores showed that the lexical effects were all significant. Children scored higher with disyllabic words than with monosyllabic words, and "easy" words scored higher than "hard" words. The word-recognition performance also increased with age in each lexical category. A multiple linear regression analysis showed that neighborhood density, age, and word frequency made increasingly large contributions to Chinese word recognition, in that order. The results of the present study indicated that performance in Chinese word recognition was influenced by word frequency, age, and neighborhood density, with word frequency playing a major role. These results were consistent with those in other languages, supporting the application of the NAM to the Chinese language. The development of the Standard-Chinese version of the LNT and the establishment of a database for children aged 4-6 years can provide a reliable means of testing spoken-word recognition in children with hearing impairment. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
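As an illustration of the neighborhood-density variable behind the "easy"/"hard" split, the sketch below counts phonological neighbours (words one phoneme substitution, deletion, or addition away) over a toy phoneme-transcribed lexicon; the lexicon and transcriptions are hypothetical.

```python
# Sketch of computing phonological neighborhood density: the number of lexicon
# words that differ from a target by one phoneme substitution, deletion, or
# addition (the usual NAM definition). The toy lexicon is illustrative only.
def one_phoneme_apart(a, b):
    if len(a) == len(b):                                    # substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:                           # deletion / addition
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))
    return False

def neighborhood_density(target, lexicon):
    return sum(one_phoneme_apart(target, w) for w in lexicon if w != target)

# Example with a toy phoneme-transcribed lexicon:
lexicon = [("k", "a", "t"), ("b", "a", "t"), ("k", "a", "p"), ("k", "a"), ("m", "a", "u")]
print(neighborhood_density(("k", "a", "t"), lexicon))       # -> 3
```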
Vogel, Markus; Kaisers, Wolfgang; Wassmuth, Ralf; Mayatepek, Ertan
2015-11-03
Clinical documentation has undergone a change due to the usage of electronic health records. The core element is to capture clinical findings and document therapy electronically. Health care personnel spend a significant portion of their time on the computer. Alternatives to self-typing, such as speech recognition, are currently believed to increase documentation efficiency and quality, as well as satisfaction of health professionals while accomplishing clinical documentation, but few studies in this area have been published to date. This study describes the effects of using a Web-based medical speech recognition system for clinical documentation in a university hospital on (1) documentation speed, (2) document length, and (3) physician satisfaction. Reports of 28 physicians were randomized to be created with (intervention) or without (control) the assistance of a Web-based system of medical automatic speech recognition (ASR) in the German language. The documentation was entered into a browser's text area and the time to complete the documentation including all necessary corrections, correction effort, number of characters, and mood of participant were stored in a database. The underlying time comprised text entering, text correction, and finalization of the documentation event. Participants self-assessed their moods on a scale of 1-3 (1=good, 2=moderate, 3=bad). Statistical analysis was done using permutation tests. The number of clinical reports eligible for further analysis stood at 1455. Out of 1455 reports, 718 (49.35%) were assisted by ASR and 737 (50.65%) were not assisted by ASR. Average documentation speed without ASR was 173 (SD 101) characters per minute, while it was 217 (SD 120) characters per minute using ASR. The overall increase in documentation speed through Web-based ASR assistance was 26% (P=.04). Participants documented an average of 356 (SD 388) characters per report when not assisted by ASR and 649 (SD 561) characters per report when assisted by ASR. Participants' average mood rating was 1.3 (SD 0.6) using ASR assistance compared to 1.6 (SD 0.7) without ASR assistance (P<.001). We conclude that medical documentation with the assistance of Web-based speech recognition leads to an increase in documentation speed, document length, and participant mood when compared to self-typing. Speech recognition is a meaningful and effective tool for the clinical documentation process.
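A minimal sketch of the kind of two-sample permutation test reported here is shown below, comparing documentation speed between ASR-assisted and self-typed reports; the data arrays and permutation count are placeholders.

```python
# Sketch of a two-sample permutation test on documentation speed (characters per
# minute) for ASR-assisted vs. self-typed reports. The data arrays are placeholders.
import numpy as np

def permutation_test(x, y, n_perm=10000, seed=0):
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm          # two-sided p-value

# speed_asr = np.array([...]); speed_typed = np.array([...])
# p = permutation_test(speed_asr, speed_typed)
```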
1997-08-22
A former Pinellas County, FL public health worker, [name removed], is charged with using a government AIDS surveillance database for his own personal dating scheme. He kept the county health department records on his own laptop computer and used the information to screen potential dates for himself and his friends. [Name removed] filed a pretrial free speech argument contending that his First Amendment rights were being violated. The Pinellas County judge dismissed that argument, clearing the way for a September trial. [Name removed] could face a year in prison on a first-degree misdemeanor charge.
Smith, Colleen
On May 27, 2016, the Food and Drug Administration (FDA) announced that it was adopting a new rule that requires food manufacturers to list—on the already mandated Nutrition Facts label—how many grams of sugar have been added to a food product. Many opponents have criticized this “added sugars” rule on First Amendment grounds, arguing that the rule violates the commercial speech rights of food manufacturers. Whether the rule would survive constitutional scrutiny or not is an open question because the compelled commercial speech doctrine is anything but clear. Courts are split over whether Zauderer’s rational basis test, Central Hudson’s intermediate scrutiny, or some combination of the two should apply to a mandated disclosure like FDA’s added sugars rule. This Paper explains that the added sugars rule is unique in the history of mandated nutrition labeling in that the rule is motivated largely by public health concerns and backed by reports that assert that consumers should limit their intake of added sugars. In contrast, correcting and preventing consumer deception has been a major driving force behind the remainder of FDA’s mandated nutrition labeling. Because of this distinct rationale, the added sugars rule does not fit neatly into any currently existing compelled commercial speech test. This Paper uses the added sugars rule to highlight the deficiencies in the existing tests. Finally, this Paper proposes a new compelled commercial speech test that would adequately balance the interests of all of the affected parties: the government, the public, and food manufacturers.
Neumann, K; Holler-Zittlau, I; van Minnen, S; Sick, U; Zaretsky, Y; Euler, H A
2011-01-01
The German Kindersprachscreening (KiSS) is a universal speech and language screening test for large-scale identification of Hessian kindergarten children requiring special educational language training or clinical speech/language therapy. To calculate the procedural screening validity, 257 children (aged 4.0 to 4.5 years) were tested using KiSS and four language tests (Reynell Development Language Scales III, Patholinguistische Diagnostik, PLAKSS, AWST-R). The majority or consensus judgements of three speech-language professionals, based on the language test results, served as the reference criterion. The base (fail) rates of the professionals were either self-determined or preset based on known prevalence rates. Screening validity was higher for preset than for self-determined base rates, owing to higher inter-judge agreement. The confusion matrices of the overall index classification of the KiSS (speech-language abnormalities with educational or clinical needs) against the fixed-base-rate expert judgement about language impairment, including fluency or voice disorders, yielded a sensitivity of 88% and a specificity of 78%; for language impairment alone, the values were 84% and 75%, respectively. Specificities for disorders requiring clinical diagnostics in the KiSS (language impairment alone or combined with fluency/voice disorders), relative to the test-based consensus expert judgement, were about 93%. Sensitivities were unsatisfactory because the differentiation between educational and clinical needs requires improvement. Since the judgement concordance between the speech-language professionals was only moderate, the development of a comprehensive German reference test for speech and language disorders with evidence-based algorithmic decision rules, rather than subjective clinical judgement, is advocated.
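For clarity about how the validity figures are derived, the sketch below computes sensitivity and specificity from a screening confusion matrix; the cell counts are hypothetical values chosen only to be consistent with the reported 88%/78% and the sample of 257 children, not the study's actual data.

```python
# Sketch of computing screening validity (sensitivity and specificity) of a
# screening decision against a reference expert judgement.
def screening_validity(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # proportion of truly impaired children flagged
    specificity = tn / (tn + fp)   # proportion of unimpaired children passed
    return sensitivity, specificity

# Hypothetical confusion matrix for 257 screened children (illustrative only):
sens, spec = screening_validity(tp=44, fp=44, fn=6, tn=163)
print(round(sens, 2), round(spec, 2))   # -> 0.88 0.79
```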
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.
Stuttering as a trait or state - an ALE meta-analysis of neuroimaging studies.
Belyk, Michel; Kraft, Shelly Jo; Brown, Steven
2015-01-01
Stuttering is a speech disorder characterised by repetitions, prolongations and blocks that disrupt the forward movement of speech. An earlier meta-analysis of brain imaging studies of stuttering (Brown et al., 2005) revealed a general trend towards rightward lateralization of brain activations and hyperactivity in the larynx motor cortex bilaterally. The present study sought not only to update that meta-analysis with recent work but to introduce an important distinction not present in the first study, namely the difference between 'trait' and 'state' stuttering. The analysis of trait stuttering compares people who stutter (PWS) with people who do not stutter when behaviour is controlled for, i.e., when speech is fluent in both groups. In contrast, the analysis of state stuttering examines PWS during episodes of stuttered speech compared with episodes of fluent speech. Seventeen studies were analysed using activation likelihood estimation. Trait stuttering was characterised by the well-known rightward shift in lateralization for language and speech areas. State stuttering revealed a more diverse pattern. Abnormal activation of larynx and lip motor cortex was common to the two analyses. State stuttering was associated with overactivation in the right hemisphere larynx and lip motor cortex. Trait stuttering was associated with overactivation of lip motor cortex in the right hemisphere but underactivation of larynx motor cortex in the left hemisphere. These results support a large literature highlighting laryngeal and lip involvement in the symptomatology of stuttering, and disambiguate two possible sources of activation in neuroimaging studies of persistent developmental stuttering. © 2014 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
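A simplified sketch of the core ALE computation is given below: each study's foci are modelled as Gaussian blobs, combined into a per-study modelled activation map, and united across studies. The kernel width, grid, and omission of kernel normalization and permutation thresholding are simplifications relative to standard ALE software.

```python
# Simplified sketch of activation likelihood estimation (ALE). Each study's
# reported foci become 3-D Gaussian blobs, combined into a modelled activation
# (MA) map per study; MA maps are then united across studies.
import numpy as np

def gaussian_blob(grid, focus, fwhm_mm=10.0):
    sigma = fwhm_mm / 2.3548                       # FWHM -> standard deviation
    d2 = ((grid - focus) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ale_map(grid, studies):
    """grid: (..., 3) voxel coordinates in mm; studies: list of (n_foci, 3) arrays."""
    none_active = np.ones(grid.shape[:-1])
    for foci in studies:
        p_not_here = np.ones(grid.shape[:-1])
        for focus in foci:
            p_not_here *= 1.0 - gaussian_blob(grid, focus)   # MA = 1 - p_not_here
        none_active *= p_not_here
    return 1.0 - none_active                       # ALE = 1 - prod over studies of (1 - MA)
```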
Damage to the Left Precentral Gyrus Is Associated With Apraxia of Speech in Acute Stroke.
Itabashi, Ryo; Nishio, Yoshiyuki; Kataoka, Yuka; Yazawa, Yukako; Furui, Eisuke; Matsuda, Minoru; Mori, Etsuro
2016-01-01
Apraxia of speech (AOS) is a motor speech disorder, which is clinically characterized by the combination of phonemic segmental changes and articulatory distortions. AOS has been believed to arise from impairment in motor speech planning/programming and differentiated from both aphasia and dysarthria. The brain regions associated with AOS are still a matter of debate. The aim of this study was to address this issue in a large number of consecutive acute ischemic stroke patients. We retrospectively studied 136 patients with isolated nonlacunar infarcts in the left middle cerebral artery territory (70.5±12.9 years old, 79 males). In accordance with speech and language assessments, the patients were classified into the following groups: pure form of AOS (pure AOS), AOS with aphasia (AOS-aphasia), and without AOS (non-AOS). Voxel-based lesion-symptom mapping analysis was performed on T2-weighted images or fluid-attenuated inversion recovery images. Using the Liebermeister method, group-wise comparisons were made between the all AOS (pure AOS plus AOS-aphasia) and non-AOS, pure AOS and non-AOS, AOS-aphasia and non-AOS, and pure AOS and AOS-aphasia groups. Of the 136 patients, 22 patients were diagnosed with AOS (7 patients with pure AOS and 15 patients with AOS-aphasia). The voxel-based lesion-symptom mapping analysis demonstrated that the brain regions associated with AOS were centered on the left precentral gyrus. Damage to the left precentral gyrus is associated with AOS in acute to subacute stroke patients, suggesting a role of this brain region in motor speech production. © 2015 American Heart Association, Inc.
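The sketch below illustrates voxel-wise lesion-symptom mapping in its simplest form, testing lesion status against the AOS diagnosis at each voxel; Fisher's exact test is used as a stand-in for the Liebermeister measure reported in the study, and the input arrays are placeholders.

```python
# Illustrative sketch of voxel-based lesion-symptom mapping: at each voxel, test
# whether lesion status is associated with an AOS diagnosis. Fisher's exact test
# stands in for the Liebermeister method; the arrays are placeholders.
import numpy as np
from scipy.stats import fisher_exact

def vlsm_pvalues(lesion_maps, has_aos):
    """lesion_maps: (n_patients, n_voxels) binary; has_aos: (n_patients,) binary."""
    lesion_maps = np.asarray(lesion_maps, dtype=bool)
    has_aos = np.asarray(has_aos, dtype=bool)
    p_values = np.ones(lesion_maps.shape[1])
    for v in range(lesion_maps.shape[1]):
        lesioned = lesion_maps[:, v]
        table = [[np.sum(lesioned & has_aos), np.sum(lesioned & ~has_aos)],
                 [np.sum(~lesioned & has_aos), np.sum(~lesioned & ~has_aos)]]
        p_values[v] = fisher_exact(table)[1]
    return p_values   # still requires correction for multiple comparisons across voxels
```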