Toward a dual-learning systems model of speech category learning
Chandrasekaran, Bharath; Koslov, Seth R.; Maddox, W. T.
2014-01-01
More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article, we describe a neurobiologically constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, unidimensional rules to more complex, reflexive, multi-dimensional rules. In a second application, we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions. PMID:25132827
Yildiz, Izzet B.; von Kriegstein, Katharina; Kiebel, Stefan J.
2013-01-01
Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents—an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments. PMID:24068902
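The recognition scheme described above can be pictured with a much simpler stand-in: store one dynamical model per word and recognize a sequence by which model predicts it best. The sketch below is a linear, non-Bayesian toy under that assumption, not the paper's hierarchy of nonlinear dynamical systems; all names and parameters are invented.

```python
# Toy "recognition by prediction": one linear dynamical model per word;
# the model with the lowest accumulated prediction error wins.
import numpy as np

rng = np.random.default_rng(0)

def random_stable_dynamics(dim, spectral_radius=0.99):
    """Random linear dynamics x[t+1] = A @ x[t], scaled toward stability."""
    A = rng.normal(size=(dim, dim))
    return A * (spectral_radius / max(abs(np.linalg.eigvals(A))))

dim = 4
word_models = {"word_a": random_stable_dynamics(dim),
               "word_b": random_stable_dynamics(dim)}

# Synthesize a noisy observation sequence from one of the stored models.
true_word = "word_b"
x = rng.normal(size=dim)
sequence = [x]
for _ in range(30):
    x = word_models[true_word] @ x
    sequence.append(x + 0.05 * rng.normal(size=dim))

# Recognition: accumulate one-step prediction error under each model.
errors = {}
for word, A in word_models.items():
    errors[word] = sum(np.sum((sequence[t + 1] - A @ sequence[t]) ** 2)
                       for t in range(len(sequence) - 1))
print("recognized:", min(errors, key=errors.get), "| true:", true_word)
```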
How may the basal ganglia contribute to auditory categorization and speech perception?
Lim, Sung-Joo; Fiez, Julie A.; Holt, Lori L.
2014-01-01
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood. PMID:25136291
Dual-learning systems during speech category learning
Chandrasekaran, Bharath; Yi, Han-Gyol; Maddox, W. Todd
2013-01-01
Dual-systems models of visual category learning posit the existence of an explicit, hypothesis-testing ‘reflective’ system, as well as an implicit, procedural-based ‘reflexive’ system. The reflective and reflexive learning systems are competitive and neurally dissociable. Relatively little is known about the role of these domain-general learning systems in speech category learning. Given the multidimensional, redundant, and variable nature of acoustic cues in speech categories, our working hypothesis is that speech categories are learned reflexively. To this end, we examined the relative contribution of these learning systems to speech learning in adults. Native English speakers learned to categorize Mandarin tone categories over 480 trials. The training protocol involved trial-by-trial feedback and multiple talkers. Experiments 1 and 2 examined the effect of manipulating the timing (immediate vs. delayed) and information content (full vs. minimal) of feedback. Dual-systems models of visual category learning predict that delayed, informationally rich feedback enhances reflective learning, while immediate, minimally informative feedback enhances reflexive learning. Across the two experiments, our results show that feedback manipulations targeting reflexive learning enhanced category learning success. In Experiment 3, we examined the role of trial-to-trial talker information (mixed vs. blocked presentation) on speech category learning success. We hypothesized that the mixed condition would enhance reflexive learning by preventing an association between talker-related acoustic cues and speech categories. Our results show that the mixed talker condition led to relatively greater accuracies. Our experiments demonstrate that speech categories are optimally learned by training methods that target the reflexive learning system. PMID:24002965
Effect of Reflective Practice on Student Recall of Acoustics for Speech Science
ERIC Educational Resources Information Center
Walden, Patrick R.; Bell-Berti, Fredericka
2013-01-01
Researchers have developed models of learning through experience; however, these models are rarely named as a conceptual frame for educational research in the sciences. This study examined the effect of reflective learning responses on student recall of speech acoustics concepts. Two groups of undergraduate students enrolled in a speech science…
[Modeling developmental aspects of sensorimotor control of speech production].
Kröger, B J; Birkholz, P; Neuschaefer-Rube, C
2007-05-01
Detailed knowledge of the neurophysiology of speech acquisition is important for understanding the developmental aspects of speech perception and production and for understanding developmental disorders of speech perception and production. A computer-implemented neural model of sensorimotor control of speech production was developed. The model is capable of demonstrating the neural functions of different cortical areas during speech production in detail. (i) Two sensory and two motor maps or neural representations, and the appertaining neural mappings or projections, establish the sensorimotor feedback control system. These maps and mappings are already formed and trained during the prelinguistic phase of speech acquisition. (ii) The feedforward sensorimotor control system comprises the lexical map (representations of sounds, syllables, and words of the first language) and the mappings from lexical to sensory and to motor maps. The training of the appertaining mappings forms the linguistic phase of speech acquisition. (iii) Three prelinguistic learning phases (i.e., silent mouthing, quasi-stationary vocalic articulation, and realisation of articulatory protogestures) can be defined on the basis of our simulation studies using the computational neural model. These learning phases can be associated with temporal phases of prelinguistic speech acquisition obtained from natural data. The neural model illuminates the detailed function of specific cortical areas during speech production. In particular, it can be shown that developmental disorders of speech production may result from a delayed or incorrect process within one of the prelinguistic learning phases defined by the neural model.
Modeling the Development of Audiovisual Cue Integration in Speech Perception
Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.
2017-01-01
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
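As a concrete picture of the distributional learning step, the following sketch fits a two-component Gaussian mixture to unlabeled tokens in a two-cue (auditory, visual) space, using scikit-learn's GaussianMixture. The cue names and distributions are invented for illustration and are not the paper's simulations.

```python
# Sketch: unsupervised distributional learning of two phonological
# categories from paired auditory and visual cues. Data are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Two categories, each a cloud in (auditory cue, visual cue) space,
# e.g. a VOT-like dimension paired with a lip-aperture-like dimension.
cat1 = rng.normal(loc=[10.0, 0.2], scale=[8.0, 0.05], size=(500, 2))
cat2 = rng.normal(loc=[60.0, 0.6], scale=[10.0, 0.08], size=(500, 2))
tokens = np.vstack([cat1, cat2])  # unlabeled input, as in infancy

gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(tokens)

# The fitted component means recover the category structure, and the
# covariances act as learned cue weights: a dimension with smaller
# within-category variance is a more reliable cue.
print("learned category means:\n", gmm.means_)
print("posterior for an ambiguous token:",
      gmm.predict_proba([[35.0, 0.4]]).round(2))
```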
The Role of Corticostriatal Systems in Speech Category Learning
Yi, Han-Gyol; Maddox, W. Todd; Mumford, Jeanette A.; Chandrasekaran, Bharath
2016-01-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. PMID:25331600
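The "computational modeling of response strategies" referenced here and in several other entries typically compares decision-bound models. The sketch below is a schematic version under simplified assumptions (probit choice, one unidimensional rule model vs. one linear multidimensional model, BIC comparison); it is not the authors' model code, and the stimuli are synthetic.

```python
# Schematic decision-bound modeling: are a listener's responses better
# explained by a unidimensional rule (reflective) or a multidimensional
# linear boundary (reflexive)? Compare fitted models by BIC.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Synthetic stimuli on two dimensions (think pitch height and slope),
# and a simulated listener who uses a diagonal, multidimensional rule.
X = rng.uniform(0.0, 1.0, size=(480, 2))
p_a = norm.cdf((X[:, 0] + X[:, 1] - 1.0) / 0.15)
resp = (rng.uniform(size=480) < p_a).astype(float)

def nll_unidimensional(params):
    # Rule: respond "A" if dimension 0 exceeds criterion c (noise sigma).
    c, log_sigma = params
    p = norm.cdf((X[:, 0] - c) / np.exp(log_sigma)).clip(1e-9, 1 - 1e-9)
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

def nll_multidimensional(params):
    # Rule: respond "A" if w1*x1 + w2*x2 + b exceeds 0 (unit noise).
    w1, w2, b = params
    p = norm.cdf(w1 * X[:, 0] + w2 * X[:, 1] + b).clip(1e-9, 1 - 1e-9)
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

fits = {"reflective (1D rule)": minimize(nll_unidimensional, [0.5, -1.0]),
        "reflexive (2D rule)": minimize(nll_multidimensional, [1.0, 1.0, -1.0])}
n = len(resp)
for name, fit in fits.items():
    bic = 2 * fit.fun + len(fit.x) * np.log(n)   # lower BIC wins
    print(f"{name}: BIC = {bic:.1f}")
```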
Analyzing Distributional Learning of Phonemic Categories in Unsupervised Deep Neural Networks
Räsänen, Okko; Nagamine, Tasha; Mesgarani, Nima
2017-01-01
Infants’ speech perception adapts to the phonemic categories of their native language, a process assumed to be driven by the distributional properties of speech. This study investigates whether deep neural networks (DNNs), the current state-of-the-art in distributional feature learning, are capable of learning phoneme-like representations of speech in an unsupervised manner. We trained DNNs with unlabeled and labeled speech and analyzed the activations of each layer with respect to the phones in the input segments. The analyses reveal that the emergence of phonemic invariance in DNNs is dependent on the availability of phonemic labeling of the input during the training. No increased phonemic selectivity of the hidden layers was observed in the purely unsupervised networks despite successful learning of low-dimensional representations for speech. This suggests that additional learning constraints or more sophisticated models are needed to account for the emergence of phone-like categories in distributional learning operating on natural speech. PMID:29359204
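The layer-wise analysis has a simple logic: measure how well phone identity can be read out of a layer's activations. Below is a hedged sketch of that probe, run on synthetic activations rather than trained-DNN activations, so the "layers" are stand-ins constructed to behave like the supervised and unsupervised cases.

```python
# Sketch of the layer-wise analysis logic: how phoneme-selective are a
# layer's activations? Probe them with a linear classifier; higher
# probe accuracy indicates greater phonemic selectivity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_frames, n_units, n_phones = 2000, 64, 10
phone_labels = rng.integers(0, n_phones, size=n_frames)

# Stand-ins for two layers: one whose activations cluster by phone
# (supervised-like) and one that is phone-agnostic (unsupervised-like).
phone_centroids = rng.normal(size=(n_phones, n_units))
selective_layer = phone_centroids[phone_labels] + rng.normal(size=(n_frames, n_units))
agnostic_layer = rng.normal(size=(n_frames, n_units))

for name, acts in [("selective", selective_layer), ("agnostic", agnostic_layer)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), acts,
                          phone_labels, cv=5).mean()
    print(f"{name} layer: probe accuracy = {acc:.2f} (chance = {1 / n_phones:.2f})")
```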
Coutinho, Eduardo; Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a Machine Learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the Transfer Learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation-transfer based on Denoising Auto Encoders) for reducing the gap in the feature space distributions. Our results demonstrate an excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
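The intra- versus cross-domain protocol can be sketched in a few lines: fit an emotion regressor on one modality's features and evaluate it on the other. The version below uses synthetic features, a ridge regressor as a stand-in, and an invented domain shift; it illustrates the evaluation logic, not the paper's system.

```python
# Minimal version of the intra- vs cross-domain protocol: train an
# arousal regressor on one modality's features, test on the other.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)

def make_domain(n, shift):
    """Synthetic acoustic features plus an emotion target; 'shift' mimics a domain gap."""
    features = rng.normal(size=(n, 20)) + shift
    arousal = features[:, :5].sum(axis=1) + 0.3 * rng.normal(size=n)
    return features, arousal

music_X, music_y = make_domain(400, shift=0.0)
speech_X, speech_y = make_domain(400, shift=0.5)

model = Ridge().fit(music_X, music_y)          # train on music only
print("intra-domain R^2:", r2_score(music_y, model.predict(music_X)))
print("cross-domain R^2:", r2_score(speech_y, model.predict(speech_X)))
# A feature-representation-transfer step (e.g., a denoising autoencoder
# mapping both domains into a shared space) would aim to close this gap.
```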
Evolution of a Rapidly Learned Representation for Speech.
ERIC Educational Resources Information Center
Nakisa, Ramin Charles; Plunkett, Kim
1998-01-01
Describes a connectionist model accounting for newborn infants' ability to finely discriminate almost all human speech contrasts and the fact that their phonemic category boundaries are identical, even for phonemes outside their target language. The model posits an innately guided learning in which an artificial neural network is stored in a…
Learning Models and Real-Time Speech Recognition.
ERIC Educational Resources Information Center
Danforth, Douglas G.; And Others
This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…
Houston, Derek M.; Bergeson, Tonya R.
2013-01-01
The advent of cochlear implantation has provided thousands of deaf infants and children access to speech and the opportunity to learn spoken language. Whether or not deaf infants successfully learn spoken language after implantation may depend in part on the extent to which they listen to speech rather than just hear it. We explore this question by examining the role that attention to speech plays in early language development according to a prominent model of infant speech perception – Jusczyk’s WRAPSA model – and by reviewing the kinds of speech input that maintains normal-hearing infants’ attention. We then review recent findings suggesting that cochlear-implanted infants’ attention to speech is reduced compared to normal-hearing infants and that speech input to these infants differs from input to infants with normal hearing. Finally, we discuss possible roles attention to speech may play on deaf children’s language acquisition after cochlear implantation in light of these findings and predictions from Jusczyk’s WRAPSA model. PMID:24729634
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
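As a toy rendering of the continuity idea (and only that; the published Malcom algorithm is a maximum-likelihood method, not this gradient loop), one can place each acoustic codebook label at a point in a low-dimensional "pseudo-articulator" space and choose the points that make observed label sequences trace smooth paths, using acoustic label sequences alone:

```python
# Toy continuity mapping: learn a 2-D position per acoustic codebook
# label so that label sequences trace maximally smooth paths. Only the
# label sequence is used; everything here is invented for illustration.
import numpy as np

rng = np.random.default_rng(5)
n_labels, seq_len = 8, 400
# Synthetic training data: a slowly wandering label sequence.
labels = np.cumsum(rng.integers(-1, 2, size=seq_len)) % n_labels

positions = rng.normal(size=(n_labels, 2))
for _ in range(500):
    grad = np.zeros_like(positions)
    path = positions[labels]
    accel = path[2:] - 2 * path[1:-1] + path[:-2]   # smoothness penalty
    np.add.at(grad, labels[2:], 2 * accel)
    np.add.at(grad, labels[1:-1], -4 * accel)
    np.add.at(grad, labels[:-2], 2 * accel)
    positions -= 0.01 * grad
    positions -= positions.mean(axis=0)             # remove translation
    positions /= positions.std() + 1e-9             # fix scale (avoid collapse)

# Labels adjacent in time should end up adjacent in the learned space.
print(positions.round(2))
```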
Enhanced procedural learning of speech sound categories in a genetic variant of FOXP2.
Chandrasekaran, Bharath; Yi, Han-Gyol; Blanco, Nathaniel J; McGeary, John E; Maddox, W Todd
2015-05-20
A mutation of the forkhead box protein P2 (FOXP2) gene is associated with severe deficits in human speech and language acquisition. In rodents, the humanized form of FOXP2 promotes faster switching from declarative to procedural learning strategies when the two learning systems compete. Here, we examined a polymorphism of FOXP2 (rs6980093) in humans (214 adults; 111 females) for associations with non-native speech category learning success. Neurocomputational modeling results showed that individuals with the GG genotype shifted faster to procedural learning strategies, which are optimal for the task. These findings support an adaptive role for the FOXP2 gene in modulating the function of neural learning systems that have a direct bearing on human speech category learning.
Visual Feedback of Tongue Movement for Novel Speech Sound Learning
Katz, William F.; Mehta, Sonya
2015-01-01
Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Expressive facial animation synthesis by learning speech coarticulation and expression spaces.
Deng, Zhigang; Neumann, Ulrich; Lewis, J P; Kim, Tae-Yong; Bulut, Murtaza; Narayanan, Shrikanth
2006-01-01
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A Phoneme-Independent Expression Eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and Principal Component Analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
Evaluating deep learning architectures for Speech Emotion Recognition.
Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence
2017-08-01
Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances.
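A schematic of the frame-based formulation: classify every frame, then aggregate frame posteriors into a single utterance-level decision. In this sketch an sklearn MLP and synthetic "emotion band" features stand in for the deep architectures and spectral features actually compared in the paper.

```python
# Frame-based SER sketch: per-frame classification, then utterance-level
# aggregation of frame posteriors. All data are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
n_emotions, frames_per_utt, n_feats = 4, 100, 40

def synth_utterance(emotion):
    base = np.zeros(n_feats)
    base[emotion * 10:(emotion + 1) * 10] = 1.0   # emotion-specific band
    return base + rng.normal(scale=1.0, size=(frames_per_utt, n_feats))

train_utts = [(e, synth_utterance(e)) for e in range(n_emotions) for _ in range(20)]
X = np.vstack([frames for _, frames in train_utts])
y = np.repeat([e for e, _ in train_utts], frames_per_utt)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)

test = synth_utterance(emotion=2)
utterance_posterior = clf.predict_proba(test).mean(axis=0)  # aggregate frames
print("predicted emotion:", utterance_posterior.argmax())   # expect 2
```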
Effect of explicit dimension instruction on speech category learning
Chandrasekaran, Bharath; Yi, Han-Gyol; Smayda, Kirsten E.; Maddox, W. Todd
2015-01-01
Learning non-native speech categories is often considered a challenging task in adulthood. This difficulty is driven by cross-language differences in weighting critical auditory dimensions that differentiate speech categories. For example, previous studies have shown that differentiating Mandarin tonal categories requires attending to dimensions related to pitch height and direction. Relative to native speakers of Mandarin, the pitch direction dimension is under-weighted by native English speakers. In the current study, we examined the effect of explicit instructions (dimension instruction) on native English speakers' Mandarin tone category learning within the framework of a dual-learning systems (DLS) model. This model predicts that successful speech category learning is initially mediated by an explicit, reflective learning system that frequently utilizes unidimensional rules, with an eventual switch to a more implicit, reflexive learning system that utilizes multidimensional rules. Participants were explicitly instructed to focus on and/or ignore the pitch height dimension, the pitch direction dimension, or were given no explicit prime. Our results show that instructions directing attention toward pitch direction, and instructions diverting attention away from pitch height, resulted in enhanced tone categorization. Computational modeling of participant responses suggested that instruction related to pitch direction led to faster and more frequent use of multidimensional reflexive strategies, and enhanced perceptual selectivity along the previously under-weighted pitch direction dimension. PMID:26542400
Teaching autistic children conversational speech using video modeling.
Charlop, M H; Milstein, J P
1989-01-01
We assessed the effects of video modeling on acquisition and generalization of conversational skills among autistic children. Three autistic boys observed videotaped conversations consisting of two people discussing specific toys. When the criterion for learning was met, generalization of conversational skills was assessed with untrained topics of conversation; new stimuli (toys); unfamiliar persons, siblings, and autistic peers; and other settings. The results indicated that the children learned through video modeling, generalized their conversational skills, and maintained conversational speech over a 15-month period. Video modeling shows much promise as a rapid and effective procedure for teaching complex verbal skills such as conversational speech. PMID:2793634
Text-to-audiovisual speech synthesizer for children with learning disabilities.
Mendi, Engin; Bayrak, Coskun
2013-01-01
Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can greatly increase the functional capabilities of children whose learning disorders affect writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children with learning deficits improve their chances of success.
Long short-term memory for speaker generalization in supervised speech separation
Chen, Jitong; Wang, DeLiang
2017-01-01
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
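The mask-estimation setup described here can be sketched compactly in PyTorch: an LSTM maps noisy spectrogram frames to a time-frequency mask trained toward an ideal ratio mask (IRM). The shapes, random "spectrograms", and two-layer architecture below are placeholders, not the paper's configuration.

```python
# Minimal PyTorch sketch of mask-based separation with an LSTM.
import torch
import torch.nn as nn

n_freq, seq_len, batch = 64, 100, 8

class MaskEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, 128, num_layers=2, batch_first=True)
        self.out = nn.Linear(128, n_freq)

    def forward(self, noisy_mag):               # (batch, time, freq)
        h, _ = self.lstm(noisy_mag)
        return torch.sigmoid(self.out(h))       # mask values in [0, 1]

model = MaskEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    speech = torch.rand(batch, seq_len, n_freq)   # placeholder magnitudes
    noise = torch.rand(batch, seq_len, n_freq)
    irm = speech / (speech + noise + 1e-8)        # ideal ratio mask target
    mask = model(speech + noise)
    loss = nn.functional.mse_loss(mask, irm)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference: enhanced = mask * noisy magnitude, recombined with noisy phase.
```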
Co-occurrence statistics as a language-dependent cue for speech segmentation.
Saksida, Amanda; Langus, Alan; Nespor, Marina
2017-05-01
To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even without rich language-specific knowledge. However, it is not clear how strong and reliable the distributional cues are that humans might use to segment speech. We investigate cross-linguistic viability of different statistical learning strategies by analyzing child-directed speech corpora from nine languages and by modeling possible statistics-based speech segmentations. We show that languages vary as to which statistical segmentation strategies are most successful. The variability of the results can be partially explained by systematic differences between languages, such as rhythmical differences. The results confirm previous findings that different statistical learning strategies are successful in different languages and suggest that infants may have to primarily rely on non-statistical cues when they begin their process of speech segmentation.
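A minimal worked example of the statistical strategy being modeled: compute forward transitional probabilities TP(x→y) = count(xy) / count(x) over a syllable stream and posit word boundaries where TP is low. The toy lexicon below is invented; real child-directed corpora behave far less cleanly, which is the paper's point.

```python
# Transitional-probability segmentation of a toy syllable stream.
from collections import Counter

# Toy lexicon: badu, kipo, tula. Build a continuous syllable stream.
stream = ("badu kipo badu tula kipo tula badu kipo " * 20).split()
syllables = [w[i:i + 2] for w in stream for i in range(0, len(w), 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
syll_counts = Counter(syllables[:-1])
tp = {pair: n / syll_counts[pair[0]] for pair, n in pair_counts.items()}

# Boundary rule: cut where TP is low. In this toy stream, within-word
# TPs are exactly 1.0, so any dip marks a word boundary.
segmented, word = [], syllables[0]
for prev, cur in zip(syllables, syllables[1:]):
    if tp[(prev, cur)] < 0.9:
        segmented.append(word)
        word = ""
    word += cur
segmented.append(word)
print(segmented[:8])   # ['badu', 'kipo', 'badu', 'tula', ...]
```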
ERIC Educational Resources Information Center
Prata, David Nadler; Baker, Ryan S. J. d.; Costa, Evandro d. B.; Rose, Carolyn P.; Cui, Yue; de Carvalho, Adriana M. J. B.
2009-01-01
This paper presents a model which can automatically detect a variety of student speech acts as students collaborate within a computer supported collaborative learning environment. In addition, an analysis is presented which gives substantial insight as to how students' learning is associated with students' speech acts, knowledge that will…
Speech Perception in the Classroom.
ERIC Educational Resources Information Center
Smaldino, Joseph J.; Crandell, Carl C.
1999-01-01
This article discusses how poor room acoustics can make speech inaudible and presents a speech-perception model demonstrating the linkage between the adequacy of classroom acoustics and the development of speech and language systems. It argues that both aspects must be considered when evaluating barriers to listening and learning in a classroom.…
Horn, Nynne Thorup; Sørensen, Stine Derdau; McGregor, William B.; Wallentin, Mikkel
2015-01-01
Models of speech learning suggest that adaptations to foreign language sound categories take place within 6 to 12 months of exposure to a foreign language. Results from laboratory language training show effects of very targeted training on nonnative speech contrasts within only 1 to 4 weeks of training. Results from immersion studies are inconclusive, but some suggest continued effects on nonnative speech perception after 6 to 8 years of experience. We investigated this apparent discrepancy in the timing of adaptation to foreign speech sounds in a longitudinal study of foreign language learning. We examined two groups of Danish language officer cadets learning either Arabic (Modern Standard Arabic and Egyptian Arabic) or Dari (Afghan Farsi) through intensive multifaceted language training. We conducted two experiments (identification and discrimination) with the cadets who were tested four times: at the start (T0), after 3 weeks (T1), 6 months (T2), and 19 months (T3). We used a phonemic Arabic contrast (pharyngeal vs. glottal frication) and a phonemic Dari contrast (sibilant voicing) as stimuli. We observed an effect of learning on the Dari learners’ identification of the Dari stimuli already after 3 weeks of language training, which was sustained, but not improved, after 6 and 19 months. The changes in the Dari learners’ identification functions were positively correlated with their grades after 6 months. We observed no other learning effects at the group level. We discuss the results in the light of predictions from speech learning models. PMID:27551355
Simultaneous acquisition of multiple auditory-motor transformations in speech
Rochet-Capellan, Amelie; Ostry, David J.
2011-01-01
The brain easily generates the movement that is needed in a given situation. Yet surprisingly, the results of experimental studies suggest that it is difficult to acquire more than one skill at a time. To do so, it has generally been necessary to link the required movement to arbitrary cues. In the present study, we show that speech motor learning provides an informative model for the acquisition of multiple sensorimotor skills. During training, subjects are required to repeat aloud individual words in random order while auditory feedback is altered in real-time in different ways for the different words. We find that subjects can quite readily and simultaneously modify their speech movements to correct for these different auditory transformations. This multiple learning occurs effortlessly without explicit cues and without any apparent awareness of the perturbation. The ability to simultaneously learn several different auditory-motor transformations is consistent with the idea that in speech motor learning, the brain acquires instance specific memories. The results support the hypothesis that speech motor learning is fundamentally local. PMID:21325534
Serpanos, Yula C; Senzer, Deborah
2015-05-01
This study presents a piloted training model of experiential instruction in outer and middle ear (OE-ME) screening for graduate speech-language pathology students, with peer teaching by doctor of audiology (AuD) students. Six individual experiential training sessions in screening otoscopy and tympanometry were conducted for 36 graduate-level speech-language pathology students, led by a supervised AuD student. After the experiential training, survey outcomes from 24 speech-language pathology students revealed a significant improvement (p = .01) in perceptions of attaining adequate knowledge and comfort in performing screening otoscopy (handheld and video otoscopy) and tympanometry. In a group of matched controls who did not receive experiential training in OE-ME screening (n = 24), ratings on the same learning outcomes survey in otoscopy and tympanometry were significantly poorer (p = .01) compared with students who did receive experiential training. A training model of experiential instruction for speech-language pathology students by AuD students improved learning outcomes, illustrating its promise for improving clinical practice. The instructional model also meets the Council on Academic Accreditation in Audiology and Speech-Language Pathology (CAA; American Speech-Language-Hearing Association, 2008) and American Speech-Language-Hearing Association (2014) Certificate of Clinical Competence (ASHA CCC) standards for speech-language pathology in OE-ME screening, and CAA (2008) and ASHA (2012) CCC standards in the supervisory process for audiology.
Learning diagnostic models using speech and language measures.
Peintner, Bart; Jarrold, William; Vergyri, Dimitra; Richey, Colleen; Gorno-Tempini, Maria Luisa; Ogar, Jennifer
2008-01-01
We describe results that show the effectiveness of machine learning in the automatic diagnosis of certain neurodegenerative diseases, several of which alter speech and language production. We analyzed audio from 9 control subjects and 30 patients diagnosed with one of three subtypes of Frontotemporal Lobar Degeneration. From this data, we extracted features of the audio signal and the words the patient used, which were obtained using our automated transcription technologies. We then automatically learned models that predict the diagnosis of the patient using these features. Our results show that learned models over these features predict diagnosis with accuracy significantly better than random. Future studies using higher quality recordings will likely improve these results.
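The modeling step generalizes to a few lines of scikit-learn: features extracted from audio and transcripts go into a cross-validated classifier. The feature columns and values below are illustrative stand-ins; only the group sizes (9 controls, 30 patients) come from the abstract.

```python
# Sketch: learn a diagnostic classifier over speech/language features
# and check it beats chance with cross-validation. Features are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_subjects = 39                       # 9 controls + 30 patients, as above
# Columns might be speech rate, mean pause length, type-token ratio, ...
features = rng.normal(size=(n_subjects, 6))
diagnosis = np.array([0] * 9 + [1] * 30)
features[diagnosis == 1, 1] += 1.0    # synthetic effect: longer pauses

scores = cross_val_score(RandomForestClassifier(random_state=0),
                         features, diagnosis, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```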
NASA Astrophysics Data System (ADS)
Deep learning
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
2015-05-01
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
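Backpropagation itself fits in a short numpy program. The sketch below trains a two-layer network on XOR, making the forward representation-building pass and the backward parameter-update pass concrete; it is a textbook toy, not any system from the review.

```python
# Two-layer network trained by backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(8)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

for step in range(10000):
    # Forward pass: each layer computes its representation from the previous one.
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # Backward pass: the chain rule routes the error signal down the layers.
    d_out = (out - y) / len(X)              # grad of cross-entropy w.r.t. logits
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # grad through the tanh layer
    W2 -= h.T @ d_out
    b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(axis=0)

print(out.round(2).ravel())                 # approaches [0, 1, 1, 0]
```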
Zhang, Yong; Li, Peng; Jin, Yingyezhe; Choe, Yoonsuck
2015-11-01
This paper presents a bioinspired digital liquid-state machine (LSM) for low-power very-large-scale-integration (VLSI)-based machine learning applications. To the best of the authors' knowledge, this is the first work that employs a bioinspired spike-based learning algorithm for the LSM. With the proposed online learning, the LSM extracts information from input patterns on the fly without needing intermediate data storage as required in offline learning methods such as ridge regression. The proposed learning rule is local such that each synaptic weight update is based only upon the firing activities of the corresponding presynaptic and postsynaptic neurons without incurring global communications across the neural network. Compared with the backpropagation-based learning, the locality of computation in the proposed approach lends itself to efficient parallel VLSI implementation. We use subsets of the TI46 speech corpus to benchmark the bioinspired digital LSM. To reduce the complexity of the spiking neural network model without performance degradation for speech recognition, we study the impacts of synaptic models on the fading memory of the reservoir and hence the network performance. Moreover, we examine the tradeoffs between synaptic weight resolution, reservoir size, and recognition performance and present techniques to further reduce the overhead of hardware implementation. Our simulation results show that in terms of isolated word recognition evaluated using the TI46 speech corpus, the proposed digital LSM rivals the state-of-the-art hidden Markov-model-based recognizer Sphinx-4 and outperforms all other reported recognizers including the ones that are based upon the LSM or neural networks.
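A rate-based simplification of the liquid-state idea can convey the architecture: a fixed random recurrent "liquid" expands the input into a high-dimensional state, and only a linear readout learns, here with an online local delta rule standing in for the paper's spike-based rule. Everything below (sizes, toy "words", learning rate) is invented; the actual work is spiking and VLSI-oriented.

```python
# Rate-based toy of a liquid-state machine: fixed random reservoir,
# online locally-trained linear readout for sequence classification.
import numpy as np

rng = np.random.default_rng(9)
n_in, n_res, n_classes = 10, 200, 3

W_in = rng.normal(scale=0.5, size=(n_res, n_in))
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # stable dynamics
W_out = np.zeros((n_classes, n_res))                # the only learned weights

def liquid_state(inputs):
    x = np.zeros(n_res)
    for u in inputs:                                # drive the reservoir
        x = np.tanh(W_res @ x + W_in @ u)
    return x                                        # final state as summary

def make_sample(cls):
    """Toy 'spoken word': a class-specific pattern sequence plus noise."""
    template = np.eye(n_in)[(np.arange(20) * (cls + 1)) % n_in]
    return template + 0.3 * rng.normal(size=(20, n_in)), cls

for epoch in range(30):
    for _ in range(30):
        seq, cls = make_sample(rng.integers(n_classes))
        x = liquid_state(seq)
        scores = W_out @ x
        target = np.eye(n_classes)[cls]
        # Local delta rule: each weight update uses only its own pre/post values.
        W_out += 0.01 * np.outer(target - scores, x)

test, cls = make_sample(1)
print("predicted:", (W_out @ liquid_state(test)).argmax(), "| true:", cls)
```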
Towards a psychology of literacy: on the relations between speech and writing.
Olson, D R
1996-07-01
A variety of graphic systems have been developed for preserving and communicating information, among them pictures, charts, graphs, flags, tartans and hallmarks. Writing systems, which constitute a species of these graphic systems, are distinctive in that they bear a direct relation to speech; in this paper it is argued that writing serves as a model for various properties of speech, including sentences and words, and, for alphabets, phonemes. On this view, the history of writing and the acquisition of literacy are less a matter of learning how to transcribe speech than of learning to hear and think about one's own language in a new way. A number of lines of evidence are advanced to support the "model" view and the conclusion that literacy contributes to conceptual structure rather than merely reporting it.
Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.
Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y; Saltzman, Elliot; Goldstein, Louis
2010-09-13
Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed "speech-inversion." This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
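The comparison has a simple skeleton: regress tract variables on acoustic features with each learner and compare held-out error. The sketch below does this for two of the named families (SVR and a feed-forward ANN) on a synthetic nonlinear mapping, not the Haskins model data.

```python
# Speech-inversion sketch: map acoustic features to a tract variable
# with two regressor families and compare held-out error.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(10)
acoustics = rng.uniform(-1, 1, size=(2000, 12))        # e.g., cepstra
# One tract variable (e.g., lip aperture) as a nonlinear acoustic function:
tract_var = np.sin(acoustics[:, 0] * 3) + acoustics[:, 1] ** 2

X_tr, X_te, y_tr, y_te = train_test_split(acoustics, tract_var, random_state=0)
for name, model in [("SVR", SVR()),
                    ("FF-ANN", MLPRegressor(hidden_layer_sizes=(64, 64),
                                            max_iter=1000, random_state=0))]:
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: test MSE = {mse:.4f}")
```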
Humanistic Speech Education to Create Leadership Models.
ERIC Educational Resources Information Center
Oka, Beverley Jeanne
A theoretical framework based primarily on the humanistic psychology of Abraham Maslow is used in developing a humanistic approach to speech education. The holistic view of human learning and behavior, inherent in this approach, is seen to be compatible with a model of effective leadership. Specific applications of this approach to speech…
Animal models of speech and vocal communication deficits associated with psychiatric disorders
Konopka, Genevieve; Roberts, Todd F.
2015-01-01
Disruptions in speech, language and vocal communication are hallmarks of several neuropsychiatric disorders, most notably autism spectrum disorders. Historically, the use of animal models to dissect molecular pathways and connect them to behavioral endophenotypes in cognitive disorders has proven to be an effective approach for developing and testing disease-relevant therapeutics. The unique aspects of human language when compared to vocal behaviors in other animals make such an approach potentially more challenging. However, the study of vocal learning in species with analogous brain circuits to humans may provide entry points for understanding this human-specific phenotype and diseases. Here, we review animal models of vocal learning and vocal communication, and specifically link phenotypes of psychiatric disorders to relevant model systems. Evolutionary constraints in the organization of neural circuits and synaptic plasticity result in similarities in the brain mechanisms for vocal learning and vocal communication. Comparative approaches and careful consideration of the behavioral limitations among different animal models can provide critical avenues for dissecting the molecular pathways underlying cognitive disorders that disrupt speech, language and vocal communication. PMID:26232298
Neural representations and mechanisms for the performance of simple speech sequences
Bohland, Jason W.; Bullock, Daniel; Guenther, Frank H.
2010-01-01
Speakers plan the phonological content of their utterances prior to their release as speech motor acts. Using a finite alphabet of learned phonemes and a relatively small number of syllable structures, speakers are able to rapidly plan and produce arbitrary syllable sequences that fall within the rules of their language. The class of computational models of sequence planning and performance termed competitive queuing (CQ) models have followed Lashley (1951) in assuming that inherently parallel neural representations underlie serial action, and this idea is increasingly supported by experimental evidence. In this paper we develop a neural model that extends the existing DIVA model of speech production in two complementary ways. The new model includes paired structure and content subsystems (cf. MacNeilage, 1998) that provide parallel representations of a forthcoming speech plan, as well as mechanisms for interfacing these phonological planning representations with learned sensorimotor programs to enable stepping through multi-syllabic speech plans. On the basis of previous reports, the model’s components are hypothesized to be localized to specific cortical and subcortical structures, including the left inferior frontal sulcus, the medial premotor cortex, the basal ganglia and thalamus. The new model, called GODIVA (Gradient Order DIVA), thus fills a void in current speech research by providing formal mechanistic hypotheses about both phonological and phonetic processes that are grounded by neuroanatomy and physiology. This framework also generates predictions that can be tested in future neuroimaging and clinical case studies. PMID:19583476
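The competitive queuing (CQ) mechanism at the heart of this model family is small enough to state directly: a parallel activation gradient over planned items becomes serial order by repeatedly selecting the most active item and suppressing it after output. A minimal loop under that assumption:

```python
# Minimal competitive-queuing loop: parallel plan -> serial output.
import numpy as np

plan_items = ["ba", "di", "gu"]             # planned syllable sequence
activations = np.array([0.9, 0.6, 0.3])     # primacy gradient: earlier = stronger

produced = []
while np.any(activations > 0):
    winner = int(np.argmax(activations))    # competitive choice layer
    produced.append(plan_items[winner])
    activations[winner] = 0.0               # self-inhibition after output

print(produced)                             # ['ba', 'di', 'gu']
```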
ERIC Educational Resources Information Center
Rasanen, Okko
2011-01-01
Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
Dopamine Regulation of Human Speech and Bird Song: A Critical Review
ERIC Educational Resources Information Center
Simonyan, Kristina; Horwitz, Barry; Jarvis, Erich D.
2012-01-01
To understand the neural basis of human speech control, extensive research has been done using a variety of methodologies in a range of experimental models. Nevertheless, several critical questions about learned vocal motor control still remain open. One of them is the mechanism(s) by which neurotransmitters, such as dopamine, modulate speech and…
Perrier, Pascal; Schwartz, Jean-Luc; Diard, Julien
2018-01-01
Shifts in perceptual boundaries resulting from speech motor learning induced by perturbations of the auditory feedback were taken as evidence for the involvement of motor functions in auditory speech perception. Beyond this general statement, the precise mechanisms underlying this involvement are not yet fully understood. In this paper we propose a quantitative evaluation of some hypotheses concerning the motor and auditory updates that could result from motor learning, in the context of various assumptions about the roles of the auditory and somatosensory pathways in speech perception. This analysis was made possible thanks to the use of a Bayesian model that implements these hypotheses by expressing the relationships between speech production and speech perception in a joint probability distribution. The evaluation focuses on how the hypotheses can (1) predict the location of perceptual boundary shifts once the perturbation has been removed, (2) account for the magnitude of the compensation in presence of the perturbation, and (3) describe the correlation between these two behavioral characteristics. Experimental findings about changes in speech perception following adaptation to auditory feedback perturbations serve as reference. Simulations suggest that they are compatible with a framework in which motor adaptation updates both the auditory-motor internal model and the auditory characterization of the perturbed phoneme, and where perception involves both auditory and somatosensory pathways. PMID:29357357
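A toy version of the fusion-and-update idea: perception is a posterior combining auditory and somatosensory likelihoods, and shifting the learned auditory mean of one phoneme (as motor adaptation might) moves the predicted perceptual boundary. All distributions and parameters below are invented, not the paper's fitted model.

```python
# Toy Bayesian fusion: categorize a stimulus from auditory plus
# somatosensory evidence; updating one phoneme's auditory mean
# shifts the predicted perceptual boundary.
import numpy as np
from scipy.stats import norm

def posterior_A(aud_obs, som_obs, mu_aud_A, mu_aud_B):
    """P(category A | auditory + somatosensory observations)."""
    lik_A = norm.pdf(aud_obs, mu_aud_A, 1.0) * norm.pdf(som_obs, -1.0, 2.0)
    lik_B = norm.pdf(aud_obs, mu_aud_B, 1.0) * norm.pdf(som_obs, +1.0, 2.0)
    return lik_A / (lik_A + lik_B)

def boundary(mu_aud_A, mu_aud_B, som_obs=0.0):
    """Auditory value where the posterior crosses 0.5."""
    grid = np.linspace(-4, 4, 8001)
    p = posterior_A(grid, som_obs, mu_aud_A, mu_aud_B)
    return grid[np.argmin(np.abs(p - 0.5))]

print("boundary before learning:", boundary(-1.0, 1.0))
# Motor adaptation updates the auditory characterization of phoneme A:
print("boundary after learning: ", boundary(-0.4, 1.0))
```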
Building phonetic categories: an argument for the role of sleep
Earle, F. Sayako; Myers, Emily B.
2014-01-01
The current review provides specific predictions for the role of sleep-mediated memory consolidation in the formation of new speech sound representations. Specifically, this discussion will highlight selected literature on the different ideas concerning category representation in speech, followed by a broad overview of memory consolidation and how it relates to human behavior, as relevant to speech/perceptual learning. In combining behavioral and physiological accounts from animal models with insights from the human consolidation literature on auditory skill/word learning, we are in the early stages of understanding how the transfer of experiential information between brain structures during sleep manifests in changes to online perception. We conclude that this process is crucial to perceptual learning and the formation of novel categories, and further speculate that habitual disruption of this process leads to impoverished representations of speech sounds. PMID:25477828
Nonhomogeneous transfer reveals specificity in speech motor learning.
Rochet-Capellan, Amélie; Richer, Lara; Ostry, David J
2012-03-01
Does motor learning generalize to new situations that are not experienced during training, or is motor learning essentially specific to the training situation? In the present experiments, we use speech production as a model to investigate generalization in motor learning. We tested for generalization from training to transfer utterances by varying the acoustical similarity between these two sets of utterances. During the training phase of the experiment, subjects received auditory feedback that was altered in real time as they repeated a single consonant-vowel-consonant utterance. Different groups of subjects were trained with different consonant-vowel-consonant utterances, which differed from a subsequent transfer utterance in terms of the initial consonant or vowel. During the adaptation phase of the experiment, we observed that subjects in all groups progressively changed their speech output to compensate for the perturbation (altered auditory feedback). After learning, we tested for generalization by having all subjects produce the same single transfer utterance while receiving unaltered auditory feedback. We observed limited transfer of learning, which depended on the acoustical similarity between the training and the transfer utterances. The gradients of generalization observed here are comparable to those observed in limb movement. The present findings are consistent with the conclusion that speech learning remains specific to individual instances of learning.
Kim, Jongin; Park, Hyeong-jun
2016-01-01
The purpose of this study is to classify EEG data on imagined speech in a single trial. We recorded EEG data while five subjects imagined different vowels, /a/, /e/, /i/, /o/, and /u/. We divided each single trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from all segments. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on the sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of an extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of the single trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discriminant analysis. Thus, our results suggested that EEG responses to imagined speech could be successfully classified in a single trial using an extreme learning machine with a radial basis function and linear kernel. This study with classification of imagined speech might contribute to the development of silent speech BCI systems. PMID:28097128
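The pipeline described above can be sketched as follows. This is a hedged approximation rather than the authors' code: an RBF-kernel SVM stands in for the extreme learning machines, the sparse-regression feature selection step is omitted, and the data are synthetic.

```python
import numpy as np
from scipy.stats import skew
from sklearn.svm import SVC

def segment_features(trial, n_segments=30):
    """trial: (channels, samples) -> (n_segments, 4 * channels) features:
    mean, variance, standard deviation, and skewness per channel."""
    feats = []
    for seg in np.array_split(trial, n_segments, axis=1):
        feats.append(np.concatenate([seg.mean(axis=1), seg.var(axis=1),
                                     seg.std(axis=1), skew(seg, axis=1)]))
    return np.array(feats)

def classify_trial(clf, trial):
    """Label a whole trial by majority vote over its segment predictions."""
    votes = clf.predict(segment_features(trial))
    return int(np.bincount(votes).argmax())

# Synthetic usage: 5 imagined vowels, 64-channel trials, 2 trials per vowel.
rng = np.random.default_rng(0)
X = np.vstack([segment_features(rng.normal(size=(64, 3000))) for _ in range(10)])
y = np.repeat(np.arange(5), 60)  # 2 trials x 30 segments for each vowel
clf = SVC(kernel="rbf").fit(X, y)
print(classify_trial(clf, rng.normal(size=(64, 3000))))
```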
Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children
Shiller, Douglas M.; Rochon, Marie-Lyne
2015-01-01
Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback, however it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067
The sensorimotor and social sides of the architecture of speech.
Pezzulo, Giovanni; Barca, Laura; D'Ausilio, Alessandro
2014-12-01
Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a "pedagogical" learning environment that supports tutor learning.
Plasticity in the Human Speech Motor System Drives Changes in Speech Perception
Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.
2014-01-01
Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594
ERIC Educational Resources Information Center
Gweon, Gahgene; Jain, Mahaveer; McDonough, John; Raj, Bhiksha; Rose, Carolyn P.
2013-01-01
This paper contributes to a theory-grounded methodological foundation for automatic collaborative learning process analysis. It does this by illustrating how insights from the social psychology and sociolinguistics of speech style provide a theoretical framework to inform the design of a computational model. The purpose of that model is to detect…
NASA Astrophysics Data System (ADS)
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach quickly runs into a trade-off between error coverage and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.
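A minimal sketch of the idea (features, phones, and labels below are hypothetical, not the paper's): a decision tree learns from annotated learner speech which contexts make a non-native error likely, and error arcs are added to the ASR grammar network only where the tree predicts them, so error coverage grows without inflating perplexity everywhere.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical context features: (long_vowel, word_final, after_cluster);
# label 1 means learners tend to mispronounce this context.
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 1, 0, 1, 1, 0]
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

def grammar_arcs(target_phones):
    """Add an error arc for a phone only when the tree predicts an error."""
    arcs = []
    for phone, feats in target_phones:
        arcs.append((phone, "correct"))
        if tree.predict([feats])[0] == 1:
            arcs.append((phone, "likely_error"))
    return arcs

print(grammar_arcs([("o:", [1, 1, 0]), ("ka", [0, 0, 0])]))
# -> [('o:', 'correct'), ('o:', 'likely_error'), ('ka', 'correct')]
```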
Recent Advances in the Genetics of Vocal Learning
Condro, Michael C.; White, Stephanie A.
2015-01-01
Language is a complex communicative behavior unique to humans, and its genetic basis is poorly understood. Genes associated with human speech and language disorders provide some insights, originating with the FOXP2 transcription factor, a mutation in which is the source of an inherited form of developmental verbal dyspraxia. Subsequently, targets of FOXP2 regulation have been associated with speech and language disorders, along with other genes. Here, we review these recent findings that implicate genetic factors in human speech. Due to the exclusivity of language to humans, no single animal model is sufficient to study the complete behavioral effects of these genes. Fortunately, some animals possess subcomponents of language. One such subcomponent is vocal learning, which though rare in the animal kingdom, is shared with songbirds. We therefore discuss how songbird studies have contributed to the current understanding of genetic factors that impact human speech, and support the continued use of this animal model for such studies in the future. PMID:26052371
Perceptual Learning of Speech under Optimal and Adverse Conditions
Zhang, Xujin; Samuel, Arthur G.
2014-01-01
Humans have a remarkable ability to understand spoken language despite the large amount of variability in speech. Previous research has shown that listeners can use lexical information to guide their interpretation of atypical sounds in speech (Norris, McQueen, & Cutler, 2003). This kind of lexically induced perceptual learning enables people to adjust to the variations in utterances due to talker-specific characteristics, such as individual identity and dialect. The current study investigated perceptual learning in two optimal conditions: conversational speech (Experiment 1) vs. clear speech (Experiment 2), and three adverse conditions: noise (Experiment 3a) vs. two cognitive loads (Experiments 4a & 4b). Perceptual learning occurred in the two optimal conditions and in the two cognitive load conditions, but not in the noise condition. Furthermore, perceptual learning occurred only in the first of two sessions for each participant, and only for atypical /s/ sounds and not for atypical /f/ sounds. This pattern of learning and non-learning reflects a balance between flexibility and stability that the speech system must have to deal with speech variability in the diverse conditions that speech is encountered. PMID:23815478
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.
Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina
2015-07-01
It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as the active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Second, they show that the visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.
Gesture helps learners learn, but not merely by guiding their visual attention.
Wakefield, Elizabeth; Novack, Miriam A; Congdon, Eliza L; Franconeri, Steven; Goldin-Meadow, Susan
2018-04-16
Teaching a new concept through gestures (hand movements that accompany speech) facilitates learning above and beyond instruction through speech alone (e.g., Singer & Goldin-Meadow). However, the mechanisms underlying this phenomenon are still under investigation. Here, we use eye tracking to explore one often-proposed mechanism: gesture's ability to direct visual attention. Behaviorally, we replicate previous findings: Children perform significantly better on a posttest after learning through Speech+Gesture instruction than through Speech Alone instruction. Using eye tracking measures, we show that children who watch a math lesson with gesture do allocate their visual attention differently from children who watch a math lesson without gesture: they look more to the problem being explained, less to the instructor, and are more likely to synchronize their visual attention with information presented in the instructor's speech (i.e., follow along with speech) than children who watch the no-gesture lesson. The striking finding is that, even though these looking patterns positively predict learning outcomes, the patterns do not mediate the effects of training condition (Speech Alone vs. Speech+Gesture) on posttest success. We find instead a complex relation between gesture and visual attention in which gesture moderates the impact of visual looking patterns on learning: following along with speech predicts learning for children in the Speech+Gesture condition, but not for children in the Speech Alone condition. Gesture's beneficial effects on learning thus come not merely from its ability to guide visual attention, but also from its ability to synchronize with speech and affect what learners glean from that speech. © 2018 John Wiley & Sons Ltd.
Modeling Co-evolution of Speech and Biology.
de Boer, Bart
2016-04-01
Two computer simulations are investigated that model interaction of cultural evolution of language and biological evolution of adaptations to language. Both are agent-based models in which a population of agents imitates each other using realistic vowels. The agents evolve under selective pressure for good imitation. In one model, the evolution of the vocal tract is modeled; in the other, a cognitive mechanism for perceiving speech accurately is modeled. In both cases, biological adaptations to using and learning speech evolve, even though the system of speech sounds itself changes at a more rapid time scale than biological evolution. However, the fact that the available acoustic space is used maximally (a self-organized result of cultural evolution) is constant, and therefore biological evolution does have a stable target. This work shows that when cultural and biological traits are continuous, their co-evolution may lead to cognitive adaptations that are strong enough to detect empirically. Copyright © 2016 Cognitive Science Society, Inc.
Kaipa, Ramesh; Jones, Richard D; Robb, Michael P
2016-07-01
The benefits of different practice conditions in limb-based rehabilitation of motor disorders are well documented. Conversely, the role of practice structure in the treatment of motor-based speech disorders has only been minimally investigated. Considering this limitation, the current study aimed to investigate the effectiveness of selected practice conditions in spatial and temporal learning of novel speech utterances in individuals with Parkinson's disease (PD). Participants included 16 individuals with PD who were randomly and equally assigned to constant, variable, random, and blocked practice conditions. Participants in all four groups practiced a speech phrase for two consecutive days, and reproduced the speech phrase on the third day without further practice or feedback. There were no significant differences (p > 0.05) between participants across the four practice conditions with respect to either spatial or temporal learning of the speech phrase. Overall, PD participants demonstrated diminished spatial and temporal learning in comparison to healthy controls. Tests of strength of association between participants' demographic/clinical characteristics and speech-motor learning outcomes did not reveal any significant correlations. The findings from the current study suggest that repeated practice facilitates speech-motor learning in individuals with PD irrespective of the type of practice. Clinicians need to be cautious in applying practice conditions to treat speech deficits associated with PD based on the findings of non-speech-motor learning tasks. Copyright © 2016 Elsevier Ltd. All rights reserved.
Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel
Kleinschmidt, Dave F.; Jaeger, T. Florian
2016-01-01
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker's /p/ might be physically indistinguishable from another talker's /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and, more broadly, language comprehension. PMID:25844873
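The distributional (belief-updating) component can be illustrated with a conjugate normal update over a single talker-specific cue. This one-cue sketch, and its VOT values, are illustrative simplifications, not the fitted model from the paper.

```python
def update_belief(mu, var, x, noise_var):
    """Conjugate normal update: prior N(mu, var) on the talker's category
    mean, one observed token x with known observation noise."""
    k = var / (var + noise_var)        # Kalman-style gain
    return mu + k * (x - mu), (1 - k) * var

mu, var = 0.0, 25.0                    # prior over /b/ mean VOT (ms),
noise_var = 100.0                      # pooled from previous experience
for token in [15.0, 18.0, 12.0, 16.0]: # this talker's /b/ tokens sound shifted
    mu, var = update_belief(mu, var, token, noise_var)
    print(f"belief about /b/ mean VOT: {mu:5.1f} ms (var {var:5.1f})")
# The belief migrates toward the talker's distribution while uncertainty
# shrinks: incremental perceptual recalibration.
```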
The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech
ERIC Educational Resources Information Center
Wayne, Rachel V.; Johnsrude, Ingrid S.
2012-01-01
Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…
Transferring of speech movements from video to 3D face space.
Pei, Yuru; Zha, Hongbin
2007-01-01
We present a novel method for transferring speech animation recorded in low quality videos to high resolution 3D face models. The basic idea is to synthesize the animated faces by an interpolation based on a small set of 3D key face shapes which span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in 2D video space to form a set of 2D visemes which are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold and 2) K-means clustering in the low-dimensional space to extract 2D key viseme frames. Our main contribution is that we use the Isomap-based learning method to extract intrinsic geometry of the speech video space and thus to make it possible to define the 3D key viseme shapes. To do so, we need only to capture a limited number of 3D key face models by using a general 3D scanner. Moreover, we also develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method can achieve realistic 3D animation effects with a small number of 3D key face models.
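The two learning phases map onto standard tools; below is a hedged sketch with synthetic frame vectors standing in for flattened mouth-region images. The frame nearest each cluster centre plays the role of a 2D key viseme, which the full method would then map onto scanned 3D key face shapes for interpolation.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 1200))  # 500 synthetic flattened mouth patches

# Phase 1: Isomap embeds the speech-movement frames in a low-dim manifold.
embedded = Isomap(n_neighbors=10, n_components=3).fit_transform(frames)

# Phase 2: K-means in the embedding picks cluster centres; the nearest
# actual frame to each centre is kept as a key viseme frame.
kmeans = KMeans(n_clusters=8, n_init=10).fit(embedded)
key_frames = sorted(int(np.argmin(np.linalg.norm(embedded - c, axis=1)))
                    for c in kmeans.cluster_centers_)
print("key viseme frame indices:", key_frames)
```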
ERIC Educational Resources Information Center
Guo, Shesen
2003-01-01
A computer-assisted learning/teaching model is conceived with implications of constructivist theory and an analogy between the traditional art form Shuanghuang and the teaching/learning environment. The virtual character of the model interacts with the learner, in the form of human behavior and speech supported by recognition biometrics,…
Noise Hampers Children’s Expressive Word Learning
Riley, Kristine Grohne; McGregor, Karla K.
2013-01-01
Purpose To determine the effects of noise and speech style on word learning in typically developing school-age children. Method Thirty-one participants ages 9;0 (years; months) to 10;11 attempted to learn 2 sets of 8 novel words and their referents. They heard all of the words 13 times each within meaningful narrative discourse. Signal-to-noise ratio (noise vs. quiet) and speech style (plain vs. clear) were manipulated such that half of the children heard the new words in broadband white noise and half heard them in quiet; within those conditions, each child heard one set of words produced in a plain speech style and another set in a clear speech style. Results Children who were trained in quiet learned to produce the word forms more accurately than those who were trained in noise. Clear speech resulted in more accurate word form productions than plain speech, whether the children had learned in noise or quiet. Learning from clear speech in noise and plain speech in quiet produced comparable results. Conclusion Noise limits expressive vocabulary growth in children, reducing the quality of word form representation in the lexicon. Clear speech input can aid expressive vocabulary growth in children, even in noisy environments. PMID:22411494
Neger, Thordis M.; Rietveld, Toni; Janse, Esther
2014-01-01
Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with 60 meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory, and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly. PMID:25225475
Oppenheim, Gary M; Dell, Gary S; Schwartz, Myrna F
2010-02-01
Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the way that words are activated and selected for production, and some have claimed that the findings are only understandable by positing a competitive mechanism for lexical selection. We present a simple model of lexical retrieval in speech production that applies error-driven learning to its lexical activation network. This model naturally produces repetition priming and semantic interference effects. It predicts the major findings from several published experiments, demonstrating that these effects may arise from incremental learning. Furthermore, analysis of the model suggests that competition during lexical selection is not necessary for semantic interference if the learning process is itself competitive. Copyright 2009 Elsevier B.V. All rights reserved.
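A toy version of that mechanism (illustrative features, weights, and learning rate; not the authors' simulations) shows how a single error-driven update produces both effects at once, with no competitive selection step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy lexicon: "dog" and "cat" share semantic features; "pen" does not.
words = ["dog", "cat", "pen"]
feats = np.array([[1., 1., 0.],   # dog: <animal, pet, writing-tool>
                  [1., 1., 0.],   # cat: the same shared features
                  [0., 0., 1.]])  # pen: unrelated
W = np.full((3, 3), 0.5)          # word-by-feature connection weights

def naming_strength(W, i):
    """Activation of word i when picture i's features are presented."""
    return sigmoid(W @ feats[i] - 1.0)[i]

print({w: round(naming_strength(W, i), 3) for i, w in enumerate(words)})

# Name the dog picture once: delta-rule update on the active features.
error = np.eye(3)[0] - sigmoid(W @ feats[0] - 1.0)  # only "dog" should fire
W = W + 0.5 * np.outer(error, feats[0])

print({w: round(naming_strength(W, i), 3) for i, w in enumerate(words)})
# "dog" rises (repetition priming), "cat" falls (semantic interference),
# and the unrelated "pen" is untouched.
```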
Winograd, Claudia; Ceman, Stephanie
2012-01-01
Fragile X syndrome (FXS) is the most common cause of inherited intellectual disability and presents with markedly atypical speech-language, likely due to impaired vocal learning. Although current models have been useful for studies of some aspects of FXS, zebra finch is the only tractable lab model for vocal learning. The neural circuits for vocal learning in the zebra finch have clear relationships to the pathways in the human brain that may be affected in FXS. Further, finch vocal learning may be quantified using software designed specifically for this purpose. Knockdown of the zebra finch FMR1 gene may ultimately enable novel tests of therapies that are modality-specific, using drugs or even social strategies, to ameliorate deficits in vocal development and function. In this chapter, we describe the utility of the zebra finch model and present a hypothesis for the role of FMRP in the developing neural circuitry for vocalization.
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech.
Borrie, Stephanie A; Lansford, Kaitlin L; Barrett, Tyson S
2017-03-01
The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception (initial intelligibility) and learning (intelligibility improvement) of naturally dysrhythmic speech, dysarthria. Fifty young adults with typical hearing participated in 3 key tests, including a rhythm perception test, a receptive vocabulary test, and a speech perception and learning test, with standard pretest, familiarization, and posttest phases. Initial intelligibility scores were calculated as the proportion of correct pretest words, while intelligibility improvement scores were calculated by subtracting this proportion from the proportion of correct posttest words. Rhythm perception scores predicted intelligibility improvement scores but not initial intelligibility. On the other hand, receptive vocabulary scores predicted initial intelligibility scores but not intelligibility improvement. Expertise in rhythm perception appears to provide an advantage for processing dysrhythmic speech, but a familiarization experience is required for the advantage to be realized. Findings are discussed in relation to the role of rhythm in speech processing and shed light on processing models that consider the consequence of rhythm abnormalities in dysarthria.
Effects of synthetic speech output in the learning of graphic symbols of varied iconicity.
Koul, Rajinder; Schlosser, Ralf
To examine the effects of additional auditory feedback from synthetic speech on the learning of high translucent symbols versus low translucent symbols. Two adults with little or no functional speech and severe intellectual disabilities served as participants. A single-subject ABACA/ACABA design was used to study the relative effects of two treatments: symbol training in the presence and absence of synthetic speech output. The results clearly indicated that the two treatments, rather than extraneous variables were responsible for gains in the symbol learning. Both participants learned either more low translucent symbols or reached their maximum learning of low translucent symbols in the speech output condition. The results of this preliminary study replicate and extend the iconicity hypothesis to a new set of learning conditions involving speech output, and suggest that feedback from speech output may assist adults with profound intellectual disabilities in coding particularly those symbols whose association with their referent cannot be coded via their visual resemblance with the referent.
Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten
2018-01-01
Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
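The two learning objectives compared above are easy to state precisely. This hedged sketch computes both masks from oracle speech and noise magnitudes; in the study, deep networks are trained to predict such targets from noisy input features, and predicting the continuous IRM rather than the binary IBM gave the reported 13.9-point gain.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, criterion_db=0.0):
    """IBM: each time-frequency unit is 1 if its local SNR exceeds the
    criterion (speech-dominated), else 0 (noise-dominated)."""
    eps = 1e-12
    snr_db = 20.0 * np.log10(np.maximum(speech_mag, eps) /
                             np.maximum(noise_mag, eps))
    return (snr_db > criterion_db).astype(float)

def ideal_ratio_mask(speech_mag, noise_mag):
    """IRM: a continuous Wiener-like gain in [0, 1] per unit."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return np.sqrt(s2 / np.maximum(s2 + n2, 1e-24))

# Synthetic stand-ins for magnitude spectrograms (freq x time).
rng = np.random.default_rng(0)
S, N = np.abs(rng.normal(size=(64, 100))), np.abs(rng.normal(size=(64, 100)))
print(ideal_binary_mask(S, N).mean(), ideal_ratio_mask(S, N).mean())
```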
Pattern learning with deep neural networks in EMG-based speech recognition.
Wand, Michael; Schultz, Tanja
2014-01-01
We report on classification of phones and phonetic features from facial electromyographic (EMG) data, within the context of our EMG-based Silent Speech interface. In this paper we show that a Deep Neural Network can be used to perform this classification task, yielding a significant improvement over conventional Gaussian Mixture models. Our central contribution is the visualization of patterns which are learned by the neural network. With increasing network depth, these patterns represent more and more intricate electromyographic activity.
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Speech processing using maximum likelihood continuity mapping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.E.
Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Lip Movement Exaggerations During Infant-Directed Speech
Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana
2011-01-01
Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequency of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342
Promoting lexical learning in the speech and language therapy of children with cochlear implants.
Ronkainen, Riitta; Laakso, Minna; Lonka, Eila; Tykkyläinen, Tuula
2017-01-01
This study examines lexical intervention sessions in speech and language therapy for children with cochlear implants (CIs). Particular focus is on the therapist's professional practices in doing the therapy. The participants in this study are three congenitally deaf children with CIs together with their speech and language therapist. The video recorded therapy sessions of these children are studied using conversation analysis. The analysis reveals the ways in which the speech and language therapist formulates her speaking turns to support the children's lexical learning in task interaction. The therapist's multimodal practices, for example linguistic and acoustic highlighting, focus both on the lexical meaning and the phonological form of the words. Using these means, the therapist expands the child's lexical networks, specifies and corrects the meaning of the target words, and models the correct phonological form of the words. The findings of this study are useful in providing information for clinicians and speech and language therapy students working with children who have CIs as well as for the children's parents.
Elevated depressive symptoms enhance reflexive but not reflective auditory category learning.
Maddox, W Todd; Chandrasekaran, Bharath; Smayda, Kirsten; Yi, Han-Gyol; Koslov, Seth; Beevers, Christopher G
2014-09-01
In vision an extensive literature supports the existence of competitive dual-processing systems of category learning that are grounded in neuroscience and are partially dissociable. The reflective system is prefrontally mediated and uses working memory and executive attention to develop and test rules for classifying in an explicit fashion. The reflexive system is striatally mediated and operates by implicitly associating perception with actions that lead to reinforcement. Although categorization is fundamental to auditory processing, little is known about the learning systems that mediate auditory categorization and even less is known about the effects of individual differences in the relative efficiency of the two learning systems. Previous studies have shown that individuals with elevated depressive symptoms show deficits in reflective processing. We exploit this finding to test critical predictions of the dual-learning systems model in audition. Specifically, we examine the extent to which the two systems are dissociable and competitive. We predicted that elevated depressive symptoms would lead to reflective-optimal learning deficits but reflexive-optimal learning advantages. Because natural speech category learning is reflexive in nature, we made the prediction that elevated depressive symptoms would lead to superior speech learning. In support of our predictions, individuals with elevated depressive symptoms showed a deficit in reflective-optimal auditory category learning, but an advantage in reflexive-optimal auditory category learning. In addition, individuals with elevated depressive symptoms showed an advantage in learning a non-native speech category structure. Computational modeling suggested that the elevated depressive symptom advantage was due to faster, more accurate, and more frequent use of reflexive category learning strategies in individuals with elevated depressive symptoms. The implications of this work for the dual-process approach to auditory learning and depression are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
McCartney, Elspeth; Boyle, James; Ellis, Sue; Bannatyne, Susan; Turnbull, Mary
2011-01-01
A manualized language therapy developed via a randomized controlled trial had proved efficacious in the short term in developing expressive language for mainstream primary school children with persistent language impairment. This therapy had been delivered to a predetermined schedule by speech and language therapists or speech and language therapy assistants to children individually or in groups. However, this model of service delivery is no longer the most common model in UK schools, where indirect consultancy approaches with intervention delivered by school staff are often used. A cohort study was undertaken to investigate whether the therapy was equally efficacious when delivered to comparable children by school staff, rather than speech and language therapists or speech and language therapy assistants. Children in the cohort study were selected using the same criteria as in the randomized controlled trial, and the same manualized therapy was used, but delivered by mainstream school staff using a consultancy model common in the UK. Outcomes were compared with those of randomized controlled trial participants. The gains in expressive language measured in the randomized controlled trial were not replicated in the cohort study. Less language-learning activity was recorded than had been planned, and less than was delivered in the randomized controlled trial. Implications for 'consultancy' speech and language therapist service delivery models in mainstream schools are outlined. At present, the more efficacious therapy is that delivered by speech and language therapists or speech and language therapy assistants to children individually or in groups. This may be related to more faithful adherence to the intervention schedule, and to a probably greater amount of language-learning activity undertaken. Intervention delivered via school-based 'consultancy' approaches will need to be carefully monitored by schools and SLT services. © 2010 Royal College of Speech & Language Therapists.
Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech
ERIC Educational Resources Information Center
Pilling, Michael; Thomas, Sharon
2011-01-01
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Perceptual Learning of Time-Compressed Speech: More than Rapid Adaptation
Banai, Karen; Lavner, Yizhar
2012-01-01
Background Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only. Methodology/Principal Findings Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer. Conclusions/Significance Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training induced learning is more stimulus specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second language acquisition. PMID:23056592
Perceptual statistical learning over one week in child speech production.
Richtsmeier, Peter T; Goffman, Lisa
2017-07-01
What cognitive mechanisms account for the trajectory of speech sound development, in particular, gradually increasing accuracy during childhood? An intriguing potential contributor is statistical learning, a type of learning that has been studied frequently in infant perception but less often in child speech production. To assess the relevance of statistical learning to developing speech accuracy, we carried out a statistical learning experiment with four- and five-year-olds in which statistical learning was examined over one week. Children were familiarized with and tested on word-medial consonant sequences in novel words. There was only modest evidence for statistical learning, primarily in the first few productions of the first session. This initial learning effect nevertheless aligns with previous statistical learning research. Furthermore, the overall learning effect was similar to an estimate of weekly accuracy growth based on normative studies. The results implicate other important factors in speech sound development, particularly learning via production. Copyright © 2017 Elsevier Inc. All rights reserved.
Interprofessional Peer-Assisted Learning as a Model of Instruction in Doctor of Audiology Programs.
Serpanos, Yula C; Senzer, Deborah; Gordon, Daryl M
2017-09-18
This study reports on interprofessional peer-assisted learning (PAL) as a model of instruction in the preparation of doctoral audiology students. Ten Doctor of Audiology (AuD) students provided training in audiologic screening for 53 graduate speech-language pathology students in 9 individual PAL sessions. Pre- and post-surveys assessed the peer teaching experience for AuD students in 5 areas of their confidence in audiologic screening: knowledge, skill, making a referral based on outcomes, teaching, and supervising. Pre- and post-learning outcomes in audiologic screening for the speech-language pathology student trainees determined the effectiveness of training by their AuD student peers. Survey outcomes revealed significant (p < .001) improvement in the overall confidence of AuD student peer instructors. Speech-language pathology students trained by their AuD peers exhibited significant (p = .003) improvements in their knowledge and skill and making outcome-based referrals in audiologic screening, supporting the effectiveness of the PAL paradigm. In addition to meeting required accreditation and professional certification competency standards, the PAL instructional model offers an innovative curricular approach in interprofessional education and in the teaching and supervisory preparation of students in doctoral audiology programs.
Large-Scale Modeling of Wordform Learning and Representation
ERIC Educational Resources Information Center
Sibley, Daragh E.; Kello, Christopher T.; Plaut, David C.; Elman, Jeffrey L.
2008-01-01
The forms of words as they appear in text and speech are central to theories and models of lexical processing. Nonetheless, current methods for simulating their learning and representation fail to approach the scale and heterogeneity of real wordform lexicons. A connectionist architecture termed the "sequence encoder" is used to learn…
Foxp2 mutations impair auditory-motor association learning.
Kurt, Simone; Fisher, Simon E; Ehret, Günter
2012-01-01
Heterozygous mutations of the human FOXP2 transcription factor gene cause the best-described examples of monogenic speech and language disorders. Acquisition of proficient spoken language involves auditory-guided vocal learning, a specialized form of sensory-motor association learning. The impact of etiological Foxp2 mutations on learning of auditory-motor associations in mammals has not been determined yet. Here, we directly assess this type of learning using a newly developed conditioned avoidance paradigm in a shuttle-box for mice. We show striking deficits in mice heterozygous for either of two different Foxp2 mutations previously implicated in human speech disorders. Both mutations cause delays in acquiring new motor skills. The magnitude of impairments in association learning, however, depends on the nature of the mutation. Mice with a missense mutation in the DNA-binding domain are able to learn, but at a much slower rate than wild type animals, while mice carrying an early nonsense mutation learn very little. These results are consistent with expression of Foxp2 in distributed circuits of the cortex, striatum and cerebellum that are known to play key roles in acquisition of motor skills and sensory-motor association learning, and suggest differing in vivo effects for distinct variants of the Foxp2 protein. Given the importance of such networks for the acquisition of human spoken language, and the fact that similar mutations in human FOXP2 cause problems with speech development, this work opens up a new perspective on the use of mouse models for understanding pathways underlying speech and language disorders.
Social interaction shapes babbling: Testing parallels between birdsong and speech
NASA Astrophysics Data System (ADS)
Goldstein, Michael H.; King, Andrew P.; West, Meredith J.
2003-06-01
Birdsong is considered a model of human speech development at behavioral and neural levels. Few direct tests of the proposed analogs exist, however. Here we test a mechanism of phonological development in human infants that is based on social shaping, a selective learning process first documented in songbirds. By manipulating mothers' reactions to their 8-month-old infants' vocalizations, we demonstrate that phonological features of babbling are sensitive to nonimitative social stimulation. Contingent, but not noncontingent, maternal behavior facilitates more complex and mature vocal behavior. Changes in vocalizations persist after the manipulation. The data show that human infants use social feedback, facilitating immediate transitions in vocal behavior. Social interaction creates rapid shifts to developmentally more advanced sounds. These transitions mirror the normal development of speech, supporting the predictions of the avian social shaping model. These data provide strong support for a parallel in function between vocal precursors of songbirds and infants. Because imitation is usually considered the mechanism for vocal learning in both taxa, the findings introduce social shaping as a general process underlying the development of speech and song.
Howard, Ian S.; Messum, Piers
2014-01-01
Words are made up of speech sounds. Almost all accounts of child speech development assume that children learn the pronunciation of first language (L1) speech sounds by imitation, most claiming that the child performs some kind of auditory matching to the elements of ambient speech. However, there is evidence to support an alternative account and we investigate the non-imitative child behavior and well-attested caregiver behavior that this account posits using Elija, a computational model of an infant. Through unsupervised active learning, Elija began by discovering motor patterns, which produced sounds. In separate interaction experiments, native speakers of English, French and German then played the role of his caregiver. In their first interactions with Elija, they were allowed to respond to his sounds if they felt this was natural. We analyzed the interactions through phonemic transcriptions of the caregivers' utterances and found that they interpreted his output within the framework of their native languages. Their form of response was almost always a reformulation of Elija's utterance into well-formed sounds of L1. Elija retained those motor patterns to which a caregiver responded and formed associations between his motor pattern and the response it provoked. Thus in a second phase of interaction, he was able to parse input utterances in terms of the caregiver responses he had heard previously, and respond using his associated motor patterns. This capacity enabled the caregivers to teach Elija to pronounce some simple words in their native languages, by his serial imitation of the words' component speech sounds. Overall, our results demonstrate that the natural responses and behaviors of human subjects to infant-like vocalizations can take a computational model from a biologically plausible initial state through to word pronunciation. This provides support for an alternative to current auditory matching hypotheses for how children learn to pronounce. PMID:25333740
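The retain-and-associate loop described here lends itself to a compact illustration. The sketch below is a toy version of that mechanism, not the Elija model itself: the motor patterns, caregiver reformulations, and string matching are all invented for the example.

```python
# Toy sketch of the association phase: motor patterns that elicit a caregiver
# reformulation are retained and linked to the sound of that response; later,
# an input utterance is parsed via cached caregiver responses and answered
# with the associated motor patterns (serial imitation). Illustrative only.
associations = {}   # caregiver response sound -> infant motor pattern

def babble_phase(motor_patterns, caregiver):
    """Keep only motor patterns that provoke a caregiver response."""
    for pattern in motor_patterns:
        response = caregiver(pattern)       # a reformulation, or None
        if response is not None:
            associations[response] = pattern

def imitate(utterance):
    """Greedily parse an utterance in terms of known caregiver responses."""
    reply, i = [], 0
    while i < len(utterance):
        match = max((r for r in associations if utterance.startswith(r, i)),
                    key=len, default=None)
        if match is None:
            i += 1                          # skip unparseable material
        else:
            reply.append(associations[match])
            i += len(match)
    return reply

# Hypothetical caregiver: reformulates some raw babbles into L1 syllables.
def caregiver(pattern):
    reformulations = {"m1": "ma", "m2": "da", "m3": None}
    return reformulations.get(pattern)

babble_phase(["m1", "m2", "m3"], caregiver)
print(imitate("mada"))    # -> ['m1', 'm2']: serial imitation of "ma" + "da"
```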
ERIC Educational Resources Information Center
Cordier, Deborah
2009-01-01
A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…
STANFORD ARTIFICIAL INTELLIGENCE PROJECT.
ARTIFICIAL INTELLIGENCE, GAME THEORY, DECISION MAKING, BIONICS, AUTOMATA, SPEECH RECOGNITION, GEOMETRIC FORMS, LEARNING MACHINES, MATHEMATICAL MODELS, PATTERN RECOGNITION, SERVOMECHANISMS, SIMULATION, BIBLIOGRAPHIES.
Multi-sensory learning and learning to read.
Blomert, Leo; Froyen, Dries
2010-09-01
The basis of literacy acquisition in alphabetic orthographies is learning the associations between letters and the corresponding speech sounds. Despite the primacy of these associations in learning to read, knowledge of how this audiovisual integration process works and which mechanisms are involved remains scarce. Recent electrophysiological studies of letter-speech sound processing have revealed that normally developing readers take years to automate these associations and that dyslexic readers hardly automate them at all. It is argued that the reason for this effortful learning may reside in the nature of the audiovisual process recruited for integrating elements that are, in principle, arbitrarily linked. It is shown that letter-speech sound integration does not resemble the processes involved in the integration of natural audiovisual objects such as audiovisual speech: the automatic, symmetrical recruitment of the presumably uni-sensory visual and auditory cortices seen in audiovisual speech integration does not occur for letter and speech sound integration. It is also argued that letter-speech sound integration only partly resembles the integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound pairs and artificial audiovisual objects share the necessity of a narrow time window for integration to occur; they differ, however, because letter-sound pairs combine partly familiar elements that acquire meaning through learning an orthography. Although letter-speech sound pairs share similarities with audiovisual speech processing as well as with unfamiliar, arbitrary objects, they seem to develop into unique audiovisual objects that must furthermore be processed in a unique way to enable fluent reading, and thus very likely recruit neurobiological learning mechanisms other than those involved in learning natural or arbitrary unfamiliar audiovisual associations. Copyright 2010 Elsevier B.V. All rights reserved.
Congdon, Eliza L; Novack, Miriam A; Brooks, Neon; Hemani-Lopez, Naureen; O'Keefe, Lucy; Goldin-Meadow, Susan
2017-08-01
When teachers gesture during instruction, children retain and generalize what they are taught (Goldin-Meadow, 2014). But why does gesture have such a powerful effect on learning? Previous research shows that children learn most from a math lesson when teachers present one problem-solving strategy in speech while simultaneously presenting a different, but complementary, strategy in gesture (Singer & Goldin-Meadow, 2005). One possibility is that gesture is powerful in this context because it presents information simultaneously with speech. Alternatively, gesture may be effective simply because it involves the body, in which case the timing of information presented in speech and gesture may be less important for learning. Here we find evidence for the importance of simultaneity: 3rd-grade children retain and generalize what they learn from a math lesson better when given instruction containing simultaneous speech and gesture than when given instruction containing sequential speech and gesture. Interpreting these results in the context of theories of multimodal learning, we find that gesture capitalizes on its synchrony with speech to promote learning that lasts and can be generalized.
Translating birdsong: songbirds as a model for basic and applied medical research.
Brainard, Michael S; Doupe, Allison J
2013-07-08
Songbirds, long of interest to basic neuroscience, have great potential as a model system for translational neuroscience. Songbirds learn their complex vocal behavior in a manner that exemplifies general processes of perceptual and motor skill learning and, more specifically, resembles human speech learning. Song is subserved by circuitry that is specialized for vocal learning and production but that has strong similarities to mammalian brain pathways. The combination of highly quantifiable behavior and discrete neural substrates facilitates understanding links between brain and behavior, both in normal states and in disease. Here we highlight (a) behavioral and mechanistic parallels between birdsong and aspects of speech and social communication, including insights into mirror neurons, the function of auditory feedback, and genes underlying social communication disorders, and (b) contributions of songbirds to understanding cortical-basal ganglia circuit function and dysfunction, including the possibility of harnessing adult neurogenesis for brain repair.
Reviewing the connection between speech and obstructive sleep apnea.
Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A
2016-02-20
Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in speakers with OSA has led to the hypothesis that automatic speech analysis could support OSA assessment. In this paper we critically review several approaches that use speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected of having OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals that use machine learning techniques to predict the apnea-hypopnea index (AHI), which describes the severity of a patient's condition, or to classify individuals according to their OSA severity. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several previously proposed OSA classification approaches. The influence, and possible confounding effects, of other clinical variables available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This prompted a careful review of those approaches, including re-testing some reported results on our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research can serve as relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, that are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will serve not only as a useful example of relevant issues when using machine learning for medical diagnosis, but also as a guide for further research on the connection between speech and OSA.
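Pitfall (2) is easy to demonstrate in a few lines. The sketch below uses synthetic data with pure-noise labels: selecting features before cross-validation yields inflated accuracy, while doing the selection inside each fold (here via a scikit-learn Pipeline) stays near chance. Data and parameter choices are illustrative only, not from the paper.

```python
# Feature selection outside vs. inside cross-validation, on noise labels.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))      # many variables, few cases
y = rng.integers(0, 2, size=100)      # labels are pure noise: true accuracy ~0.5

# Pitfall: selection "peeks" at all labels before cross-validation.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(SVC(), X_sel, y, cv=5).mean()

# Correct: selection is refit inside each training fold only.
pipe = Pipeline([("select", SelectKBest(f_classif, k=20)), ("clf", SVC())])
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # optimistically high
print(f"honest CV accuracy: {honest:.2f}")  # near chance, as it should be
```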
Terband, H; Maassen, B; Guenther, F H; Brumberg, J
2014-01-01
Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
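A one-dimensional caricature of the feedback-driven learning probed in such simulations is sketched below. It is emphatically not the DIVA model: the motor deficit is modeled as execution noise, the auditory deficit (for simplicity) as a fixed perceptual bias, and the learning rate, noise levels, and error measure are all assumptions made for illustration.

```python
# Toy error-driven learning of a feedforward speech target from auditory
# feedback, comparing a motor deficit alone with a combined motor+auditory
# deficit. Illustrative only.
import numpy as np

def simulate(motor_noise, auditory_bias, trials=1000, lr=0.1, target=1.0):
    rng = np.random.default_rng(1)
    ff, errors = 0.0, []
    for _ in range(trials):
        produced = ff + rng.normal(0, motor_noise)   # noisy execution (MPD)
        heard = produced + auditory_bias             # miscalibrated monitoring (APD)
        ff += lr * (target - heard)                  # error-driven update
        errors.append(abs(target - produced))
    return np.mean(errors[-200:])      # mean late-phase production error

print("MPD, intact monitoring:", round(simulate(0.3, 0.0), 3))
print("MPD + APD             :", round(simulate(0.3, 0.3), 3))  # larger error
```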
Sleep, offline processing, and vocal learning
Margoliash, Daniel; Schmidt, Marc F
2009-01-01
The study of song learning and the neural song system has provided an important comparative model system for the study of speech and language acquisition. We describe some recent advances in the bird song system, focusing on the role of offline processing including sleep in processing sensory information and in guiding developmental song learning. These observations motivate a new model of the organization and role of the sensory memories in vocal learning. PMID:19906416
Rapid Statistical Learning Supporting Word Extraction From Continuous Speech.
Batterink, Laura J
2017-07-01
The identification of words in continuous speech, known as speech segmentation, is a critical early step in language acquisition. This process is partially supported by statistical learning, the ability to extract patterns from the environment. Given that speech segmentation represents a potential bottleneck for language acquisition, patterns in speech may be extracted very rapidly, without extensive exposure. This hypothesis was examined by exposing participants to continuous speech streams composed of novel repeating nonsense words. Learning was measured on-line using a reaction time task. After merely one exposure to an embedded novel word, learners demonstrated significant learning effects, as revealed by faster responses to predictable than to unpredictable syllables. These results demonstrate that learners gained sensitivity to the statistical structure of unfamiliar speech on a very rapid timescale. This ability may play an essential role in early stages of language acquisition, allowing learners to rapidly identify word candidates and "break in" to an unfamiliar language.
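The statistical structure such studies exploit can be made concrete in a few lines: within a repeating nonsense word, syllable-to-syllable transitional probability is 1.0, while across word boundaries it is much lower. The words and syllables below are invented for the example.

```python
# Transitional probabilities over a stream of repeating trisyllabic
# nonsense words: high within words, low across word boundaries.
import random
from collections import Counter, defaultdict

words = ["bidaku", "golatu", "padoti"]            # hypothetical nonsense words

def syllables(w):
    return [w[i:i + 2] for i in range(0, len(w), 2)]

stream = sum((syllables(random.choice(words)) for _ in range(300)), [])

# Estimate forward transitional probabilities P(next | current).
pair_counts, syl_counts = defaultdict(Counter), Counter()
for cur, nxt in zip(stream, stream[1:]):
    pair_counts[cur][nxt] += 1
    syl_counts[cur] += 1

tp = {(c, n): k / syl_counts[c]
      for c, row in pair_counts.items() for n, k in row.items()}
print(tp[("bi", "da")])   # within-word transition: 1.0
print(tp[("ku", "go")])   # across a word boundary: roughly 1/3
```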
Nikopoulos, Christos K; Nikopoulou-Smyrni, Panagiota; Konstantopoulos, Kostas
2013-01-01
Research has shown that traumatic brain injury (TBI) can affect a person's ability to perform previously learned skills. Dysexecutive syndrome and inattention, for example, alongside a number of other cognitive and behavioural impairments such as memory loss and lack of motivation, significantly affect day-to-day functioning following TBI. This study examined the efficacy of video modelling in emerging speech in an adult male with TBI caused by an assault. In an effort to identify functional relations between this novel intervention and the target behaviour, experimental control was achieved by using within-system research methodology, overcoming the difficulties of forming groups for such a highly non-homogeneous population. Across a number of conditions, the participant watched a videotape in which another adult modelled a selection of 19 spoken words. When this modelled behaviour was performed in vivo, generalization across 76 other words in the absence of a videotape took place. It was revealed that video modelling can promote the performance of previously learned behaviours related to speech, but more significantly it can facilitate the generalization of this verbal behaviour across untrained words. Video modelling could well be added to rehabilitation programmes for this population.
2017-03-01
The Center for Technology Enhanced Language Learning (CTELL), a research cell in the Department of Foreign Languages, United States Military Academy … models for automatic speech recognition (ASR), and to thereby investigate the utility of ASR in pedagogical technology. The corpus is a sample of … Keywords: lexical resources, language technology.
Noise and communication: a three-year update.
Brammer, Anthony J; Laroche, Chantal
2012-01-01
Noise is omnipresent and impacts us all in many aspects of daily living. Noise can interfere with communication not only in industrial workplaces, but also in other work settings (e.g. open-plan offices, construction, and mining) and within buildings (e.g. residences, arenas, and schools). The interference of noise with communication can have significant social consequences, especially for persons with hearing loss, and may compromise safety (e.g. failure to perceive auditory warning signals), influence worker productivity and learning in children, affect health (e.g. vocal pathology, noise-induced hearing loss), compromise speech privacy, and impact social participation by the elderly. For workers, attempts have been made to: 1) Better define the auditory performance needed to function effectively and to directly measure these abilities when assessing Auditory Fitness for Duty, 2) design hearing protection devices that can improve speech understanding while offering adequate protection against loud noises, and 3) improve speech privacy in open-plan offices. As the elderly are particularly vulnerable to the effects of noise, an understanding of the interplay between auditory, cognitive, and social factors and its effect on speech communication and social participation is also critical. Classroom acoustics and speech intelligibility in children have also gained renewed interest because of the importance of effective speech comprehension in noise on learning. Finally, substantial work has been made in developing models aimed at better predicting speech intelligibility. Despite progress in various fields, the design of alarm signals continues to lag behind advancements in knowledge. This summary of the last three years' research highlights some of the most recent issues for the workplace, for older adults, and for children, as well as the effectiveness of warning sounds and models for predicting speech intelligibility. Suggestions for future work are also discussed.
Use of Computer Speech Technologies To Enhance Learning.
ERIC Educational Resources Information Center
Ferrell, Joe
1999-01-01
Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)
Live Speech Driven Head-and-Eye Motion Generators.
Le, Binh H; Ma, Xiaohan; Deng, Zhigang
2012-11-01
This paper describes a fully automated framework to generate realistic head motion, eye gaze, and eyelid motion simultaneously based on live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion data set: 1) Gaussian mixture models and a gradient descent optimization algorithm are employed to generate head motion from speech features; 2) a nonlinear dynamic canonical correlation analysis model is used to synthesize eye gaze from head motion and speech features; and 3) nonnegative linear regression is used to model voluntary eyelid motion, and a log-normal distribution is used to describe involuntary eye blinks. Several user studies are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired-comparison methodology. Our evaluation results clearly show that this approach can significantly outperform state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
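A minimal sketch of the Gaussian-mixture step, using the standard GMM-regression formulation (fit a joint density over input and output, then predict by the conditional expectation). The one-dimensional synthetic "speech feature" and "head pose" below stand in for the paper's actual features; sizes and component counts are assumptions.

```python
# GMM regression: fit a joint GMM over (speech feature, head pose), then
# predict head pose from speech as E[y | x]. Synthetic data, illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
speech = rng.normal(size=(2000, 1))                        # e.g., a prosodic feature
head = np.sin(speech) + 0.1 * rng.normal(size=(2000, 1))   # head pitch angle

gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.hstack([speech, head]))             # joint model over (x, y)

def predict_head(x):
    """E[y | x] under the joint GMM (standard GMM regression)."""
    mu, cov, w = gmm.means_, gmm.covariances_, gmm.weights_
    # responsibility of each component for the observed x (constants cancel)
    px = np.array([w[k] * np.exp(-0.5 * (x - mu[k, 0]) ** 2 / cov[k, 0, 0])
                   / np.sqrt(cov[k, 0, 0]) for k in range(len(w))])
    px /= px.sum()
    # component-wise conditional means E[y | x, k]
    cond = mu[:, 1] + cov[:, 1, 0] / cov[:, 0, 0] * (x - mu[:, 0])
    return float(px @ cond)

print(predict_head(0.5))   # close to sin(0.5) ~ 0.48 after fitting
```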
Daliri, Ayoub; Max, Ludo
2018-02-01
Auditory modulation during speech movement planning is limited in adults who stutter (AWS), but the functional relevance of the phenomenon itself remains unknown. We investigated for AWS and adults who do not stutter (AWNS) (a) a potential relationship between pre-speech auditory modulation and auditory feedback contributions to speech motor learning and (b) the effect on pre-speech auditory modulation of real-time versus delayed auditory feedback. Experiment I used a sensorimotor adaptation paradigm to estimate auditory-motor speech learning. Using acoustic speech recordings, we quantified subjects' formant frequency adjustments across trials when continually exposed to formant-shifted auditory feedback. In Experiment II, we used electroencephalography to determine the same subjects' extent of pre-speech auditory modulation (reductions in auditory evoked potential N1 amplitude) when probe tones were delivered prior to speaking versus not speaking. To manipulate subjects' ability to monitor real-time feedback, we included speaking conditions with non-altered auditory feedback (NAF) and delayed auditory feedback (DAF). Experiment I showed that auditory-motor learning was limited for AWS versus AWNS, and the extent of learning was negatively correlated with stuttering frequency. Experiment II yielded several key findings: (a) our prior finding of limited pre-speech auditory modulation in AWS was replicated; (b) DAF caused a decrease in auditory modulation for most AWNS but an increase for most AWS; and (c) for AWS, the amount of auditory modulation when speaking with DAF was positively correlated with stuttering frequency. Lastly, AWNS showed no correlation between pre-speech auditory modulation (Experiment II) and extent of auditory-motor learning (Experiment I) whereas AWS showed a negative correlation between these measures. Thus, findings suggest that AWS show deficits in both pre-speech auditory modulation and auditory-motor learning; however, limited pre-speech modulation is not directly related to limited auditory-motor adaptation; and in AWS, DAF paradoxically tends to normalize their otherwise limited pre-speech auditory modulation. Copyright © 2017 Elsevier Ltd. All rights reserved.
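As a concrete illustration of how sensorimotor adaptation is commonly quantified in formant-shift paradigms like Experiment I, the sketch below simulates F1 productions and computes the late-hold change from baseline. The numbers are invented and this is the generic measure, not the authors' exact analysis.

```python
# Quantifying formant adaptation: late-phase produced F1 relative to baseline.
import numpy as np

rng = np.random.default_rng(2)
baseline = rng.normal(700, 15, size=20)               # F1 (Hz), unaltered feedback
hold = 700 - 40 * (1 - np.exp(-np.arange(80) / 20))   # gradual compensation curve
shifted = hold + rng.normal(0, 15, size=80)           # F1 under shifted feedback

adaptation_hz = baseline.mean() - shifted[-20:].mean()    # late-phase change
adaptation_pct = 100 * adaptation_hz / baseline.mean()
print(f"adaptation: {adaptation_hz:.1f} Hz ({adaptation_pct:.1f}% of baseline)")
```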
DiDonato, Roberta M.; Surprenant, Aimée M.
2015-01-01
Communication success under adverse conditions requires efficient and effective recruitment of both bottom-up (sensori-perceptual) and top-down (cognitive-linguistic) resources to decode the intended auditory-verbal message. The deployment of these limited-capacity resources varies across the lifespan, with evidence indicating that younger adults outperform older adults in both comprehension and memory of the message. This study examined how sources of interference arising from the speaker (message spoken with a conversational vs. clear speech technique), the listener (hearing-listening and cognitive-linguistic factors), and the environment (competing speech-babble noise vs. quiet) interact to influence learning and memory performance, using more ecologically valid methods than previous work. The results suggest that when older adults listened to complex medical prescription instructions in "clear speech" (presented at audible levels through insertion earphones), their learning efficiency and immediate and delayed memory performance improved relative to listening at a normal conversational speech rate (presented at audible levels in the sound field). This better learning and memory performance for clear-speech listening was maintained even in the presence of speech-babble noise. The largest learning-practice effect appeared on second-trial performance with conversational speech when the clear-speech listening condition came first, which is suggestive of greater experience-dependent perceptual learning, or adaptation to the speaker's speech and voice pattern, in clear speech. This suggests that experience-dependent perceptual learning plays a role in facilitating language processing and comprehension of a message and subsequent memory encoding. PMID:26106353
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech-based data in the classification of Parkinson disease (PD) has been shown in recent years to provide an effective, non-invasive mode of classification. There has thus been increased interest in speech pattern analysis methods applicable to parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied iteratively to select optimal training speech samples, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined on recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the largest improvement in classification accuracy (29.44%) among the algorithms examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm exhibited higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method can improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
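A sketch of the two-stage idea, using Wilson-style multi-editing for instance selection followed by a random forest. The editing parameters, synthetic data, and forest settings are assumptions made for illustration, not the paper's.

```python
# Multi-edit nearest-neighbor (MENN) instance selection, then random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict

def menn_select(X, y, k=3, passes=5):
    """Repeatedly remove samples misclassified by held-out kNN (Wilson editing)."""
    keep = np.arange(len(y))
    for _ in range(passes):
        knn = KNeighborsClassifier(n_neighbors=k)
        pred = cross_val_predict(knn, X[keep], y[keep], cv=3)
        still = pred == y[keep]
        if still.all():
            break                      # editing has converged
        keep = keep[still]
    return keep

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 22))                               # e.g., 22 vocal features
y = (X[:, 0] + 0.8 * rng.normal(size=300) > 0).astype(int)   # noisy labels

kept = menn_select(X, y)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[kept], y[kept])              # train the ensemble on edited samples
print(f"kept {len(kept)}/{len(y)} samples after editing")
```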
The Digichaint Interactive Game as a Virtual Learning Environment for Irish
ERIC Educational Resources Information Center
Ní Chiaráin, Neasa; Ní Chasaide, Ailbhe
2016-01-01
Although Text-To-Speech (TTS) synthesis has been little used in Computer-Assisted Language Learning (CALL), it is ripe for deployment, particularly for minority and endangered languages, where learners have little access to native speaker models and where few genuinely interactive and engaging teaching/learning materials are available. These…
Measuring and Modeling Sound Interference and Reverberation Time in Classrooms
NASA Astrophysics Data System (ADS)
Gumina, Kaitlyn; Martell, Eric
2015-04-01
Research shows that poor classroom acoustics affect children's learning, even for those without hearing difficulties, and especially for children with hearing loss, learning disabilities, speech delay, and attention problems. Poor acoustics can take a variety of forms, including destructive interference causing "dead spots" and extended reverberation times (RT), where echoes persist too long and interfere with subsequent speech. In this research, I measured sound intensity at locations throughout three different types of classrooms, at frequencies commonly associated with human speech, to see what effect seating position has on intensity. I also used a program called Wave Cloud to model the time necessary for intensity to decrease by 60 decibels (RT60), both in idealized classrooms and in classrooms modeled on the ones I studied.
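For context, reverberation time can also be estimated from room geometry with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic meters and A the total absorption (surface area times absorption coefficient, summed over surfaces). The classroom values below are illustrative, not measurements from this study.

```python
# Back-of-the-envelope RT60 estimate via Sabine's formula. Illustrative values.

def rt60_sabine(volume_m3, surfaces):
    """surfaces: list of (area_m2, absorption_coefficient) pairs."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption     # metric Sabine constant (s/m)

classroom = [(48.0, 0.02),   # painted walls
             (60.0, 0.15),   # acoustic-tile ceiling
             (60.0, 0.10)]   # carpeted floor
print(f"RT60 ~ {rt60_sabine(180.0, classroom):.2f} s")  # vs ~0.6 s often recommended
```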
Christiansen, Morten H.; Onnis, Luca; Hockema, Stephen A.
2009-01-01
When learning language young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges—the beginning and ending phonemes of words—to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and word-initial and final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure. PMID:19371361
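The second stage of the proposed model, predicting lexical category from word edges, can be caricatured as follows. Letters stand in for phonemes and the mini-lexicon is invented; the point is only that edge statistics alone carry category information.

```python
# Toy word-edge classifier: count initial/final "phonemes" per category in
# training items, then label new words by those counts. Illustrative only.
from collections import Counter

training = [("ball", "noun"), ("dog", "noun"), ("cup", "noun"),
            ("milk", "noun"), ("go", "verb"), ("get", "verb"),
            ("give", "verb"), ("run", "verb")]

edge_counts = Counter()                 # (position, phoneme, category) -> count
for word, cat in training:
    edge_counts[("first", word[0], cat)] += 1
    edge_counts[("last", word[-1], cat)] += 1

def guess_category(word):
    scores = {cat: edge_counts[("first", word[0], cat)]
                   + edge_counts[("last", word[-1], cat)]
              for cat in ("noun", "verb")}
    return max(scores, key=scores.get)

print(guess_category("gop"))   # -> 'verb': initial 'g' outweighs final 'p'
```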
Sengupta, Ranit
2015-01-01
Despite recent progress in our understanding of sensorimotor integration in speech learning, a comprehensive framework for investigating its neural basis at behaviorally relevant timescales is lacking. Structural and functional imaging studies in humans have helped identify the brain networks that support speech, but they fail to capture the precise spatiotemporal coordination within these networks during speech learning. Here we use neuronal oscillations to investigate interactions within speech motor networks in a paradigm of speech motor adaptation under altered feedback, with continuous EEG recording, in which subjects adapted to real-time auditory perturbation of a target vowel sound. As subjects adapted to the task, concurrent changes were observed in theta-gamma phase coherence during speech planning at several distinct scalp regions, consistent with the establishment of a feedforward map. In particular, coherence increased over the central region and decreased over the fronto-temporal regions, revealing a redistribution of coherence over an interacting network of brain regions that may be a general feature of error-based motor learning. Our findings have implications for understanding the neural basis of speech motor learning and could elucidate how transient breakdowns of neuronal communication within speech networks relate to speech disorders. PMID:25632078
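One common way to operationalize theta-gamma phase coherence is a phase-locking value (PLV) between the theta phase and the phase of the theta-filtered gamma amplitude envelope. The sketch below applies this to a synthetic coupled signal; the filter bands and coupling construction are illustrative, not the study's pipeline.

```python
# Theta-gamma phase coherence (PLV) on a synthetic coupled signal.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(3)
theta = np.sin(2 * np.pi * 6 * t)                    # 6 Hz rhythm
gamma = (1 + theta) * np.sin(2 * np.pi * 40 * t)     # gamma amplitude tied to theta
eeg = theta + 0.5 * gamma + 0.5 * rng.normal(size=t.size)

def bandpass(x, lo, hi):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

theta_phase = np.angle(hilbert(bandpass(eeg, 4, 8)))
gamma_env = np.abs(hilbert(bandpass(eeg, 30, 50)))   # gamma amplitude envelope
env_phase = np.angle(hilbert(bandpass(gamma_env - gamma_env.mean(), 4, 8)))

plv = np.abs(np.mean(np.exp(1j * (theta_phase - env_phase))))
print(f"theta-gamma phase coherence (PLV): {plv:.2f}")   # near 1 when coupled
```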
Sleep underpins the plasticity of language production.
Gaskell, M Gareth; Warker, Jill; Lindsay, Shane; Frost, Rebecca; Guest, James; Snowdon, Reza; Stackhouse, Abigail
2014-07-01
The constraints that govern acceptable phoneme combinations in speech perception and production have considerable plasticity. We addressed whether sleep influences the acquisition of new constraints and their integration into the speech-production system. Participants repeated sequences of syllables in which two phonemes were artificially restricted to syllable onset or syllable coda, depending on the vowel in that sequence. After 48 sequences, participants either had a 90-min nap or remained awake. Participants then repeated 96 sequences so implicit constraint learning could be examined, and then were tested for constraint generalization in a forced-choice task. The sleep group, but not the wake group, produced speech errors at test that were consistent with restrictions on the placement of phonemes in training. Furthermore, only the sleep group generalized their learning to new materials. Polysomnography data showed that implicit constraint learning was associated with slow-wave sleep. These results show that sleep facilitates the integration of new linguistic knowledge with existing production constraints. These data have relevance for systems-consolidation models of sleep. © The Author(s) 2014.
Audiovisual cues and perceptual learning of spectrally distorted speech.
Pilling, Michael; Thomas, Sharon
2011-12-01
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight-channel noise vocoder, which shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch. Experiment 1 found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues to training that gave written feedback. These two methods did not differ significantly in the pattern of learning they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
NASA Astrophysics Data System (ADS)
Gjaja, Marin N.
1997-11-01
Neural networks for supervised and unsupervised learning are developed and applied to problems in remote sensing, continuous map learning, and speech perception. Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART networks synthesize fuzzy logic and neural networks, and supervised ARTMAP networks incorporate ART modules for prediction and classification. New ART and ARTMAP methods resulting from analyses of data structure, parameter specification, and category selection are developed. Architectural modifications providing flexibility for a variety of applications are also introduced and explored. A new methodology for automatic mapping from Landsat Thematic Mapper (TM) and terrain data, based on fuzzy ARTMAP, is developed. System capabilities are tested on a challenging remote sensing problem, prediction of vegetation classes in the Cleveland National Forest from spectral and terrain features. After training at the pixel level, performance is tested at the stand level, using sites not seen during training. Results are compared to those of maximum likelihood classifiers, backpropagation neural networks, and K-nearest neighbor algorithms. Best performance is obtained using a hybrid system based on a convex combination of fuzzy ARTMAP and maximum likelihood predictions. This work forms the foundation for additional studies exploring fuzzy ARTMAP's capability to estimate class mixture composition for non-homogeneous sites. Exploratory simulations apply ARTMAP to the problem of learning continuous multidimensional mappings. A novel system architecture retains basic ARTMAP properties of incremental and fast learning in an on-line setting while adding components to solve this class of problems. The perceptual magnet effect is a language-specific phenomenon arising early in infant speech development that is characterized by a warping of speech sound perception. An unsupervised neural network model is proposed that embodies two principal hypotheses supported by experimental data: that sensory experience guides language-specific development of an auditory neural map, and that a population vector can predict psychological phenomena based on map cell activities. Model simulations show how a nonuniform distribution of map cell firing preferences can develop from language-specific input and give rise to the magnet effect.
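For readers unfamiliar with the ART family, the sketch below implements the core fuzzy ART loop (complement coding, choice function, vigilance test, fast learning). It is a minimal illustration of the algorithm class, not the dissertation's systems; vigilance and the example inputs are arbitrary.

```python
# Minimal fuzzy ART: similar inputs resonate with an existing category,
# dissimilar inputs recruit a new one. Illustrative only.
import numpy as np

class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = []                           # one weight vector per category

    def _complement(self, x):
        return np.concatenate([x, 1.0 - x])   # complement coding

    def train(self, x):
        i = self._complement(np.asarray(x, float))
        # rank categories by the choice function T_j = |i ^ w_j| / (alpha + |w_j|)
        order = sorted(range(len(self.w)), key=lambda j: -(
            np.minimum(i, self.w[j]).sum() / (self.alpha + self.w[j].sum())))
        for j in order:
            match = np.minimum(i, self.w[j]).sum() / i.sum()
            if match >= self.rho:             # vigilance test passed: resonate
                self.w[j] = self.beta * np.minimum(i, self.w[j]) \
                            + (1 - self.beta) * self.w[j]
                return j
        self.w.append(i.copy())               # no resonance: new category
        return len(self.w) - 1

art = FuzzyART(rho=0.8)
for x in [[0.1, 0.2], [0.12, 0.22], [0.9, 0.8]]:
    print("category:", art.train(x))          # -> 0, 0, 1
```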
Simulated learning environments in speech-language pathology: an Australian response.
MacBean, Naomi; Theodoros, Deborah; Davidson, Bronwyn; Hill, Anne E
2013-06-01
The rising demand for health professionals to service the Australian population is placing pressure on traditional approaches to clinical education in the allied health professions. Existing research suggests that simulated learning environments (SLEs) have the potential to increase student placement capacity while providing quality learning experiences with comparable or superior outcomes to traditional methods. This project investigated the current use of SLEs in Australian speech-language pathology curricula, and the potential future applications of SLEs to the clinical education curricula through an extensive consultative process with stakeholders (all 10 Australian universities offering speech-language pathology programs in 2010, Speech Pathology Australia, members of the speech-language pathology profession, and current student body). Current use of SLEs in speech-language pathology education was found to be limited, with additional resources required to further develop SLEs and maintain their use within the curriculum. Perceived benefits included: students' increased clinical skills prior to workforce placement, additional exposure to specialized areas of speech-language pathology practice, inter-professional learning, and richer observational experiences for novice students. Stakeholders perceived SLEs to have considerable potential for clinical learning. A nationally endorsed recommendation for SLE development and curricula integration was prepared.
Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R Harald
2017-01-01
Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones; indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model, trained on 20 hours of conversational speech, that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%) without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
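The flavor of such a "wide" two-layer network can be sketched with error-driven (Rescorla-Wagner-style) updates from sparse binary cues directly to lexical outcomes. The cue profiles and words below are invented stand-ins for the paper's acoustic summaries, and the update rule is a generic choice, not necessarily the authors'.

```python
# Wide, sparse two-layer network: binary acoustic cues -> lexical outcomes,
# trained by error-driven updates. Illustrative only.
import numpy as np

cues = ["c" + str(i) for i in range(100)]        # hypothetical acoustic cues
words = ["hello", "world", "speech"]
W = np.zeros((len(cues), len(words)))            # one weight per cue-word pair
lr = 0.01

rng = np.random.default_rng(4)
profiles = {w: rng.choice(len(cues), size=8, replace=False) for w in words}

def hear(word):                                  # sparse, noisy cue vector
    x = np.zeros(len(cues))
    x[profiles[word]] = 1.0
    x[rng.choice(len(cues), size=3)] = 1.0       # random spurious cues
    return x

for _ in range(500):                             # error-driven training
    w = words[rng.integers(len(words))]
    x, target = hear(w), np.zeros(len(words))
    target[words.index(w)] = 1.0
    W += lr * np.outer(x, target - x @ W)        # Rescorla-Wagner-style update

x = hear("speech")
print(words[int(np.argmax(x @ W))])              # -> 'speech' (after training)
```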
Räsänen, Okko; Kakouros, Sofoklis; Soderstrom, Melanie
2018-06-06
The exaggerated intonation and special rhythmic properties of infant-directed speech (IDS) have been hypothesized to attract infants' attention to the speech stream. However, there has been little work actually connecting the properties of IDS to models of attentional processing or perceptual learning. A number of such attention models suggest that surprising or novel perceptual inputs attract attention, where novelty can be operationalized as the statistical (un)predictability of the stimulus in the given context. Since prosodic patterns such as F0 contours are accessible to young infants, who are also known to be adept statistical learners, the present paper investigates the hypothesis that F0 contours in IDS are less predictable than those in adult-directed speech (ADS), given previous exposure to both speaking styles, thereby potentially tapping into basic attentional mechanisms of the listeners in a manner similar to how the relative probabilities of other linguistic patterns are known to modulate attentional processing in infants and adults. Computational modeling analyses with naturalistic IDS and ADS speech from matched speakers and contexts show that IDS intonation has lower overall temporal predictability even when the F0 contours of both speaking styles are normalized to have equal means and variances. A closer analysis reveals a tendency for IDS intonation to be less predictable at the end of short utterances, whereas ADS exhibits more stable average predictability patterns across the full extent of the utterances. The difference between IDS and ADS persists even when the proportion of IDS and ADS exposure is varied substantially, simulating different relative amounts of IDS heard in different family and cultural environments. Exposure to IDS is also found to be more efficient for predicting ADS intonation contours in new utterances than exposure to an equal amount of ADS speech. This indicates that the more variable prosodic contours of IDS also generalize to ADS, and may therefore enhance prosodic learning in infancy. Overall, the study suggests that one reason behind infant preference for IDS could be its higher information value at the prosodic level, as measured by the amount of surprisal in the F0 contours. This provides the first formal link between the properties of IDS and models of attentional processing and statistical learning in the brain. However, this finding does not rule out the possibility that other differences between IDS and ADS also play a role. Copyright © 2018 Elsevier B.V. All rights reserved.
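The core measure, how well F0 contours are predicted given prior exposure, can be illustrated with a simple autoregressive stand-in for the paper's model: fit a predictor on one register, then compare prediction error across registers. The contours below are synthetic and the AR model is a generic substitute, not the study's predictor.

```python
# Predictability of F0 contours via an AR model fitted on one register.
import numpy as np

def fit_ar(contours, order=3):
    """Least-squares AR(order) model pooled over a set of F0 contours."""
    X, y = [], []
    for f0 in contours:
        for t in range(order, len(f0)):
            X.append(f0[t - order:t])
            y.append(f0[t])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef

def unpredictability(coef, f0, order=3):
    """Mean squared prediction error: higher = less predictable contour."""
    errs = [(f0[t] - coef @ f0[t - order:t]) ** 2 for t in range(order, len(f0))]
    return float(np.mean(errs))

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 100)
ads = [np.sin(2 * np.pi * 2 * t) + 0.05 * rng.normal(size=100) for _ in range(20)]
ids = [np.sin(2 * np.pi * rng.uniform(1, 5) * t) * rng.uniform(1, 3)
       + 0.05 * rng.normal(size=100) for _ in range(20)]   # more variable contours

coef = fit_ar(ads)   # "exposure" to the ADS register
print("ADS error:", np.mean([unpredictability(coef, c) for c in ads]))
print("IDS error:", np.mean([unpredictability(coef, c) for c in ids]))  # larger
```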
Recognizing speech in a novel accent: the motor theory of speech perception reframed.
Moulin-Frier, Clément; Arbib, Michael A
2013-08-01
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory in light of this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and of extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sounds produced by the speaker to phonemes in the listener's native-language repertoire. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory) and serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture, revisits claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
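The core tenet, word-level hypotheses updating sound-to-phoneme links, can be caricatured in a few lines. Everything below (the two-word lexicon, the symbolic "sounds", the update rule) is invented for illustration and is far simpler than the paper's probabilistic model.

```python
# Accent adaptation toy: word hypotheses strengthen sound-phoneme links,
# improving later recognition. Illustrative only.
from collections import defaultdict

lexicon = {"ship": ["sh", "i", "p"], "sheep": ["sh", "ii", "p"]}
likelihood = defaultdict(lambda: 0.05)     # P(heard sound | intended phoneme)
for ph in ["sh", "i", "ii", "p"]:
    likelihood[(ph, ph)] = 0.6             # initially, sounds map to themselves

def recognize(heard):
    def score(word):
        p = 1.0
        for sound, phoneme in zip(heard, lexicon[word]):
            p *= likelihood[(sound, phoneme)]
        return p
    return max(lexicon, key=score)

def update(heard, word, lr=0.5):
    """Strengthen links between heard sounds and the hypothesized word's phonemes."""
    for sound, phoneme in zip(heard, lexicon[word]):
        likelihood[(sound, phoneme)] += lr * (1 - likelihood[(sound, phoneme)])

accented = ["sh", "ii", "p"]     # accented "ship": its /i/ sounds like /ii/
print(recognize(accented))       # before adaptation -> 'sheep'
for _ in range(2):               # context reveals the word was actually "ship"
    update(accented, "ship")
print(recognize(accented))       # after adaptation -> 'ship'
```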
Learning-induced neural plasticity of speech processing before birth
Partanen, Eino; Kujala, Teija; Näätänen, Risto; Liitola, Auli; Sambeth, Anke; Huotilainen, Minna
2013-01-01
Learning, the foundation of adaptive and intelligent behavior, is based on plastic changes in neural assemblies, reflected by the modulation of electric brain responses. In infancy, auditory learning implicates the formation and strengthening of neural long-term memory traces, improving discrimination skills, in particular those forming the prerequisites for speech perception and understanding. Although previous behavioral observations show that newborns react differentially to unfamiliar sounds vs. familiar sound material that they were exposed to as fetuses, the neural basis of fetal learning has not thus far been investigated. Here we demonstrate direct neural correlates of human fetal learning of speech-like auditory stimuli. We presented variants of words to fetuses; unlike infants with no exposure to these stimuli, the exposed fetuses showed enhanced brain activity (mismatch responses) in response to pitch changes for the trained variants after birth. Furthermore, a significant correlation existed between the amount of prenatal exposure and brain activity, with greater activity being associated with a higher amount of prenatal speech exposure. Moreover, the learning effect was generalized to other types of similar speech sounds not included in the training material. Consequently, our results indicate neural commitment specifically tuned to the speech features heard before birth and their memory representations. PMID:23980148
ERIC Educational Resources Information Center
Fritz-Ocock, Amy
2016-01-01
The role of the school speech-language pathologist (SLP) has recently evolved to reflect national trends of educational reform. In an era of accountability for all student learning, Response to Intervention (RTI) has become the predominant vehicle for providing preventative, intensified instruction to students at risk. School SLPs in Indiana have…
Stochastic Online Learning in Dynamic Networks under Unknown Models
2016-08-02
Repeated Game with Incomplete Information, IEEE International Conference on Acoustics, Speech, and Signal Processing, 20-MAR-16, Shanghai, China. … in a game-theoretic framework for the application of multi-seller dynamic pricing with unknown demand models. We formulated the problem as an … infinitely repeated game with incomplete information and developed a dynamic pricing strategy referred to as Competitive and Cooperative Demand Learning.
ERIC Educational Resources Information Center
Hula, Shannon N. Austermann; Robin, Donald A.; Maas, Edwin; Ballard, Kirrie J.; Schmidt, Richard A.
2008-01-01
Purpose: Two studies examined speech skill learning in persons with apraxia of speech (AOS). Motor-learning research shows that delaying or reducing the frequency of feedback promotes retention and transfer of skills. By contrast, immediate or frequent feedback promotes temporary performance enhancement but interferes with retention and transfer.…
Cracking the Language Code: Neural Mechanisms Underlying Speech Parsing
McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella
2013-01-01
Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Previous research in infants and adults demonstrated that a stream of speech can be readily segmented based solely on the statistical and speech cues afforded by the input. Using functional magnetic resonance imaging (fMRI), the neural substrate of word segmentation was examined on-line as participants listened to three streams of concatenated syllables, containing either statistical regularities alone, statistical regularities and speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to streams containing statistical regularities, particularly the stream containing speech cues. In a second fMRI study, designed to verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords. Activity in these regions, taken to index the implicit detection of word boundaries, was positively correlated with participants’ rapid auditory processing skills. These findings provide a neural signature of on-line word segmentation in the mature brain and an initial model with which to study developmental changes in the neural architecture involved in processing speech cues during language learning. PMID:16855090
Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi
2017-10-01
Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.
Classroom Acoustics: Understanding Barriers to Learning.
ERIC Educational Resources Information Center
Crandell, Carl C., Ed.; Smaldino, Joseph J., Ed.
2001-01-01
This booklet explores classroom acoustics and their importance on the learning potential of children with hearing loss and related disabilities. The booklet also reviews research on classroom acoustics and the need for the development of classroom acoustics standards. Chapters examine: 1) a speech-perception model demonstrating the linkage between…
Sleep restores loss of generalized but not rote learning of synthetic speech.
Fenn, Kimberly M; Margoliash, Daniel; Nusbaum, Howard C
2013-09-01
Sleep-dependent consolidation has been demonstrated for declarative and procedural memory, but few theories of consolidation distinguish between rote and generalized learning, suggesting that similar consolidation should occur for both. However, studies of rote and generalized learning have suggested that different patterns of consolidation may occur, although different tasks have been used across studies. Here we directly compared consolidation of rote and generalized learning using a single speech identification task. Training on a large set of novel stimuli resulted in substantial generalized learning, and sleep restored performance that had degraded after 12 waking hours. Training on a small set of repeated stimuli primarily resulted in rote learning; performance also degraded after 12 waking hours but was not restored by sleep. Moreover, performance was significantly worse 24 h after rote training. Our results suggest a functional dissociation between the mechanisms of consolidation for rote and generalized learning, which has broad implications for memory models. Copyright © 2013 Elsevier B.V. All rights reserved.
Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E
2014-01-01
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Characteristics of speaking style and implications for speech recognition.
Shinozaki, Takahiro; Ostendorf, Mari; Atlas, Les
2009-09-01
Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.
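As a rough illustration of the modulation-spectrum feature proposed above, the sketch below computes the power spectrum of a speech signal's amplitude envelope. The file name, mono input, and 32 Hz cutoff are illustrative assumptions, not the paper's exact pipeline.

    # Minimal sketch: modulation spectrum of an utterance (assumes a mono WAV
    # named "utterance.wav"; speech modulations of interest lie below ~32 Hz).
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import hilbert, welch

    fs, x = wavfile.read("utterance.wav")         # hypothetical input file
    x = x.astype(np.float64)
    env = np.abs(hilbert(x))                      # amplitude envelope
    env -= env.mean()                             # drop DC so slow modulations dominate
    f_mod, power = welch(env, fs=fs, nperseg=fs)  # 1-s windows -> ~1 Hz resolution
    modulation_spectrum = power[f_mod <= 32.0]    # band where speaking styles differ

A style classifier would then combine such features with the pitch and energy variation measures the paper mentions.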
Dopamine regulation of human speech and bird song: A critical review
Simonyan, Kristina; Horwitz, Barry; Jarvis, Erich D.
2012-01-01
To understand the neural basis of human speech control, extensive research has been done using a variety of methodologies in a range of experimental models. Nevertheless, several critical questions about learned vocal motor control still remain open. One of them is the mechanism(s) by which neurotransmitters, such as dopamine, modulate speech and song production. In this review, we bring together the two fields of investigations of dopamine action on voice control in humans and songbirds, who share similar behavioral and neural mechanisms for speech and song production. While human studies investigating the role of dopamine in speech control are limited to reports in neurological patients, research on dopaminergic modulation of bird song control has recently expanded our views on how this system might be organized. We discuss the parallels between bird song and human speech from the perspective of dopaminergic control as well as outline important differences between these species. PMID:22284300
Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic.
van der Ham, Sabine; de Boer, Bart
2015-10-01
When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults' generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants' reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.
Vandervert, Larry
2017-01-01
Mathematicians and scientists have struggled to adequately describe the ultimate foundations of mathematics. Nobel laureates Albert Einstein and Eugene Wigner were perplexed by this issue, with Wigner concluding that the workability of mathematics in the real world is a mystery we cannot explain. In response to this classic enigma, the major purpose of this article is to provide a theoretical model of the ultimate origin of mathematics and "number sense" (as defined by S. Dehaene) that is proposed to involve the learning of inverse dynamics models through the collaboration of the cerebellum and the cerebral cortex (but prominently cerebellum-driven). This model is based upon (1) the modern definition of mathematics as the "science of patterns," (2) cerebellar sequence (pattern) detection, and (3) findings that the manipulation of numbers is automated in the cerebellum. This cerebro-cerebellar approach does not necessarily conflict with mathematics or number sense models that focus on brain functions associated especially with the intraparietal sulcus region of the cerebral cortex. A direct corollary purpose of this article is to offer a cerebellar inner speech explanation for difficulty in developing "number sense" in developmental dyscalculia. It is argued that during infancy the cerebellum learns (1) a first tier of internal models for a primitive physics that constitutes the foundations of visual-spatial working memory, and (2) a second (and more abstract) tier of internal models based on (1) that learns "number" and relationships among dimensions across the primitive physics of the first tier. Within this context it is further argued that difficulty in the early development of the second tier of abstraction (and "number sense") is based on the more demanding attentional requirements imposed on cerebellar inner speech executive control during the learning of cerebellar inverse dynamics models. Finally, it is argued that finger counting improves (does not originate) "number sense" by extending the focus of attention in executive control of silent cerebellar inner speech. It is suggested that (1) the origin of mathematics has historically been an enigma only because it is learned below the level of conscious awareness in cerebellar internal models, and (2) understandings of the development of "number sense" and developmental dyscalculia can be advanced by first understanding that the ultimate foundations of number and mathematics do not simply originate in the cerebral cortex, but rather in cerebro-cerebellar collaboration (predominantly driven by the cerebellum). It is concluded that difficulty with "number sense" results from the extended demands on executive control in learning inverse dynamics models associated with cerebellar inner speech related to the second tier of abstraction (numbers) of the infant's primitive physics.
Learning Disabilities at Twenty-Five: The Early Adulthood of a Maturing Concept.
ERIC Educational Resources Information Center
Levine, Melvin D.
1989-01-01
The keynote speech identifies six categories of problem areas for children with learning disabilities: (1) synchronization, (2) consistency, (3) methodology, (4) cohesion, (5) saliency determination, and (6) tempo. A model of neurodevelopmental functions and performance elements to guide researchers and practitioners is offered. (DB)
Does Verb Bias Modulate Syntactic Priming?
ERIC Educational Resources Information Center
Bernolet, Sarah; Hartsuiker, Robert J.
2010-01-01
In a corpus analysis of spontaneous speech Jaeger and Snider (2007) found that the strength of structural priming is correlated with verb alternation bias. This finding is consistent with an implicit learning account of syntactic priming: because the implicit learning model implemented by Chang (2002), Chang, Dell, and Bock (2006), and Chang,…
Mispronunciation Detection for Language Learning and Speech Recognition Adaptation
ERIC Educational Resources Information Center
Ge, Zhenhao
2013-01-01
The areas of "mispronunciation detection" (or "accent detection" more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work.…
ERIC Educational Resources Information Center
Silvén, Maarit; Voeten, Marinus; Kouvo, Anna; Lundén, Maija
2014-01-01
Growth modeling was applied to monolingual (N = 26) and bilingual (N = 28) word learning from 14 to 36 months. Level and growth rate of vocabulary were lower for Finnish-Russian bilinguals than for Finnish monolinguals. Processing of Finnish speech sounds at 7 but not at 11 months predicted level, but not growth rate of vocabulary in both Finnish…
Perceptual learning of degraded speech by minimizing prediction error.
Sohoglu, Ediz; Davis, Matthew H
2016-03-22
Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
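A toy sketch can make the single-mechanism account concrete: perception settles quickly by updating a representation to reduce prediction error, while perceptual learning slowly adjusts the generative weights using the same error signal. All sizes, rates, and variable names below are illustrative assumptions, not the authors' simulations.

    # Toy predictive-coding sketch: one error signal drives both fast settling
    # (effect of prior knowledge) and slow weight change (perceptual learning).
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 8))                            # generative weights: causes -> input
    x = W @ rng.normal(size=8) + 0.5 * rng.normal(size=64)  # degraded sensory input

    r = np.zeros(8)                    # current belief about the causes
    for _ in range(200):               # fast, within-trial settling
        error = x - W @ r              # prediction error
        r += 0.01 * W.T @ error        # update belief to reduce the error
    # Slow perceptual learning would adjust W itself with the same error, e.g.:
    # W += 1e-4 * np.outer(error, r)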
Rapid Learning of Syllable Classes from a Perceptually Continuous Speech Stream
ERIC Educational Resources Information Center
Endress, Ansgar D.; Bonatti, Luca L.
2007-01-01
To learn a language, speakers must learn its words and rules from fluent speech; in particular, they must learn dependencies among linguistic classes. We show that when familiarized with a short artificial, subliminally bracketed stream, participants can learn relations about the structure of its words, which specify the classes of syllables…
Accurate visible speech synthesis based on concatenating variable length motion capture data.
Ma, Jiyong; Cole, Ron; Pellom, Bryan; Ward, Wayne; Wise, Barbara
2006-01-01
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
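The search step described above is essentially a shortest-path problem over candidate units. The sketch below shows one conventional way to solve it, dynamic programming over target and join costs; the cost arrays are placeholders, not the paper's actual corpus or cost functions.

    # Dynamic-programming unit selection: pick one candidate unit per target
    # position minimizing target cost plus join (concatenation) cost.
    import numpy as np

    def select_units(target_costs, join_costs):
        # target_costs[t][i]: cost of candidate i at position t
        # join_costs[t][i, j]: cost of joining candidate i at t with j at t+1
        best = target_costs[0].copy()
        backptr = []
        for t in range(1, len(target_costs)):
            total = best[:, None] + join_costs[t - 1] + target_costs[t][None, :]
            backptr.append(total.argmin(axis=0))   # best predecessor for each j
            best = total.min(axis=0)
        path = [int(best.argmin())]
        for bp in reversed(backptr):
            path.append(int(bp[path[-1]]))
        return path[::-1]                          # indices of selected units

    # Example with 3 positions and 4 random candidates each:
    rng = np.random.default_rng(0)
    tc = [rng.random(4) for _ in range(3)]
    jc = [rng.random((4, 4)) for _ in range(2)]
    print(select_units(tc, jc))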
Communicative signals support abstract rule learning by 7-month-old infants
Ferguson, Brock; Lew-Williams, Casey
2016-01-01
The mechanisms underlying the discovery of abstract rules like those found in natural language may be evolutionarily tuned to speech, according to previous research. When infants hear speech sounds, they can learn rules that govern their combination, but when they hear non-speech sounds such as sine-wave tones, they fail to do so. Here we show that infants’ rule learning is not tied to speech per se, but is instead enhanced more broadly by communicative signals. In two experiments, infants succeeded in learning and generalizing rules from tones that were introduced as if they could be used to communicate. In two control experiments, infants failed to learn the very same rules when familiarized to tones outside of a communicative exchange. These results reveal that infants’ attention to social agents and communication catalyzes a fundamental achievement of human learning. PMID:27150270
The Cross-Cultural Study of Language Acquisition.
ERIC Educational Resources Information Center
Heath, Shirley Brice
1985-01-01
One approach to studying the nature of diverse speech exchange systems across sociocultural groups starts from the premise that all learning is cultural learning, and that language socialization is the way individuals become members of both their primary speech community and their secondary speech communities. Researchers must recognize that the…
Yuen, Kevin C P
2014-01-01
One of the recent developments in the education of speech-language pathology is to include literacy disorders and learning disabilities as key training components in the training curriculum. Disorders in reading and writing are interwoven with disorders in speaking and listening, which should be managed holistically, particularly in children and adolescents. With extensive training in clinical linguistics, language disorders, and other theoretical knowledge and clinical skills, speech-language pathologists (SLPs) are the best equipped and most competent professionals to screen, identify, diagnose, and manage individuals with literacy disorders. To tackle the challenges of and the huge demand for services in literacy as well as language and learning disorders, the Hong Kong Institute of Education has recently developed the Master of Science Programme in Educational Speech-Language Pathology and Learning Disabilities, which is one of the very first speech-language pathology training programmes in Asia to blend training components of learning disabilities, literacy disorders, and social-emotional-behavioural-developmental disabilities into a developmentally and medically oriented speech-language pathology training programme. This new training programme aims to prepare a new generation of SLPs to be able to offer comprehensive support to individuals with speech, language, literacy, learning, communication, and swallowing disorders of different developmental or neurogenic origins, particularly to infants and adolescents as well as to their family and educational team.
Using the structure of natural scenes and sounds to predict neural response properties in the brain
NASA Astrophysics Data System (ADS)
Deweese, Michael
2014-03-01
The natural scenes and sounds we encounter in the world are highly structured. The fact that animals and humans are so efficient at processing these sensory signals compared with the latest algorithms running on the fastest modern computers suggests that our brains can exploit this structure. We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus (MGBv) and primary auditory cortex (A1), and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. We have also developed a biologically-inspired neural network model of primary visual cortex (V1) that can learn a sparse representation of natural scenes using spiking neurons and strictly local plasticity rules. The representation learned by our model is in good agreement with measured receptive fields in V1, demonstrating that sparse sensory coding can be achieved in a realistic biological setting.
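The "minimal number of active model neurons" objective is the classic sparse-coding cost ||x - D a||^2 + lambda * ||a||_1. A minimal inference sketch under that assumption follows (ISTA, with a random dictionary standing in for the one the paper learns from speech):

    # Sparse-coding inference via ISTA: encode a spectrogram column x with few
    # active coefficients a. The dictionary D is random here; the paper's D is
    # learned from recorded speech.
    import numpy as np

    def ista(x, D, lam=0.1, steps=300):
        L = np.linalg.norm(D, 2) ** 2            # step-size bound (Lipschitz constant)
        a = np.zeros(D.shape[1])
        for _ in range(steps):
            z = a - D.T @ (D @ a - x) / L        # gradient step on ||x - D a||^2
            a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        return a

    rng = np.random.default_rng(0)
    D = rng.normal(size=(128, 256))
    D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms
    a = ista(rng.normal(size=128), D)            # sparse code for a random input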
Whitfield, Jason A; Goberman, Alexander M
2017-06-22
Everyday communication is carried out concurrently with other tasks. Therefore, determining how dual tasks interfere with newly learned speech motor skills can offer insight into the cognitive mechanisms underlying speech motor learning in Parkinson disease (PD). The current investigation examines a recently learned speech motor sequence under dual-task conditions. A previously learned sequence of 6 monosyllabic nonwords was examined using a dual-task paradigm. Participants repeated the sequence while concurrently performing a visuomotor task, and performance on both tasks was measured in single- and dual-task conditions. The younger adult group exhibited little to no dual-task interference on the accuracy and duration of the sequence. The older adult group exhibited variability in dual-task costs, with the group as a whole exhibiting an intermediate, though significant, amount of dual-task interference. The PD group exhibited the largest degree of bidirectional dual-task interference among all the groups. These data suggest that PD affects the later stages of speech motor learning, as the dual-task condition interfered with production of the recently learned sequence beyond the effect of normal aging. Because the basal ganglia is critical for the later stages of motor sequence learning, the observed deficits may result from the underlying neural dysfunction associated with PD.
Berk, L E; Landau, S
1993-04-01
Learning disabled (LD) children are often targets for cognitive-behavioral interventions designed to train them in effective use of a self-directed speech. The purpose of this study was to determine if, indeed, these children display immature private speech in the naturalistic classroom setting. Comparisons were made of the private speech, motor accompaniment to task, and attention of LD and normally achieving classmates during academic seatwork. Setting effects were examined by comparing classroom data with observations during academic seatwork and puzzle solving in the laboratory. Finally, a subgroup of LD children symptomatic of attention-deficit hyperactivity disorder (ADHD) was compared with pure LD and normally achieving controls to determine if the presumed immature private speech is a function of a learning disability or externalizing behavior problems. Results indicated that LD children used more task-relevant private speech than controls, an effect that was especially pronounced for the LD/ADHD subgroup. Use of private speech was setting- and task-specific. Implications for intervention and future research methodology are discussed.
Learning in Complex Environments: The Effects of Background Speech on Early Word Learning
ERIC Educational Resources Information Center
McMillan, Brianna T. M.; Saffran, Jenny R.
2016-01-01
Although most studies of language learning take place in quiet laboratory settings, everyday language learning occurs under noisy conditions. The current research investigated the effects of background speech on word learning. Both younger (22- to 24-month-olds; n = 40) and older (28- to 30-month-olds; n = 40) toddlers successfully learned novel…
Warker, Jill A.
2013-01-01
Adults can rapidly learn artificial phonotactic constraints, such as "/f/ only occurs at the beginning of syllables," by producing syllables that contain those constraints. This implicit learning is then reflected in their speech errors. However, second-order constraints, in which the placement of a phoneme depends on another characteristic of the syllable (e.g., if the vowel is /æ/, /f/ occurs at the beginning of syllables and /s/ occurs at the end of syllables, but if the vowel is /I/, the reverse is true), require a longer learning period. Two experiments question the transience of second-order learning and whether consolidation plays a role in learning phonological dependencies. Using speech errors as a measure of learning, Experiment 1 investigated the durability of learning, and Experiment 2 investigated the time-course of learning. Experiment 1 found that learning is still present in speech errors a week later. Experiment 2 looked at whether more time in the form of a consolidation period or more experience in the form of more trials was necessary for learning to be revealed in speech errors. Both consolidation and more trials led to learning; however, consolidation provided a more substantial benefit. PMID:22686839
Sleep, Off-Line Processing, and Vocal Learning
ERIC Educational Resources Information Center
Margoliash, Daniel; Schmidt, Marc F.
2010-01-01
The study of song learning and the neural song system has provided an important comparative model system for the study of speech and language acquisition. We describe some recent advances in the bird song system, focusing on the role of off-line processing including sleep in processing sensory information and in guiding developmental song…
2010-08-01
...they begin or end an instance. The learned model uses part-of-speech features to identify typical music group names (e.g., The Beatles, The Ramones)...
Distance Learning: Effectiveness of an Interdisciplinary Course in Speech Pathology and Dentistry
ERIC Educational Resources Information Center
Ramos, Janine Santos; da Silva, Letícia Korb; Pinzan, Arnaldo; de Castro Rodrigues, Antonio; Berretin-Felix, Giédre
2015-01-01
Objective: Evaluate the effectiveness of distance learning courses for the purpose of interdisciplinary continuing education in Speech Pathology and Dentistry. Methods: The online course was made available on the Moodle platform. A total of 30 undergraduates participated in the study (15 from the Dentistry course and 15 from the Speech Pathology…
ERIC Educational Resources Information Center
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
Hegemonic Speech Genres of Classrooms and Their Importance for RE Learning
ERIC Educational Resources Information Center
Osbeck, C.; Lied, S.
2012-01-01
The "purpose" of this article is to illustrate how potential learning is related to hegemonic speech genres. This we do through examples from two religious education (RE) classrooms, one Norwegian and one Swedish. By presenting important dimensions of speech genres used in RE classrooms, the article also contributes to developing theory.…
Repeated Speech Errors: Evidence for Learning
ERIC Educational Resources Information Center
Humphreys, Karin R.; Menzies, Heather; Lake, Johanna K.
2010-01-01
Three experiments elicited phonological speech errors using the SLIP procedure to investigate whether there is a tendency for speech errors on specific words to reoccur, and whether this effect can be attributed to implicit learning of an incorrect mapping from lemma to phonology for that word. In Experiment 1, when speakers made a phonological…
Articulatory Control in Childhood Apraxia of Speech in a Novel Word-Learning Task
ERIC Educational Resources Information Center
Case, Julie; Grigos, Maria I.
2016-01-01
Purpose: Articulatory control and speech production accuracy were examined in children with childhood apraxia of speech (CAS) and typically developing (TD) controls within a novel word-learning task to better understand the influence of planning and programming deficits in the production of unfamiliar words. Method: Participants included 16…
Discourse Analysis and Language Learning [Summary of a Symposium].
ERIC Educational Resources Information Center
Hatch, Evelyn
1981-01-01
A symposium on discourse analysis and language learning is summarized. Discourse analysis can be divided into six fields of research: syntax, the amount of syntactic organization required for different types of discourse, large speech events, intra-sentential cohesion in text, speech acts, and unequal power discourse. Research on speech events and…
Schall, Sonja; von Kriegstein, Katharina
2014-01-01
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Widening the lens: what the manual modality reveals about language, learning and cognition.
Goldin-Meadow, Susan
2014-09-19
The goal of this paper is to widen the lens on language to include the manual modality. We look first at hearing children who are acquiring language from a spoken language model and find that even before they use speech to communicate, they use gesture. Moreover, those gestures precede, and predict, the acquisition of structures in speech. We look next at deaf children whose hearing losses prevent them from using the oral modality, and whose hearing parents have not presented them with a language model in the manual modality. These children fall back on the manual modality to communicate and use gestures, which take on many of the forms and functions of natural language. These homemade gesture systems constitute the first step in the emergence of manual sign systems that are shared within deaf communities and are full-fledged languages. We end by widening the lens on sign language to include gesture and find that signers not only gesture, but they also use gesture in learning contexts just as speakers do. These findings suggest that what is key in gesture's ability to predict learning is its ability to add a second representational format to communication, rather than a second modality. Gesture can thus be language, assuming linguistic forms and functions, when other vehicles are not available; but when speech or sign is possible, gesture works along with language, providing an additional representational format that can promote learning.
Foreign Subtitles Help but Native-Language Subtitles Harm Foreign Speech Perception
Mitterer, Holger; McQueen, James M.
2009-01-01
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken. PMID:19918371
Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.
Iliya, Sunday; Neri, Ferrante
2016-09-01
This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting that limited section of the speech which contains the information about the potential impairment of the speech. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of the disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well-suited to address this problem. Numerical results show that the SVM model with a radial basis function is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is hybrid-population-based/CDE. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well-suited to detect the beginning of the steady state of the voiced signal. Both proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
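For orientation, the SVM component reduces to per-frame classification of acoustic features into silent/unvoiced/voiced classes. A minimal sketch with an RBF kernel follows; the random features and labels are placeholders, and the paper's feature extraction and metaheuristic hyperparameter search are omitted.

    # Sketch of the frame-classification step with an RBF-kernel SVM.
    # Placeholder data: 13-dim acoustic features per frame, labels
    # 0 = silent, 1 = unvoiced, 2 = voiced.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((1000, 13))
    y = rng.integers(0, 3, size=1000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
    print("frame accuracy:", clf.score(X_te, y_te))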
NASA Astrophysics Data System (ADS)
Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.
1996-01-01
A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.
Word Learning in Clear and Plain Speech in Quiet and Noisy Listening Conditions
ERIC Educational Resources Information Center
Riley, Kristine Marie Grohne
2010-01-01
Previous research demonstrates enhanced speech perception abilities for typically hearing and hearing-impaired listeners when speakers use clear versus plain speech, particularly in the presence of background noise. To date, very few studies have investigated the effects of noise on word learning and no studies have examined the effects of clear…
ERIC Educational Resources Information Center
Seitz, Aaron R.; Protopapas, Athanassios; Tsushima, Yoshiaki; Vlahou, Eleni L.; Gori, Simone; Grossberg, Stephen; Watanabe, Takeo
2010-01-01
Learning a second language as an adult is particularly effortful when new phonetic representations must be formed. Therefore the processes that allow learning of speech sounds are of great theoretical and practical interest. Here we examined whether perception of single formant transitions, that is, sound components critical in speech perception,…
Brief Training with Co-Speech Gesture Lends a Hand to Word Learning in a Foreign Language
ERIC Educational Resources Information Center
Kelly, Spencer D.; McDevitt, Tara; Esch, Megan
2009-01-01
Recent research in psychology and neuroscience has demonstrated that co-speech gestures are semantically integrated with speech during language comprehension and development. The present study explored whether gestures also play a role in language learning in adults. In Experiment 1, we exposed adults to a brief training session presenting novel…
The Acquisition of Relative Clauses in Spontaneous Child Speech in Mandarin Chinese
ERIC Educational Resources Information Center
Chen, Jidong; Shirai, Yasuhiro
2015-01-01
This study investigates the developmental trajectory of relative clauses (RCs) in Mandarin-learning children's speech. We analyze the spontaneous production of RCs by four monolingual Mandarin-learning children (0;11 to 3;5) and their input from a longitudinal naturalistic speech corpus (Min, 1994). The results reveal that in terms of the…
Speech segmentation in aphasia
Peñaloza, Claudia; Benetello, Annalisa; Tuomiranta, Leena; Heikius, Ida-Maria; Järvinen, Sonja; Majos, Maria Carmen; Cardona, Pedro; Juncadella, Montserrat; Laine, Matti; Martin, Nadine; Rodríguez-Fornells, Antoni
2017-01-01
Background: Speech segmentation is one of the initial and mandatory phases of language learning. Although some people with aphasia have shown a preserved ability to learn novel words, their speech segmentation abilities have not been explored. Aims: We examined the ability of individuals with chronic aphasia to segment words from running speech via statistical learning. We also explored the relationships between speech segmentation and aphasia severity, and short-term memory capacity. We further examined the role of lesion location in speech segmentation and short-term memory performance. Methods & Procedures: The experimental task was first validated with a group of young adults (n = 120). Participants with chronic aphasia (n = 14) were exposed to an artificial language and were evaluated in their ability to segment words using a speech segmentation test. Their performance was contrasted against chance level and compared to that of a group of elderly matched controls (n = 14) using group and case-by-case analyses. Outcomes & Results: As a group, participants with aphasia were significantly above chance level in their ability to segment words from the novel language and did not significantly differ from the group of elderly controls. Speech segmentation ability in the aphasic participants was not associated with aphasia severity although it significantly correlated with word pointing span, a measure of verbal short-term memory. Case-by-case analyses identified four individuals with aphasia who performed above chance level on the speech segmentation task, all with predominantly posterior lesions and mild fluent aphasia. Their short-term memory capacity was also better preserved than in the rest of the group. Conclusions: Our findings indicate that speech segmentation via statistical learning can remain functional in people with chronic aphasia and suggest that this initial language learning mechanism is associated with the functionality of the verbal short-term memory system and the integrity of the left inferior frontal region. PMID:28824218
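The statistical-learning mechanism tested here is usually formalized as transitional probabilities (TPs) between adjacent syllables, with word boundaries posited at TP dips. A minimal sketch with an invented syllable stream (the actual studies use controlled artificial languages):

    # Segmentation by statistical learning: estimate syllable-to-syllable
    # transitional probabilities and posit boundaries at local TP minima.
    from collections import Counter

    stream = "tupirogolabubidakutupirobidaku"   # invented 2-letter-syllable stream
    syls = [stream[i:i + 2] for i in range(0, len(stream), 2)]

    pair_counts = Counter(zip(syls, syls[1:]))
    first_counts = Counter(syls[:-1])
    tp = {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

    # TP of each adjacent pair, then a boundary wherever TP dips below both neighbors
    tps = [tp[(a, b)] for a, b in zip(syls, syls[1:])]
    boundaries = [i + 1 for i in range(1, len(tps) - 1)
                  if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]]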
Generalization of Perceptual Learning of Vocoded Speech
ERIC Educational Resources Information Center
Hervais-Adelman, Alexis G.; Davis, Matthew H.; Johnsrude, Ingrid S.; Taylor, Karen J.; Carlyon, Robert P.
2011-01-01
Recent work demonstrates that learning to understand noise-vocoded (NV) speech alters sublexical perceptual processes but is enhanced by the simultaneous provision of higher-level, phonological, but not lexical content (Hervais-Adelman, Davis, Johnsrude, & Carlyon, 2008), consistent with top-down learning (Davis, Johnsrude, Hervais-Adelman,…
Watch what you say, your computer might be listening: A review of automated speech recognition
NASA Technical Reports Server (NTRS)
Degennaro, Stephen V.
1991-01-01
Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.
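Of the probabilistic approaches mentioned, hidden Markov modeling is the workhorse, and its core computation is the forward recursion that scores an observation sequence under a word model. A toy discrete-output sketch follows; all matrices are illustrative, not taken from any real recognizer.

    # HMM forward algorithm: likelihood of a quantized observation sequence
    # under a small left-to-right word model (2 states, 2 output symbols).
    import numpy as np

    A = np.array([[0.7, 0.3], [0.0, 1.0]])   # state transition probabilities
    B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emission probabilities per state
    pi = np.array([1.0, 0.0])                # always start in the first state

    obs = [0, 0, 1, 1]                       # quantized acoustic observations
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # propagate and weight by emission
    likelihood = alpha.sum()                 # P(observations | word model)

Recognition then amounts to picking the word model with the highest such likelihood.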
Training parents to use the natural language paradigm to increase their autistic children's speech.
Laski, K E; Charlop, M H; Schreibman, L
1988-01-01
Parents of four nonverbal and four echolalic autistic children were trained to increase their children's speech by using the Natural Language Paradigm (NLP), a loosely structured procedure conducted in a play environment with a variety of toys. Parents were initially trained to use the NLP in a clinic setting, with subsequent parent-child speech sessions occurring at home. The results indicated that following training, parents increased the frequency with which they required their children to speak (i.e., modeled words and phrases, prompted answers to questions). Correspondingly, all children increased the frequency of their verbalizations in three nontraining settings. Thus, the NLP appears to be an efficacious program for parents to learn and use in the home to increase their children's speech. PMID:3225256
Segal, Osnat; Hejli-Assi, Saja; Kishon-Rabin, Liat
2016-02-01
Infant speech discrimination can follow multiple trajectories depending on the language and the specific phonemes involved. Two understudied languages in terms of the development of infants' speech discrimination are Arabic and Hebrew. The purpose of the present study was to examine the influence of listening experience with the native language on the discrimination of the voicing contrast /ba-pa/ in Arabic-learning infants whose native language includes only the phoneme /b/ and in Hebrew-learning infants whose native language includes both phonemes. A total of 128 Arabic-learning and Hebrew-learning infants, aged 4 to 6 and 10 to 12 months, were tested with the Visual Habituation Procedure. The results showed that 4-to-6-month-old infants discriminated between /ba-pa/ regardless of their native language and order of presentation. However, only 10-to-12-month-old infants learning Hebrew retained this ability. 10-to-12-month-old infants learning Arabic did not discriminate the change from /ba/ to /pa/ but showed a tendency for discriminating the change from /pa/ to /ba/. This is the first study to report on the reduced discrimination of /ba-pa/ in older infants learning Arabic. Our findings are consistent with the notion that experience with the native language changes discrimination abilities and alters sensitivity to non-native contrasts, thus providing evidence for 'top-down' processing in young infants. The directional asymmetry in older infants learning Arabic can be explained by assimilation of the non-native consonant /p/ to the native Arabic category /b/ as predicted by current speech perception models.
The Affordance of Speech Recognition Technology for EFL Learning in an Elementary School Setting
ERIC Educational Resources Information Center
Liaw, Meei-Ling
2014-01-01
This study examined the use of speech recognition (SR) technology to support a group of elementary school children's learning of English as a foreign language (EFL). SR technology has been used in various language learning contexts. Its application to EFL teaching and learning is still relatively recent, but a solid understanding of its…
Songs as an aid for language acquisition.
Schön, Daniele; Boyer, Maud; Moreno, Sylvain; Besson, Mireille; Peretz, Isabelle; Kolinsky, Régine
2008-02-01
In previous research, Saffran and colleagues [Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928; Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606-621.] have shown that adults and infants can use the statistical properties of syllable sequences to extract words from continuous speech. They also showed that a similar learning mechanism operates with musical stimuli [Saffran, J. R., Johnson, R. E. K., Aslin, N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27-52.]. In this work we combined linguistic and musical information and we compared language learning based on speech sequences to language learning based on sung sequences. We hypothesized that, compared to speech sequences, a consistent mapping of linguistic and musical information would enhance learning. Results confirmed the hypothesis, showing a strong learning facilitation of song compared to speech. Most importantly, the present results show that learning a new language, especially in the first learning phase wherein one needs to segment new words, may largely benefit from the motivational and structuring properties of music in song.
Studer-Eichenberger, Esther; Studer-Eichenberger, Felix; Koenig, Thomas
2016-01-01
The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late discriminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.
Going Covert: Inner and Private Speech in Language Learning
ERIC Educational Resources Information Center
de Guerrero, María C. M.
2018-01-01
Roughly 30 years ago researchers in the second language acquisition (SLA) field started to take a focused interest in the study of inner speech (IS) and private speech (PS) processes in second language (L2) learning and use. The purpose of this review is to assess the status of current research and the progress made during the last ten years on…
ERIC Educational Resources Information Center
Boucher, Victor J.
2006-01-01
Language learning requires a capacity to recall novel series of speech sounds. Research shows that prosodic marks create grouping effects enhancing serial recall. However, any restriction on memory affecting the reproduction of prosody would limit the set of patterns that could be learned and subsequently used in speech. By implication, grouping…
Rosemann, Stephanie; Gießing, Carsten; Özyurt, Jale; Carroll, Rebecca; Puschmann, Sebastian; Thiel, Christiane M.
2017-01-01
Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI). This variability is partly ascribed to differing cognitive abilities like working memory, verbal skills or attention. Although clinically highly relevant, up to now, no consensus has been achieved about which cognitive factors exactly predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping on verbal memory, working memory, lexicon and retrieval skills as well as cognitive flexibility and attention. Partial-least-squares analysis revealed that six variables were important to significantly predict vocoded-speech performance. These were the ability to perceive visually degraded speech tested by the Text Reception Threshold, vocabulary size assessed with the Multiple Choice Word Test, working memory gauged with the Operation Span Test, verbal learning and recall of the Verbal Learning and Retention Test and task switching abilities tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome. PMID:28638329
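For readers unfamiliar with the analysis, partial-least-squares regression finds low-dimensional components of the predictors that covary maximally with the outcome. A minimal sketch with simulated data standing in for the six cognitive measures named above:

    # PLS regression sketch: predict a speech-performance score from several
    # cognitive measures. Data are random placeholders, not the study's data.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 6))                       # 40 listeners x 6 cognitive measures
    y = X @ rng.normal(size=6) + rng.normal(size=40)   # simulated vocoded-speech scores

    pls = PLSRegression(n_components=2).fit(X, y)
    print("variance explained (R^2):", pls.score(X, y))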
Integrating Articulatory Constraints into Models of Second Language Phonological Acquisition
ERIC Educational Resources Information Center
Colantoni, Laura; Steele, Jeffrey
2008-01-01
Models such as Eckman's markedness differential hypothesis, Flege's speech learning model, and Brown's feature-based theory of perception seek to explain and predict the relative difficulty second language (L2) learners face when acquiring new or similar sounds. In this paper, we test their predictive adequacy as concerns native English speakers'…
2016-09-12
...on learning word triggers with an attention mechanism. A minimalistic attention mechanism can be... word trigger model based on a minimalistic attention mechanism: it already showed some interesting qualitative results, while the performance was not...
ERIC Educational Resources Information Center
Scheflen, Sarah Clifford; Freeman, Stephanny F. N.; Paparella, Tanya
2012-01-01
Four children with autism were taught play skills through the use of video modeling. Video instruction was used to model play and appropriate language through a developmental sequence of play levels integrated with language techniques. Results showed that children with autism could successfully use video modeling to learn how to play appropriately…
A role for the developing lexicon in phonetic category acquisition
Feldman, Naomi H.; Griffiths, Thomas L.; Goldwater, Sharon; Morgan, James L.
2013-01-01
Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior from artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning. PMID:24219848
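The bottom-up part of such models is typically a mixture of Gaussian categories fit to acoustic values; the paper's contribution is adding feedback from segmented words. A minimal sketch of the distributional (bottom-up) component only, with invented formant data:

    # Distributional phonetic category learning: fit two Gaussian vowel
    # categories to overlapping F1 values with EM. Data are invented.
    import numpy as np

    rng = np.random.default_rng(2)
    f1 = np.concatenate([rng.normal(400, 40, 200), rng.normal(600, 40, 200)])

    mu, sd, w = np.array([450.0, 550.0]), np.array([80.0, 80.0]), np.array([0.5, 0.5])
    for _ in range(50):                                # EM iterations
        lik = w * np.exp(-0.5 * ((f1[:, None] - mu) / sd) ** 2) / sd
        resp = lik / lik.sum(axis=1, keepdims=True)    # E-step: responsibilities
        w = resp.mean(axis=0)                          # M-step: weights,
        mu = (resp * f1[:, None]).sum(axis=0) / resp.sum(axis=0)   # means,
        sd = np.sqrt((resp * (f1[:, None] - mu) ** 2).sum(axis=0)
                     / resp.sum(axis=0))               # and spreads

The word-level feedback the paper describes would reweight these responsibilities using which sounds co-occur within segmented words, which is what disambiguates the overlapping categories.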
Learning to perceive and recognize a second language: the L2LP model revised.
van Leussen, Jan-Willem; Escudero, Paola
2015-01-01
We present a test of a revised version of the Second Language Linguistic Perception (L2LP) model, a computational model of the acquisition of second language (L2) speech perception and recognition. The model draws on phonetic, phonological, and psycholinguistic constructs to explain a number of L2 learning scenarios. However, a recent computational implementation failed to validate a theoretical proposal for a learning scenario where the L2 has fewer phonemic categories than the native language (L1) along a given acoustic continuum. According to the L2LP, learners faced with this learning scenario must not only shift their old L1 phoneme boundaries but also reduce the number of categories employed in perception. Our proposed revision to L2LP successfully accounts for this updating in the number of perceptual categories as a process driven by the meaning of lexical items, rather than by the learners' awareness of the number and type of phonemes that are relevant in their new language, as the previous version of L2LP assumed. Results of our simulations show that meaning-driven learning correctly predicts the developmental path of L2 phoneme perception seen in empirical studies. Additionally, and to contribute to a long-standing debate in psycholinguistics, we test two versions of the model, with the stages of phonemic perception and lexical recognition being either sequential or interactive. Both versions succeed in learning to recognize minimal pairs in the new L2, but make diverging predictions on learners' resulting phonological representations. In sum, the proposed revision to the L2LP model contributes to our understanding of L2 acquisition, with implications for speech processing in general.
Drinking Songs: Alcohol Effects on Learned Song of Zebra Finches
Olson, Christopher R.; Owen, Devin C.; Ryabinin, Andrey E.; Mello, Claudio V.
2014-01-01
Speech impairment is one of the most intriguing and least understood effects of alcohol on cognitive function, largely due to the lack of data on alcohol effects on vocalizations in the context of an appropriate experimental model organism. Zebra finches, a representative songbird and a premier model for understanding the neurobiology of vocal production and learning, learn song in a manner analogous to how humans learn speech. Here we show that when allowed access, finches readily drink alcohol, increase their blood ethanol concentrations (BEC) significantly, and sing a song with altered acoustic structure. The most pronounced effects were decreased amplitude and increased entropy, the latter likely reflecting a disruption in the birds’ ability to maintain the spectral structure of song under alcohol. Furthermore, specific syllables, which have distinct acoustic structures, were differentially influenced by alcohol, likely reflecting a diversity in the neural mechanisms required for their production. Remarkably, these effects on vocalizations occurred without overt effects on general behavioral measures, and importantly, they occurred within a range of BEC that can be considered risky for humans. Our results suggest that the variable effects of alcohol on finch song reflect differential alcohol sensitivity of the brain circuitry elements that control different aspects of song production. They also point to finches as an informative model for understanding how alcohol affects the neuronal circuits that control the production of learned motor behaviors. PMID:25536524
Speech-Enabled Tools for Augmented Interaction in E-Learning Applications
ERIC Educational Resources Information Center
Selouani, Sid-Ahmed A.; Lê, Tang-Hô; Benahmed, Yacine; O'Shaughnessy, Douglas
2008-01-01
This article presents systems that use speech technology to emulate the one-on-one interaction a student can get from a virtual instructor. A web-based learning tool, the Learn IN Context (LINC+) system, designed and used in a real mixed-mode learning context for a computer (C++ language) programming course taught at the Université de Moncton…
Anderson, Nathaniel D; Dell, Gary S
2018-04-03
Speakers implicitly learn novel phonotactic patterns by producing strings of syllables. The learning is revealed in their speech errors. First-order patterns, such as "/f/ must be a syllable onset," can be distinguished from contingent, or second-order, patterns, such as "/f/ must be an onset if the vowel is /a/, but a coda if the vowel is /o/." A meta-analysis of 19 experiments clearly demonstrated that first-order patterns affect speech errors to a very great extent in a single experimental session, but second-order vowel-contingent patterns only affect errors on the second day of testing, suggesting the need for a consolidation period. Two experiments tested an analogue to these studies involving sequences of button pushes, with fingers as "consonants" and thumbs as "vowels." The button-push errors revealed two of the key speech-error findings: first-order patterns are learned quickly, but second-order thumb-contingent patterns are only strongly revealed in the errors on the second day of testing. The influence of computational complexity on the implicit learning of phonotactic patterns in speech production may be a general feature of sequence production.
A lab-controlled simulation of a letter-speech sound binding deficit in dyslexia.
Aravena, Sebastián; Snellings, Patrick; Tijms, Jurgen; van der Molen, Maurits W
2013-08-01
Dyslexic and non-dyslexic readers engaged in a short training aimed at learning eight basic letter-speech sound correspondences within an artificial orthography. We examined whether a letter-speech sound binding deficit is behaviorally detectable within the initial steps of learning a novel script. Both letter knowledge and word reading ability within the artificial script were assessed. An additional goal was to investigate the influence of instructional approach on the initial learning of letter-speech sound correspondences. We assigned children from both groups to one of three different training conditions: (a) explicit instruction, (b) implicit associative learning within a computer game environment, or (c) a combination of (a) and (b) in which explicit instruction is followed by implicit learning. Our results indicated that dyslexics were outperformed by the controls on a time-pressured binding task and a word reading task within the artificial orthography, providing empirical support for the view that a letter-speech sound binding deficit is a key factor in dyslexia. A combination of explicit instruction and implicit techniques proved to be a more powerful tool in the initial teaching of letter-sound correspondences than implicit training alone. Copyright © 2013 Elsevier Inc. All rights reserved.
Page, M. P. A.; Norris, D.
2009-01-01
We briefly review the considerable evidence for a common ordering mechanism underlying both immediate serial recall (ISR) tasks (e.g. digit span, non-word repetition) and the learning of phonological word forms. In addition, we discuss how recent work on the Hebb repetition effect is consistent with the idea that learning in this task is itself a laboratory analogue of the sequence-learning component of phonological word-form learning. In this light, we present a unifying modelling framework that seeks to account for ISR and Hebb repetition effects, while being extensible to word-form learning. Because word-form learning is performed in the service of later word recognition, our modelling framework also subsumes a mechanism for word recognition from continuous speech. Simulations of a computational implementation of the modelling framework are presented and are shown to be in accordance with data from the Hebb repetition paradigm. PMID:19933143
Relationships between music training, speech processing, and word learning: a network perspective.
Elmer, Stefan; Jäncke, Lutz
2018-03-15
Numerous studies have documented the behavioral advantages conferred on professional musicians and children undergoing music training in processing speech sounds varying in the spectral and temporal dimensions. These beneficial effects have previously often been associated with local functional and structural changes in the auditory cortex (AC). However, this perspective is oversimplified, in that it does not take into account the intrinsic organization of the human brain, namely, neural networks and oscillatory dynamics. Therefore, we propose a new framework for extending these previous findings to a network perspective by integrating multimodal imaging, electrophysiology, and neural oscillations. In particular, we provide concrete examples of how functional and structural connectivity can be used to model simple neural circuits exerting a modulatory influence on AC activity. In addition, we describe how such a network approach can be used for better comprehending the beneficial effects of music training on more complex speech functions, such as word learning. © 2018 New York Academy of Sciences.
ERIC Educational Resources Information Center
Kaplan, Peter S.; Danko, Christina M.; Diaz, Andres
2010-01-01
Prior research showed that 5- to 13-month-old infants of chronically depressed mothers did not learn to associate a segment of infant-directed speech produced by their own mothers or an unfamiliar nondepressed mother with a smiling female face, but showed better-than-normal learning when a segment of infant-directed speech produced by an…
ERIC Educational Resources Information Center
Derman, Serdar; Bardakçi, Mehmet; Öztürk, Mustafa Serkan
2017-01-01
Suprasegmental features are essential in conveying meaning; however, they are one of the neglected topics in teaching Turkish as a foreign/second language. This paper aims to examine read speech by Arabic students learning Turkish as a second language and describe their read speech in terms of stress and pause. Within this framework, 34 Syrian…
Convergent transcriptional specializations in the brains of humans and song-learning birds
Pfenning, Andreas R.; Hara, Erina; Whitney, Osceola; Rivas, Miriam V.; Wang, Rui; Roulhac, Petra L.; Howard, Jason T.; Wirthlin, Morgan; Lovell, Peter V.; Ganapathy, Ganeshkumar; Mouncastle, Jacquelyn; Moseley, M. Arthur; Thompson, J. Will; Soderblom, Erik J.; Iriki, Atsushi; Kato, Masaki; Gilbert, M. Thomas P.; Zhang, Guojie; Bakken, Trygve; Bongaarts, Angie; Bernard, Amy; Lein, Ed; Mello, Claudio V.; Hartemink, Alexander J.; Jarvis, Erich D.
2015-01-01
Song-learning birds and humans share independently evolved similarities in brain pathways for vocal learning that are essential for song and speech and are not found in most other species. Comparisons of brain transcriptomes of song-learning birds and humans relative to vocal nonlearners identified convergent gene expression specializations in specific song and speech brain regions of avian vocal learners and humans. The strongest shared profiles relate bird motor and striatal song-learning nuclei, respectively, with human laryngeal motor cortex and parts of the striatum that control speech production and learning. Most of the associated genes function in motor control and brain connectivity. Thus, convergent behavior and neural connectivity for a complex trait are associated with convergent specialized expression of multiple genes. PMID:25504733
Speech Perception Deficits in Mandarin-Speaking School-Aged Children with Poor Reading Comprehension
Liu, Huei-Mei; Tsao, Feng-Ming
2017-01-01
Previous studies have shown that children learning alphabetic writing systems who have language impairment or dyslexia exhibit speech perception deficits. However, whether such deficits exist in children learning logographic writing systems who have poor reading comprehension remains uncertain. To further explore this issue, the present study examined speech perception deficits in Mandarin-speaking children with poor reading comprehension. Two self-designed tasks, a consonant categorical perception task and a lexical tone discrimination task, were used to compare speech perception performance in children (n = 31, age range = 7;4–10;2) with poor reading comprehension and an age-matched typically developing group (n = 31, age range = 7;7–9;10). Results showed that the children with poor reading comprehension were less accurate on the consonant and lexical tone discrimination tasks and perceived speech contrasts less categorically than the matched group. The correlations between speech perception skills (i.e., consonant and lexical tone discrimination sensitivities and the slope of the consonant identification curve) and individuals' oral language and reading comprehension were stronger than the correlations between speech perception ability and word recognition ability. In conclusion, the results revealed that Mandarin-speaking children with poor reading comprehension exhibit less-categorical speech perception, suggesting that imprecise speech perception, especially of lexical tones, must be considered in accounting for difficulties in learning to read in Mandarin-speaking children. PMID:29312031
Increasing Parental Involvement in Speech-Sound Remediation
ERIC Educational Resources Information Center
Roberts, Micah Renee Ferguson
2014-01-01
Speech therapy homework is a key component of a successful speech therapy program, increasing carryover of learned speech sounds. Poor return rate of homework assigned, with a lack of parental involvement, is a problem. The purpose of this project study was to examine what may increase parental participation in speech therapy homework. Guided by…
Compressed Speech Technology: Implications for Learning and Instruction.
ERIC Educational Resources Information Center
Sullivan, LeRoy L.
This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…
Contemporary Reflections on Speech-Based Language Learning
ERIC Educational Resources Information Center
Gustafson, Marianne
2009-01-01
In "The Relation of Language to Mental Development and of Speech to Language Teaching," S.G. Davidson displayed several timeless insights into the role of speech in developing language and reasons for using speech as the basis for instruction for children who are deaf and hard of hearing. His understanding that speech includes more than merely…
Learning Vowel Categories from Maternal Speech in Gurindji Kriol
ERIC Educational Resources Information Center
Jones, Caroline; Meakins, Felicity; Muawiyath, Shujau
2012-01-01
Distributional learning is a proposal for how infants might learn early speech sound categories from acoustic input before they know many words. When categories in the input differ greatly in relative frequency and overlap in acoustic space, research in bilingual development suggests that this affects the course of development. In the present…
Speech and Nonspeech Sequence Skill Learning in Adults Who Stutter
ERIC Educational Resources Information Center
Smits-Bandstra, Sarah; De Nil, Luc; Saint-Cyr, Jean A.
2006-01-01
Two studies compared the speech and nonspeech sequence skill learning of nine persons who stutter (PWS) and nine matched fluent speakers (PNS). Sequence skill learning was defined as a continuing process of stable improvement in speed and/or accuracy of sequencing performance over practice and was measured by comparing PWS's and PNS's performance…
Parent-child interaction in motor speech therapy.
Namasivayam, Aravind Kumar; Jethava, Vibhuti; Pukonen, Margit; Huynh, Anna; Goshulak, Debra; Kroll, Robert; van Lieshout, Pascal
2018-01-01
This study measures the reliability and sensitivity of a modified Parent-Child Interaction Observation scale (PCIOs) used to monitor the quality of parent-child interaction. The scale is part of a home-training program employed with direct motor speech intervention for children with speech sound disorders. Eighty-four preschool age children with speech sound disorders were provided either high- (2×/week/10 weeks) or low-intensity (1×/week/10 weeks) motor speech intervention. Clinicians completed the PCIOs at the beginning, middle, and end of treatment. Inter-rater reliability (Kappa scores) was determined by an independent speech-language pathologist who assessed videotaped sessions at the midpoint of the treatment block. Intervention sensitivity of the scale was evaluated using a Friedman test for each item and then followed up with Wilcoxon pairwise comparisons where appropriate. We obtained fair-to-good inter-rater reliability (Kappa = 0.33-0.64) for the PCIOs using only video-based scoring. Child-related items were more strongly influenced by differences in treatment intensity than parent-related items, where a greater number of sessions positively influenced parent learning of treatment skills and child behaviors. The adapted PCIOs is reliable and sensitive to monitor the quality of parent-child interactions in a 10-week block of motor speech intervention with adjunct home therapy. Implications for rehabilitation Parent-centered therapy is considered a cost effective method of speech and language service delivery. However, parent-centered models may be difficult to implement for treatments such as developmental motor speech interventions that require a high degree of skill and training. For children with speech sound disorders and motor speech difficulties, a translated and adapted version of the parent-child observation scale was found to be sufficiently reliable and sensitive to assess changes in the quality of the parent-child interactions during intervention. In developmental motor speech interventions, high-intensity treatment (2×/week/10 weeks) facilitates greater changes in the parent-child interactions than low intensity treatment (1×/week/10 weeks). On one hand, parents may need to attend more than five sessions with the clinician to learn how to observe and address their child's speech difficulties. On the other hand, children with speech sound disorders may need more than 10 sessions to adapt to structured play settings even when activities and therapy materials are age-appropriate.
Sleep and Native Language Interference Affect Non-Native Speech Sound Learning
Earle, F. Sayako; Myers, Emily B.
2015-01-01
Adults learning a new language are faced with a significant challenge: non-native speech sounds that are perceptually similar to sounds in one’s native language can be very difficult to acquire. Sleep and native language interference, two factors that may help to explain this difficulty in acquisition, are addressed in three studies. Results of Experiment 1 showed that participants trained on a non-native contrast at night improved in discrimination 24 hours after training, while those trained in the morning showed no such improvement. Experiments 2 and 3 addressed the possibility that incidental exposure to perceptually similar native language speech sounds during the day interfered with maintenance in the morning group. Taken together, results show that the ultimate success of non-native speech sound learning depends not only on the similarity of learned sounds to the native language repertoire, but also to interference from native language sounds before sleep. PMID:26280264
Songs to syntax: the linguistics of birdsong.
Berwick, Robert C; Okanoya, Kazuo; Beckers, Gabriel J L; Bolhuis, Johan J
2011-03-01
Unlike our primate cousins, many species of bird share with humans a capacity for vocal learning, a crucial factor in speech acquisition. There are striking behavioural, neural and genetic similarities between auditory-vocal learning in birds and human infants. Recently, the linguistic parallels between birdsong and spoken language have begun to be investigated. Although both birdsong and human language are hierarchically organized according to particular syntactic constraints, birdsong structure is best characterized as 'phonological syntax', resembling aspects of human sound structure. Crucially, birdsong lacks semantics and words. Formal language and linguistic analysis remains essential for the proper characterization of birdsong as a model system for human speech and language, and for the study of the brain and cognition evolution. Copyright © 2011 Elsevier Ltd. All rights reserved.
Normativity in 18th century discourse on speech.
MacNamee, T
1984-11-01
Eighteenth century phoneticians, such as Dodart, Ferrein, and Hellwag, extended the taxonomy of visible articulatory processes into the realm of the invisible, notably with the exploration of the voicing mechanism. Remedial initiatives were not simply confined to consideration of the outward manifestations of speech and its disorders: The work of Haller, Kuestner, and Morgagni shows an acute awareness of the nervous organization underlying verbal behavior. There was a characteristic preoccupation with mechanical models of speech, which led to the attempts of Kempelen and other investigators to construct actual "speaking machines." Eighteenth century scholars regarded language as not only an innate capacity peculiar to human nature, but also as a bodily habit learned by experience. The function of the orthoepist was to teach the right speech habits, and the upward mobility of the bourgeoisie created a demand for his services.
Perruchet, Pierre; Tillmann, Barbara
2010-03-01
This study investigates the joint influences of three factors on the discovery of new word-like units in a continuous artificial speech stream: the statistical structure of the ongoing input, the initial word-likeness of parts of the speech flow, and the contextual information provided by the earlier emergence of other word-like units. Results of an experiment conducted with adult participants show that these sources of information have strong and interactive influences on word discovery. The authors then examine the ability of different models of word segmentation to account for these results. PARSER (Perruchet & Vinter, 1998) is compared to the view that word segmentation relies on the exploitation of transitional probabilities between successive syllables, and with the models based on the Minimum Description Length principle, such as INCDROP. The authors submit arguments suggesting that PARSER has the advantage of accounting for the whole pattern of data without ad-hoc modifications, while relying exclusively on general-purpose learning principles. This study strengthens the growing notion that nonspecific cognitive processes, mainly based on associative learning and memory principles, are able to account for a larger part of early language acquisition than previously assumed. Copyright © 2009 Cognitive Science Society, Inc.
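As a concrete point of reference, the transitional-probability account that PARSER is compared against can be sketched in a few lines: TP(x→y) = count(xy) / count(x), with word boundaries posited at local TP dips. The syllable stream below is a hypothetical stand-in for an artificial-language input, not the study's actual materials.

```python
# A minimal sketch of the transitional-probability (TP) baseline that PARSER
# is compared against; the syllable stream is a hypothetical stand-in.
from collections import Counter

stream = "tu-pi-ro-go-la-bu-bi-da-ku-tu-pi-ro-pa-do-ti-go-la-bu".split("-")
bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])
tp = {(a, b): bigrams[(a, b)] / unigrams[a] for (a, b) in bigrams}

# Low-TP transitions (listed first) are candidate word boundaries.
for (a, b), p in sorted(tp.items(), key=lambda kv: kv[1]):
    print(f"{a}->{b}: TP = {p:.2f}")
```

Low-TP transitions tend to fall across word boundaries; PARSER instead builds candidate units through chunking, associative weighting, and forgetting, which is what lets it capture the word-likeness and context effects reported here.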
Bio-Inspired Neural Model for Learning Dynamic Models
NASA Technical Reports Server (NTRS)
Duong, Tuan; Duong, Vu; Suri, Ronald
2009-01-01
A neural-network mathematical model that, relative to prior such models, places greater emphasis on some of the temporal aspects of real neural physical processes, has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be "hardware-friendly" in the following sense: The algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.
The effects of speech output technology in the learning of graphic symbols.
Schlosser, R W; Belfiore, P J; Nigam, R; Blischak, D; Hetzroni, O
1995-01-01
The effects of auditory stimuli in the form of synthetic speech output on the learning of graphic symbols were evaluated. Three adults with severe to profound mental retardation and communication impairments were taught to point to lexigrams when presented with words under two conditions. In the first condition, participants used a voice output communication aid to receive synthetic speech as antecedent and consequent stimuli. In the second condition, with a nonelectronic communication board, participants did not receive synthetic speech. A parallel treatments design was used to evaluate the effects of the synthetic speech output as an added component of the augmentative and alternative communication system. The 3 participants reached criterion when provided with the auditory stimuli. Although 2 participants also reached criterion when not provided with the auditory stimuli, the addition of auditory stimuli resulted in more efficient learning and a decreased error rate. Maintenance results, however, indicated no differences between conditions. Findings suggest that auditory stimuli in the form of synthetic speech contribute to the efficient acquisition of graphic communication symbols. PMID:14743828
Speech and Communication Disorders
… to being completely unable to speak or understand speech. Causes include hearing disorders and deafness; voice problems; … or those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; autism …
A Construction System for CALL Materials from TV News with Captions
NASA Astrophysics Data System (ADS)
Kobayashi, Satoshi; Tanaka, Takashi; Mori, Kazumasa; Nakagawa, Seiichi
Many language learning materials have been published. Although repetition training is clearly necessary in language learning, it is difficult to maintain a learner's interest and motivation with existing materials, because they are limited in scope and content. Moreover, the speech used in most materials is not necessarily natural across situations. Nowadays, some TV news programs (CNN, ABC, PBS, NHK, etc.) carry closed or open captions corresponding to the announcer's speech. We have developed a system that builds Computer Assisted Language Learning (CALL) materials from such captioned newscasts, both for English learning by Japanese students and for Japanese learning by foreign students. The system computes the synchronization between captions and speech using HMMs and a forced-alignment algorithm. Materials produced by the system offer the following functions: full or partial caption display, repeated listening, consultation of an electronic dictionary, display of the user's and announcer's waveforms and pitch contours, and automatic construction of dictation tests. The materials have the advantage of presenting polite, natural speech on varied and timely topics, and they lend themselves to the automatic creation of listening-comprehension tests and to the storage and retrieval of many materials. In this paper, we first present the organization of the system and then describe the results of questionnaires from trial use of the materials. The synchronization between captions and speech proved sufficiently accurate, and the results overall encouraged further development of the system.
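The alignment step can be illustrated with a simplified stand-in: the system itself uses HMM-based forced alignment, but dynamic time warping (DTW) over feature sequences conveys the same dynamic-programming idea of synchronizing two time series. The feature matrices below are random placeholders, not real MFCCs.

```python
# A simplified stand-in for HMM forced alignment: dynamic time warping
# between a caption-derived feature sequence and a speech feature sequence.
import numpy as np

def dtw(a, b):
    """a: (n, d), b: (m, d). Returns the cumulative alignment-cost matrix."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

rng = np.random.default_rng(1)
caption_feats = rng.normal(size=(20, 13))   # placeholder features from caption text
speech_feats = rng.normal(size=(30, 13))    # placeholder features from broadcast audio
print("total alignment cost:", dtw(caption_feats, speech_feats)[-1, -1])
```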
ERIC Educational Resources Information Center
Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani
2017-01-01
Technology such as computer-assisted language learning (CALL) is used in teaching and learning in foreign language classrooms, where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…
Speech Motor Sequence Learning: Acquisition and Retention in Parkinson Disease and Normal Aging
ERIC Educational Resources Information Center
Whitfield, Jason A.; Goberman, Alexander M.
2017-01-01
Purpose: The aim of the current investigation was to examine speech motor sequence learning in neurologically healthy younger adults, neurologically healthy older adults, and individuals with Parkinson disease (PD) over a 2-day period. Method: A sequential nonword repetition task was used to examine learning over 2 days. Participants practiced a…
Six-Year-Olds' Learning of Novel Words through Addressed and Overheard Speech
ERIC Educational Resources Information Center
Boderé, Anneleen; Jaspaert, Koen
2017-01-01
Recent research indicates that infants can learn novel words equally well through addressed speech as through overhearing two adult experimenters. The current study examined to which extent six-year-old children learn from overhearing opportunities in regular kindergarten classroom practices. Fifty-three children (M age = 5;6) were exposed to a…
NASA Technical Reports Server (NTRS)
Wolf, Jared J.
1977-01-01
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described.
Monaural room acoustic parameters from music and speech.
Kendrick, Paul; Cox, Trevor J; Li, Francis F; Zhang, Yonggang; Chambers, Jonathon A
2008-07-01
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.
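For illustration, reverberation time can be estimated from a free-decay segment by fitting a line to the smoothed log-energy decay and extrapolating to -60 dB. This is the classic linear-regression variant, a simplification of the paper's maximum-likelihood estimator, and the decaying signal below is synthetic.

```python
# A simplified sketch of reverberation-time estimation from a decay phase:
# fit a line to the log-energy decay and extrapolate to -60 dB.
import numpy as np

fs = 16000
t = np.arange(int(0.5 * fs)) / fs
rt60_true = 0.6
# Synthetic free decay: exponentially decaying noise (a decay phase after a note).
decay = np.exp(-6.91 * t / rt60_true) * np.random.default_rng(2).normal(size=t.size)

energy_db = 10 * np.log10(np.maximum(decay**2, 1e-12))
smooth = np.convolve(energy_db, np.ones(256) / 256, mode="valid")  # smooth the decay
tt = t[: smooth.size]
slope, intercept = np.polyfit(tt, smooth, 1)   # decay rate in dB per second
print(f"estimated RT60 ~ {-60.0 / slope:.2f} s (true {rt60_true} s)")
```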
Automatic lip reading by using multimodal visual features
NASA Astrophysics Data System (ADS)
Takahashi, Shohei; Ohya, Jun
2013-12-01
Speech recognition has long been studied, but it does not work well in noisy places such as cars or trains. In addition, people who are hearing impaired or have difficulty hearing cannot benefit from speech recognition. Visual information is therefore also important for recognizing speech automatically: people understand speech not only from audio information but also from visual information, such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing speech using multimodal visual information, without any audio information. First, the ASM (Active Shape Model) is used to track and detect the face and lips in a video sequence. Second, the shape, optical flow, and spatial frequencies of the lip features are extracted from the lips detected by the ASM. Next, the extracted multimodal features are ordered chronologically, and a Support Vector Machine is trained to learn and classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
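A minimal sketch of the final classification stage, with random placeholder arrays standing in for the ASM-derived shape, optical-flow, and spatial-frequency descriptors (scikit-learn assumed; the injected class signal exists only to make the demo informative):

```python
# A minimal sketch of the word-classification stage: per-utterance lip
# features, flattened chronologically, fed to a support vector machine.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_utts, n_frames, n_feats, n_words = 200, 12, 6, 4
X = rng.normal(size=(n_utts, n_frames * n_feats))  # chronologically ordered features
y = rng.integers(0, n_words, size=n_utts)          # spoken-word labels
X[np.arange(n_utts), y] += 2.0                     # inject a class signal for the demo

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```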
Kelly, Spencer D.; Hirata, Yukari; Manansala, Michael; Huang, Jessica
2014-01-01
Co-speech hand gestures are a type of multimodal input that has received relatively little attention in the context of second language learning. The present study explored the role that observing and producing different types of gestures plays in learning novel speech sounds and word meanings in an L2. Naïve English-speakers were taught two components of Japanese—novel phonemic vowel length contrasts and vocabulary items comprised of those contrasts—in one of four different gesture conditions: Syllable Observe, Syllable Produce, Mora Observe, and Mora Produce. Half of the gestures conveyed intuitive information about syllable structure, and the other half, unintuitive information about Japanese mora structure. Within each Syllable and Mora condition, half of the participants only observed the gestures that accompanied speech during training, and the other half also produced the gestures that they observed along with the speech. The main finding was that participants across all four conditions had similar outcomes in two different types of auditory identification tasks and a vocabulary test. The results suggest that hand gestures may not be well suited for learning novel phonetic distinctions at the syllable level within a word, and thus, gesture-speech integration may break down at the lowest levels of language processing and learning. PMID:25071646
Meeuws, Matthias; Pascoal, David; Bermejo, Iñigo; Artaso, Miguel; De Ceulaer, Geert; Govaerts, Paul J
2017-07-01
The software application FOX ('Fitting to Outcome eXpert') is an intelligent agent to assist in the programming of cochlear implant (CI) processors. The current version utilizes a mixture of deterministic and probabilistic logic which is able to improve over time through a learning effect. This study aimed at assessing whether this learning capacity yields measurable improvements in speech understanding. A retrospective study was performed on 25 consecutive CI recipients with a median CI use experience of 10 years who came for their annual CI follow-up fitting session. All subjects were assessed by means of speech audiometry with open set monosyllables at 40, 55, 70, and 85 dB SPL in quiet with their home MAP. Other psychoacoustic tests were executed depending on the audiologist's clinical judgment. The home MAP and the corresponding test results were entered into FOX. If FOX suggested to make MAP changes, they were implemented and another speech audiometry was performed with the new MAP. FOX suggested MAP changes in 21 subjects (84%). The within-subject comparison showed a significant median improvement of 10, 3, 1, and 7% at 40, 55, 70, and 85 dB SPL, respectively. All but two subjects showed an instantaneous improvement in their mean speech audiometric score. Persons with long-term CI use, who received a FOX-assisted CI fitting at least 6 months ago, display improved speech understanding after MAP modifications, as recommended by the current version of FOX. This can be explained only by intrinsic improvements in FOX's algorithms, as they have resulted from learning. This learning is an inherent feature of artificial intelligence and it may yield measurable benefit in speech understanding even in long-term CI recipients.
Lessons Learned in Part-of-Speech Tagging of Conversational Speech
2010-10-01
Consolidation and transfer of learning after observing hand gesture.
Cook, Susan Wagner; Duffy, Ryan G; Fenn, Kimberly M
2013-01-01
Children who observe gesture while learning mathematics perform better than children who do not, when tested immediately after training. How does observing gesture influence learning over time? Children (n = 184, ages = 7-10) were instructed with a videotaped lesson on mathematical equivalence and tested immediately after training and 24 hr later. The lesson either included speech and gesture or only speech. Children who saw gesture performed better overall and performance improved after 24 hr. Children who only heard speech did not improve after the delay. The gesture group also showed stronger transfer to different problem types. These findings suggest that gesture enhances learning of abstract concepts and affects how learning is consolidated over time. © 2013 The Authors. Child Development © 2013 Society for Research in Child Development, Inc.
A New Model of Sensorimotor Coupling in the Development of Speech
ERIC Educational Resources Information Center
Westermann, Gert; Miranda, Eduardo Reck
2004-01-01
We present a computational model that learns a coupling between motor parameters and their sensory consequences in vocal production during a babbling phase. Based on the coupling, preferred motor parameters and prototypically perceived sounds develop concurrently. Exposure to an ambient language modifies perception to coincide with the sounds from…
Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech
ERIC Educational Resources Information Center
Tyson, Na'im R.
2012-01-01
In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…
A comparison of sensory-motor activity during speech in first and second languages.
Simmonds, Anna J; Wise, Richard J S; Dhanjal, Novraj S; Leech, Robert
2011-07-01
A foreign language (L2) learned after childhood results in an accent. This functional neuroimaging study investigated speech in L2 as a sensory-motor skill. The hypothesis was that there would be an altered response in auditory and somatosensory association cortex, specifically the planum temporale and parietal operculum, respectively, when speaking in L2 relative to L1, independent of rate of speaking. These regions were selected for three reasons. First, an influential computational model proposes that these cortices integrate predictive feedforward and postarticulatory sensory feedback signals during articulation. Second, these adjacent regions (known as Spt) have been identified as a "sensory-motor interface" for speech production. Third, probabilistic anatomical atlases exist for these regions, to ensure the analyses are confined to sensory-motor differences between L2 and L1. The study used functional magnetic resonance imaging (fMRI), and participants produced connected overt speech. The first hypothesis was that there would be greater activity in the planum temporale and the parietal operculum when subjects spoke in L2 compared with L1, one interpretation being that there is less efficient postarticulatory sensory monitoring when speaking in the less familiar L2. The second hypothesis was that this effect would be observed in both cerebral hemispheres. Although Spt is considered to be left-lateralized, this is based on studies of covert speech, whereas overt speech is accompanied by sensory feedback to bilateral auditory and somatosensory cortices. Both hypotheses were confirmed by the results. These findings provide the basis for future investigations of sensory-motor aspects of language learning using serial fMRI studies.
ERIC Educational Resources Information Center
Harris, Karen R.
To investigate task performance and the use of private speech and to examine the effects of a cognitive training approach, 30 learning disabled (LD) and 30 nonLD Ss (7 to 8 years old) were given a 17 piece wooden puzzle rigged so that it could not be completed correctly. Six variables were measured: (1) proportion of private speech that was task…
Two Stage Data Augmentation for Low Resourced Speech Recognition (Author’s Manuscript)
2016-09-12
Keywords: speech recognition, deep neural networks, data augmentation.
ERIC Educational Resources Information Center
Burgess, Carol A.
Sixth grade students can use cinquain poems to explore language, learn grammar, and write creatively. Before learning about cinquains, students should be introduced to simpler poetic forms. To introduce cinquains, the teacher writes a simple example on the board and has the students informally figure out the parts of speech and grammatical…
An attention-gating recurrent working memory architecture for emergent speech representation
NASA Astrophysics Data System (ADS)
Elshaw, Mark; Moore, Roger K.; Klein, Michael
2010-06-01
This paper describes an attention-gating recurrent self-organising map approach for emergent speech representation. Inspired by evidence from human cognitive processing, the architecture combines two main neural components. The first component, the attention-gating mechanism, uses actor-critic learning to perform selective attention towards speech; through this selective attention, it controls access to working memory processing. The second component, the recurrent self-organising map memory, develops a temporal-distributed representation of speech using phone-like structures. Representing speech in terms of phonetic features in an emergent, self-organised fashion recreates the approach that research on child cognitive development has found in infants. Using this representational approach should improve the performance of automatic recognition systems by aiding speech segmentation and fast word learning.
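A minimal sketch of a recurrent self-organising map update of the kind described, with the actor-critic attention gate and the usual neighborhood function omitted for brevity; the map size, context decay, and activation function are my own assumptions, not the paper's specification.

```python
# A minimal sketch of a recurrent self-organising map (RecSOM-style):
# each input frame is joined with a decayed copy of the previous map
# activation, so the winning unit reflects temporal context.
import numpy as np

rng = np.random.default_rng(7)
n_units, d_in, alpha, lr = 25, 12, 0.7, 0.1
W = rng.normal(scale=0.1, size=(n_units, d_in + n_units))  # weights over [input, context]
context = np.zeros(n_units)

for frame in rng.normal(size=(500, d_in)):   # stand-in for acoustic feature frames
    x = np.concatenate([frame, alpha * context])
    dists = np.linalg.norm(W - x, axis=1)
    winner = int(np.argmin(dists))
    W[winner] += lr * (x - W[winner])        # move the winner toward the input
    context = np.exp(-dists)                 # new context: graded map activation
    context /= context.max()
```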
Hello World, It's Me: Bringing the Basic Speech Communication Course into the Digital Age
ERIC Educational Resources Information Center
Kirkwood, Jessica; Gutgold, Nichola D.; Manley, Destiny
2011-01-01
During the past decade, instructors of speech communication have been adapting the introductory speech course to keep up with the television age. Learning units in speech textbooks now teach how to speak well on television, as well as how to interpret speeches in the media. This article argues that the computer age invites adaptation of the…
ERIC Educational Resources Information Center
Santesso, Diane L.; Schmidt, Louis A.; Trainor, Laurel J.
2007-01-01
Many studies have shown that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech functions to aid language learning, obtain and/or maintain an infant's attention, and create emotional communication between the infant and caregiver. We examined psychophysiological responses to ID speech that varied in affective…
Lip Movement Exaggerations during Infant-Directed Speech
ERIC Educational Resources Information Center
Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana
2010-01-01
Purpose: Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their…
Perceptual Learning of Noise Vocoded Words: Effects of Feedback and Lexicality
ERIC Educational Resources Information Center
Hervais-Adelman, Alexis; Davis, Matthew H.; Johnsrude, Ingrid S.; Carlyon, Robert P.
2008-01-01
Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. This adjustment is reflected in improved comprehension of distorted speech with experience. For noise vocoding, a manipulation that removes spectral detail from speech, listeners' word…
"Who" is saying "what"? Brain-based decoding of human voice and speech.
Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer
2008-11-07
Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
Orthographic learning and the role of text-to-speech software in Dutch disabled readers.
Staels, Eva; Van den Broeck, Wim
2015-01-01
In this study, we examined whether orthographic learning can be demonstrated in disabled readers learning to read in a transparent orthography (Dutch). In addition, we tested the effect of the use of text-to-speech software, a new form of direct instruction, on orthographic learning. Both research goals were investigated by replicating Share's self-teaching paradigm. A total of 65 disabled Dutch readers were asked to read eight stories containing embedded homophonic pseudoword targets (e.g., Blot/Blod), with or without the support of text-to-speech software. The amount of orthographic learning was assessed 3 or 7 days later by three measures of orthographic learning. First, the results supported the presence of orthographic learning during independent silent reading by demonstrating that target spellings were correctly identified more often, named more quickly, and spelled more accurately than their homophone foils. Our results support the hypothesis that all readers, even poor readers of transparent orthographies, are capable of developing word-specific knowledge. Second, a negative effect of text-to-speech software on orthographic learning was demonstrated in this study. This negative effect was interpreted as the consequence of passively listening to the auditory presentation of the text. We clarify how these results can be interpreted within current theoretical accounts of orthographic learning and briefly discuss implications for remedial interventions. © Hammill Institute on Disabilities 2013.
Benchmarking Deep Learning Models on Large Healthcare Datasets.
Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
2018-06-04
Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
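The benchmarking pattern can be sketched schematically. Synthetic data stands in for MIMIC-III (which requires credentialed access), a small feedforward net loosely stands in for the paper's deep models, and logistic regression loosely plays the role of a scoring system; none of this reproduces the paper's actual pipelines.

```python
# A schematic benchmarking sketch: several models, one train/test split,
# one common metric (AUROC). Synthetic data stands in for MIMIC-III.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = {
    "logistic regression (scoring-system analogue)": LogisticRegression(max_iter=1000),
    "gradient boosting (ensemble analogue)": GradientBoostingClassifier(),
    "feedforward net (deep-learning stand-in)": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    auc = roc_auc_score(yte, model.fit(Xtr, ytr).predict_proba(Xte)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```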
ERIC Educational Resources Information Center
Ural, A. Engin; Yuret, Deniz; Ketrez, F. Nihan; Kocbas, Dilara; Kuntay, Aylin C.
2009-01-01
The syntactic bootstrapping mechanism of verb learning was evaluated against child-directed speech in Turkish, a language with rich morphology, nominal ellipsis and free word order. Machine-learning algorithms were run on transcribed caregiver speech directed to two Turkish learners (one hour every two weeks between 0;9 to 1;10) of different…
Review of Speech-to-Text Recognition Technology for Enhancing Learning
ERIC Educational Resources Information Center
Shadiev, Rustam; Hwang, Wu-Yuin; Chen, Nian-Shing; Huang, Yueh-Min
2014-01-01
This paper reviewed literature from 1999 to 2014 inclusively on how Speech-to-Text Recognition (STR) technology has been applied to enhance learning. The first aim of this review is to understand how STR technology has been used to support learning over the past fifteen years, and the second is to analyze all research evidence to understand how…
ERIC Educational Resources Information Center
Wallace, Sarah E.
2015-01-01
Team-based learning (TBL), although found to increase student engagement and higher-level thinking, has not been examined in the field of speech-language pathology. The purpose of this study was to examine the effect of integrating TBL into a capstone course in evidence-based practice (EBP). The researcher evaluated 27 students' understanding of…
ERIC Educational Resources Information Center
Shadiev, Rustam; Wu, Ting-Ting; Sun, Ai; Huang, Yueh-Min
2018-01-01
In this study, 21 university students, who represented thirteen nationalities, participated in an online cross-cultural learning activity. The participants were engaged in interactions and exchanges carried out on Facebook® and Skype® platforms, and their multilingual communications were supported by speech-to-text recognition (STR) and…
Talker identification across source mechanisms: experiments with laryngeal and electrolarynx speech.
Perrachione, Tyler K; Stepp, Cara E; Hillman, Robert E; Wong, Patrick C M
2014-10-01
The purpose of this study was to determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). Listeners successfully learned talker identity from electrolarynx speech but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared with laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. Electrolarynx speech, although lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a "gestalt" of the vocal source and filter.
Computational validation of the motor contribution to speech perception.
Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio
2014-07-01
Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). Copyright © 2014 Cognitive Science Society, Inc.
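The benefit of motor data can be made concrete with a toy sketch. The reviewed studies use motor data only during classifier training; for simplicity, this illustration gives the classifier real motor features at test time too, just to show that articulatory information is complementary to audio. All data and models are synthetic stand-ins, not those of the reviewed studies.

```python
# A toy sketch: a phone classifier trained on audio alone vs. on audio
# plus articulatory (motor) features. Synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, d_mot, d_aud = 2000, 5, 20
motor = rng.normal(size=(n, d_mot))                       # articulatory trajectories
audio = motor @ rng.normal(size=(d_mot, d_aud)) + 2.0 * rng.normal(size=(n, d_aud))
labels = (motor[:, 0] > 0).astype(int)                    # phone class tied to articulation

A_tr, A_te, M_tr, M_te, y_tr, y_te = train_test_split(audio, motor, labels, random_state=0)

audio_only = LogisticRegression(max_iter=1000).fit(A_tr, y_tr)
audio_motor = LogisticRegression(max_iter=1000).fit(np.hstack([A_tr, M_tr]), y_tr)

print("audio only:        ", audio_only.score(A_te, y_te))
print("audio + motor data:", audio_motor.score(np.hstack([A_te, M_te]), y_te))
```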
DIMENSION-BASED STATISTICAL LEARNING OF VOWELS
Liu, Ran; Holt, Lori L.
2015-01-01
Speech perception depends on long-term representations that reflect regularities of the native language. However, listeners rapidly adapt when speech acoustics deviate from these regularities due to talker idiosyncrasies such as foreign accents and dialects. To better understand these dual aspects of speech perception, we probe native English listeners’ baseline perceptual weighting of two acoustic dimensions (spectral quality and vowel duration) towards vowel categorization and examine how they subsequently adapt to an “artificial accent” that deviates from English norms in the correlation between the two dimensions. At baseline, listeners rely relatively more on spectral quality than vowel duration to signal vowel category, but duration nonetheless contributes. Upon encountering an “artificial accent” in which the spectral-duration correlation is perturbed relative to English language norms, listeners rapidly down-weight reliance on duration. Listeners exhibit this type of short-term statistical learning even in the context of nonwords, confirming that lexical information is not necessary to this form of adaptive plasticity in speech perception. Moreover, learning generalizes to both novel lexical contexts and acoustically-distinct altered voices. These findings are discussed in the context of a mechanistic proposal for how supervised learning may contribute to this type of adaptive plasticity in speech perception. PMID:26280268
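A toy sketch of the down-weighting dynamic (illustrative only, not the paper's model): an online logistic learner first acquires English-like statistics in which spectral quality and duration both signal the vowel category, then encounters an "artificial accent" in which duration is decorrelated from category, and its duration weight shrinks while the spectral weight survives. All distributions and learning rates are invented for the demo.

```python
# A toy sketch of dimension-based cue reweighting with an online
# logistic learner; the "accent" block decorrelates duration.
import numpy as np

rng = np.random.default_rng(5)
w = np.zeros(3)  # weights on [spectral quality, duration, bias]

def block(n, duration_informative):
    y = rng.integers(0, 2, size=n)                  # vowel category labels
    spec = y + 0.5 * rng.normal(size=n)             # spectral cue: always reliable
    dur = (y if duration_informative else rng.integers(0, 2, size=n)) + 0.8 * rng.normal(size=n)
    return np.column_stack([spec, dur, np.ones(n)]), y

for informative in (True, False):                   # English-like input, then the accent
    X, y = block(4000, informative)
    for xi, yi in zip(X, y):                        # online logistic-regression updates
        p = 1.0 / (1.0 + np.exp(-w @ xi))
        w += 0.05 * (yi - p) * xi
    phase = "baseline" if informative else "accent"
    print(phase, "weights [spectral, duration]:", np.round(w[:2], 2))
```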
The Impact of Feedback Frequency on Performance in a Novel Speech Motor Learning Task.
Lowe, Mara Steinberg; Buchwald, Adam
2017-06-22
This study investigated whether whole nonword accuracy, phoneme accuracy, and acoustic duration measures were influenced by the amount of feedback speakers without impairment received during a novel speech motor learning task. Thirty-two native English speakers completed a nonword production task across 3 time points: practice, short-term retention, and long-term retention. During practice, participants received knowledge of results feedback according to a randomly assigned schedule (100%, 50%, 20%, or 0%). Changes in nonword accuracy, phoneme accuracy, nonword duration, and initial-cluster duration were compared among feedback groups, sessions, and stimulus properties. All participants improved phoneme and whole nonword accuracy at short-term and long-term retention time points. Participants also refined productions of nonwords, as indicated by a decrease in nonword duration across sessions. The 50% group exhibited the largest reduction in duration between practice and long-term retention for nonwords with native and nonnative clusters. All speakers, regardless of feedback schedule, learned new speech motor behaviors quickly with a high degree of accuracy and refined their speech motor skills for perceptually accurate productions. Acoustic measurements may capture more subtle, subperceptual changes that may occur during speech motor learning. https://doi.org/10.23641/asha.5116324.
Neural network based speech synthesizer: A preliminary report
NASA Technical Reports Server (NTRS)
Villarreal, James A.; Mcintire, Gary
1987-01-01
A neural net based speech synthesis project is discussed. The novelty is that the reproduced speech was extracted from actual voice recordings. In essence, the neural network learns the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to that individual person. The parallel distributed processing network used for this project is the generalized backward propagation network, which has been modified to also learn sequences of actions or states given in a particular plan.
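A minimal numpy sketch of the underlying idea, assuming a plain feedforward backprop network fitted to a synthetic pitch contour; the report's actual network also learned sequences of states, which this toy omits.

```python
# Sketch: a tiny backprop network learning a speech-parameter trajectory
# (here a synthetic pitch contour); a stand-in for the report's
# sequence-learning backprop synthesizer, with assumed details.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)[:, None]          # normalized time index
f0 = 120 + 20 * np.sin(2 * np.pi * 3 * t)    # synthetic pitch contour (Hz)
target = (f0 - f0.mean()) / f0.std()         # standardized target

W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for epoch in range(5000):
    h = np.tanh(t @ W1 + b1)                 # hidden layer
    y = h @ W2 + b2                          # predicted contour
    err = y - target
    # Backward pass: mean-squared-error gradients.
    gW2 = h.T @ err / len(t); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = t.T @ dh / len(t); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("final MSE:", float((err ** 2).mean()))
```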
Time-Warp–Invariant Neuronal Processing
Gütig, Robert; Sompolinsky, Haim
2009-01-01
Fluctuations in the temporal durations of sensory signals constitute a major source of variability within natural stimulus ensembles. The neuronal mechanisms through which sensory systems can stabilize perception against such fluctuations are largely unknown. An intriguing instantiation of such robustness occurs in human speech perception, which relies critically on temporal acoustic cues that are embedded in signals with highly variable duration. Across different instances of natural speech, auditory cues can undergo temporal warping that ranges from 2-fold compression to 2-fold dilation without significant perceptual impairment. Here, we report that time-warp–invariant neuronal processing can be subserved by the shunting action of synaptic conductances that automatically rescales the effective integration time of postsynaptic neurons. We propose a novel spike-based learning rule for synaptic conductances that adjusts the degree of synaptic shunting to the temporal processing requirements of a given task. Applying this general biophysical mechanism to the example of speech processing, we propose a neuronal network model for time-warp–invariant word discrimination and demonstrate its excellent performance on a standard benchmark speech-recognition task. Our results demonstrate the important functional role of synaptic conductances in spike-based neuronal information processing and learning. The biophysics of temporal integration at neuronal membranes can endow sensory pathways with powerful time-warp–invariant computational capabilities. PMID:19582146
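The shunting mechanism described here can be illustrated with toy arithmetic: the effective membrane time constant is tau_eff = C / (g_leak + g_syn), so synaptic conductance that grows with input rate automatically shortens the integration window. A minimal sketch with assumed parameter values, not the paper's model:

```python
# Sketch: how shunting synaptic conductance can rescale a neuron's
# effective integration time (toy parameters).
C = 1.0          # membrane capacitance (arbitrary units)
g_leak = 0.05    # leak conductance
g_unit = 0.02    # conductance contributed per active synapse

for rate_scale in (0.5, 1.0, 2.0):   # 2x dilated, original, 2x compressed input
    n_active = 10 * rate_scale        # mean active synapses scales with input rate
    g_total = g_leak + g_unit * n_active
    tau_eff = C / g_total             # effective membrane time constant
    print(f"input rate x{rate_scale}: tau_eff = {tau_eff:.2f}")
# Faster (compressed) input -> more shunting -> shorter tau_eff, so the
# neuron's integration automatically speeds up with the time-warped stimulus.
```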
ERIC Educational Resources Information Center
Raskind, Marshall
1993-01-01
This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…
Hurtado, Nereyda; Marchman, Virginia A.; Fernald, Anne
2010-01-01
It is well established that variation in caregivers' speech is associated with language outcomes, yet little is known about the learning principles that mediate these effects. This longitudinal study (n = 27) explores whether Spanish-learning children's early experiences with language predict efficiency in real-time comprehension and vocabulary learning. Measures of mothers' speech at 18 months were examined in relation to children's speech processing efficiency and reported vocabulary at 18 and 24 months. Children of mothers who provided more input at 18 months knew more words and were faster in word recognition at 24 months. Moreover, multiple regression analyses indicated that the influences of caregiver speech on speed of word recognition and vocabulary were largely overlapping. This study provides the first evidence that input shapes children's lexical processing efficiency and that vocabulary growth and increasing facility in spoken word comprehension work together to support the uptake of the information that rich input affords the young language learner. PMID:19046145
Talker-specific learning in amnesia: Insight into mechanisms of adaptive speech perception
Trude, Alison M.; Duff, Melissa C.; Brown-Schmidt, Sarah
2014-01-01
A hallmark of human speech perception is the ability to comprehend speech quickly and effortlessly despite enormous variability across talkers. However, current theories of speech perception do not make specific claims about the memory mechanisms involved in this process. To examine whether declarative memory is necessary for talker-specific learning, we tested the ability of amnesic patients with severe declarative memory deficits to learn and distinguish the accents of two unfamiliar talkers by monitoring their eye-gaze as they followed spoken instructions. Analyses of the time-course of eye fixations showed that amnesic patients rapidly learned to distinguish these accents and tailored perceptual processes to the voice of each talker. These results demonstrate that declarative memory is not necessary for this ability and point to the involvement of non-declarative memory mechanisms. These results are consistent with findings that other social and accommodative behaviors are preserved in amnesia and contribute to our understanding of the interactions of multiple memory systems in the use and understanding of spoken language. PMID:24657480
Speech Motor Sequence Learning: Acquisition and Retention in Parkinson Disease and Normal Aging.
Whitfield, Jason A; Goberman, Alexander M
2017-06-10
The aim of the current investigation was to examine speech motor sequence learning in neurologically healthy younger adults, neurologically healthy older adults, and individuals with Parkinson disease (PD) over a 2-day period. A sequential nonword repetition task was used to examine learning over 2 days. Participants practiced a sequence of 6 monosyllabic nonwords that was retested following nighttime sleep. The speed and accuracy of the nonword sequence were measured, and learning was inferred by examining performance within and between sessions. Though all groups exhibited comparable improvements in nonword sequence performance during the initial session, between-session retention of the nonword sequence differed between groups. Younger adult controls exhibited offline gains, characterized by an increase in the speed and accuracy of nonword sequence performance across sessions, whereas older adults exhibited stable between-session performance. Individuals with PD exhibited offline losses, marked by an increase in sequence duration between sessions. The current results demonstrate that both PD and normal aging affect retention of speech motor learning. Furthermore, these data suggest that basal ganglia dysfunction associated with PD may affect the later stages of speech motor learning. Findings from the current investigation are discussed in relation to studies examining consolidation of nonspeech motor learning.
A robotic voice simulator and the interactive training for hearing-impaired people.
Sawada, Hideyuki; Kitani, Mitsuki; Hayashi, Yasumori
2008-01-01
A talking and singing robot that adaptively learns vocalization skills by means of an auditory feedback learning algorithm is being developed. The robot consists of motor-controlled vocal organs such as vocal cords, a vocal tract, and a nasal cavity, which generate a natural voice imitating human vocalization. In this study, the robot is applied to a speech-articulation training system for the hearing-impaired, because the robot can reproduce their vocalizations and show them how to improve them to produce clearer speech. The paper briefly introduces the mechanical construction of the robot and how it autonomously acquires vocalization skill through auditory feedback learning by listening to human speech. The training system is then described, together with an evaluation of the speech training by hearing-impaired people.
Issues in developing valid assessments of speech pathology students' performance in the workplace.
McAllister, Sue; Lincoln, Michelle; Ferguson, Alison; McAllister, Lindy
2010-01-01
Workplace-based learning is a critical component of professional preparation in speech pathology. A validated assessment of this learning is seen to be 'the gold standard', but it is difficult to develop because of design and validation issues. These issues include the role and nature of judgement in assessment, challenges in measuring quality, and the relationship between assessment and learning. Valid assessment of workplace-based performance needs to capture the development of competence over time and account for both occupation-specific and generic competencies. This paper reviews important conceptual issues in the design of valid and reliable workplace-based assessments of competence, including assessment content, process, impact on learning, measurement issues, and validation strategies. It then goes on to share what has been learned about quality assessment and validation of a workplace-based performance assessment using competency-based ratings. The outcomes of a four-year national development and validation of an assessment tool are described. A literature review of issues in conceptualizing, designing, and validating workplace-based assessments was conducted. Key factors to consider in the design of a new tool were identified and built into the cycle of design, trialling, and data analysis in the validation stages of the development process. This paper provides an accessible overview of factors to consider in the design and validation of workplace-based assessment tools. It presents strategies used in the development and national validation of the tool COMPASS, used in every speech pathology programme in Australia, New Zealand, and Singapore. The paper also describes Rasch analysis, a model-based statistical approach that is useful for establishing the validity and reliability of assessment tools. Through careful attention to conceptual and design issues in the development and trialling of workplace-based assessments, it has been possible to develop the world's first valid and reliable national assessment tool for the assessment of performance in speech pathology.
Özçalışkan, Şeyda; Levine, Susan C.; Goldin-Meadow, Susan
2013-01-01
Children with pre/perinatal unilateral brain lesions (PL) show remarkable plasticity for language development. Is this plasticity characterized by the same developmental trajectory that characterizes typically developing (TD) children, with gesture leading the way into speech? We explored this question, comparing 11 children with PL—matched to 30 TD children on expressive vocabulary—in the second year of life. Children with PL showed similarities to TD children for simple but not complex sentence types. Children with PL produced simple sentences across gesture and speech several months before producing them entirely in speech, exhibiting parallel delays in both gesture+speech and speech-alone. However, unlike TD children, children with PL produced complex sentence types first in speech-alone. Overall, the gesture-speech system appears to be a robust feature of language-learning for simple—but not complex—sentence constructions, acting as a harbinger of change in language development even when that language is developing in an injured brain. PMID:23217292
ERIC Educational Resources Information Center
Masapollo, Matthew; Polka, Linda; Ménard, Lucie
2016-01-01
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…
Altvater-Mackensen, Nicole; Grossmann, Tobias
2015-01-01
Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues as well as social cues that might foster language learning. Yet both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential looking paradigm, 44 German 6-month-olds' ability to detect mismatches between concurrently presented auditory and visual native vowels was tested. Outcomes were related to mothers' speech style and interactive behavior, assessed during free play with their infant, and to infant-specific factors assessed through a questionnaire. Results show that mothers' and infants' social behavior modulated infants' preference for matching audiovisual speech. Moreover, infants' audiovisual speech perception correlated with later vocabulary size, suggesting a lasting effect on language development.
Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training
Bernstein, Lynne E.; Auer, Edward T.; Eberhardt, Silvio P.; Jiang, Jintao
2013-01-01
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning. PMID:23515520
MCA-NMF: Multimodal Concept Acquisition with Non-Negative Matrix Factorization
Mangin, Olivier; Filliat, David; ten Bosch, Louis; Oudeyer, Pierre-Yves
2015-01-01
In this paper we introduce MCA-NMF, a computational model of the acquisition of multimodal concepts by an agent grounded in its environment. More precisely, our model finds patterns in multimodal sensor input that characterize associations across modalities (speech utterances, images, and motion). We propose this computational model as an answer to the question of how a certain class of concepts can be learnt. In addition, the model provides a way of defining such a class of plausibly learnable concepts. We detail why the multimodal nature of perception is essential to reduce the ambiguity of learnt concepts as well as to communicate about them through speech. We then present a set of experiments that demonstrate the learning of such concepts from real non-symbolic data consisting of speech sounds, images, and motions. Finally, we consider structure in perceptual signals and demonstrate that a detailed knowledge of this structure, termed compositional understanding, can emerge from, instead of being a prerequisite of, global understanding. An open-source implementation of the MCA-NMF learner as well as scripts and associated experimental data to reproduce the experiments are publicly available. PMID:26489021
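A rough flavor of the approach can be given with scikit-learn's NMF applied to concatenated per-modality features. This is a hypothetical stand-in with random data, not the released MCA-NMF implementation:

```python
# Sketch: multimodal concept learning with NMF in the spirit of MCA-NMF,
# using random stand-in features (the real system uses speech, image,
# and motion recordings).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples, k_concepts = 200, 5
# Ground-truth nonnegative concept activations.
H_true = rng.random((n_samples, k_concepts))
# Each modality projects the concepts into its own feature space.
speech = H_true @ rng.random((k_concepts, 30))
image = H_true @ rng.random((k_concepts, 40))
motion = H_true @ rng.random((k_concepts, 20))
X = np.hstack([speech, image, motion])   # concatenate modalities

nmf = NMF(n_components=k_concepts, init="nndsvda", max_iter=500)
W = nmf.fit_transform(X)   # per-sample concept activations
H = nmf.components_        # cross-modal concept dictionaries
print(W.shape, H.shape)    # (200, 5), (5, 90)
```

Because each learned component in H spans all three feature blocks, a concept recovered from one modality (say, a speech utterance) indexes its associated patterns in the others, which is the cross-modal association the abstract describes.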
The Suitability of Cloud-Based Speech Recognition Engines for Language Learning
ERIC Educational Resources Information Center
Daniels, Paul; Iwago, Koji
2017-01-01
As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with CALL (computer-assisted language learning) software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…
Li, Kan; Príncipe, José C.
2018-01-01
This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel does not impact the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing as well as neuromorphic implementations based on spiking neural networks (SNN), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time-derivatives, our MFCC-KAARMA offered improved performance. For the spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes. PMID:29666568
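The LIF front-end the authors describe can be sketched minimally as below; the parameter values (time constant, threshold) are assumptions, and a real front-end would operate on many band-filtered channels rather than one raw signal.

```python
# Sketch: a leaky integrate-and-fire front-end turning a 1-D signal into a
# spike train, loosely following the paper's LIF spike generator (toy values).
import numpy as np

def lif_spikes(x, dt=1e-3, tau=0.02, threshold=1.0):
    """Leakily integrate the rectified input; emit a spike and reset at threshold."""
    v, spikes = 0.0, []
    for i, xi in enumerate(x):
        v += dt * (-v / tau + max(xi, 0.0))   # leaky integration
        if v >= threshold:
            spikes.append(i * dt)             # spike time in seconds
            v = 0.0                           # reset after firing
    return np.array(spikes)

t = np.arange(0, 1, 1e-3)
signal = 50 * (np.sin(2 * np.pi * 3 * t) + 1)  # nonnegative test input
print(len(lif_spikes(signal)), "spikes")
```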
ERIC Educational Resources Information Center
Commonwealth of Learning, 2007
2007-01-01
The third in a series published by the Commonwealth of Learning (COL), this booklet reproduces five addresses and one article from late 2006 and early 2007. This collection of speeches is entitled "Learning for Development" because that is the focus of COL's work. The addresses presented here were given at the opening and…
François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni
2017-04-01
Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representations (the word-to-world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have largely been studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and whether they share common neurophysiological features. To address this question, we recorded the EEG of 20 adult participants during both an audio-alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested both for the implicit detection of online mismatches (structural auditory and visual semantic violations) and for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio-alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio-alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning.
Teaching tone and intonation with the Prosody Workstation using schematic versus veridical contours
NASA Astrophysics Data System (ADS)
Allen, George D.; Eulenberg, John B.
2004-05-01
Prosodic features of speech (e.g., intonation and rhythm) are often challenging for adults to learn. Most computerized teaching tools, developed to help learners mimic model prosodic patterns, display lines representing the veridical (actual) acoustic fundamental frequency and intensity of the model speech. However, a veridical display may not be optimal for this task. Instead, stereotypical representations (e.g., simplified level or slanting lines) may help by reducing the amount of potentially distracting information. The Prosody Workstation (PW) permits the prosodic contours of both models and users' responses to be displayed using either veridical or stereotypical contours. Users are informed by both visual displays and scores representing the degree of match between their utterance and the model. American English-speaking undergraduates are being studied as they learn the tone contours and rhythm of Chinese and Hausa utterances ranging in length from two to six syllables. Data include (a) accuracy of mimicry of the models' prosodic contours, measured by the PW; (b) quality of tonal and rhythmic production, judged by native-speaker listeners; and (c) learners' perceptions of the ease of the task, measured by a questionnaire at the end of each session.
Primate feedstock for the evolution of consonants.
Lameira, Adriano R; Maddieson, Ian; Zuberbühler, Klaus
2014-02-01
The evolution of speech remains an elusive scientific problem. A widespread notion is that vocal learning, underpinned by vocal-fold control, is a key prerequisite for speech evolution. Although present in birds and non-primate mammals, vocal learning is ostensibly absent in non-human primates. Here we argue that the main road to speech evolution has been through controlling the supralaryngeal vocal tract, for which we find evidence of evolutionary continuity within the great apes.
Using Predictability for Lexical Segmentation
ERIC Educational Resources Information Center
Çöltekin, Çagri
2017-01-01
This study investigates a strategy based on predictability of consecutive sub-lexical units in learning to segment a continuous speech stream into lexical units using computational modeling and simulations. Lexical segmentation is one of the early challenges during language acquisition, and it has been studied extensively through psycholinguistic…
Huebner, Philip A.; Willits, Jon A.
2018-01-01
Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system. PMID:29520243
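In miniature, the training setup reads like the following PyTorch sketch; the ten-word corpus and the hyperparameters are toy assumptions standing in for the 5-million-word child-directed-speech corpus and the paper's actual models.

```python
# Sketch: next-word prediction with a small LSTM, mirroring the paper's
# setup in miniature (toy corpus, assumed hyperparameters).
import torch
import torch.nn as nn

corpus = "the dog sees the ball the dog wants the ball".split()
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
ids = torch.tensor([vocab[w] for w in corpus])
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)   # inputs, next-word targets

class TinyLM(nn.Module):
    def __init__(self, v, d=16):
        super().__init__()
        self.emb = nn.Embedding(v, d)
        self.lstm = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, v)
    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for step in range(200):
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(vocab)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", loss.item())
# The learned embeddings (model.emb.weight) can then be inspected for
# emergent similarity structure, as the paper does at scale.
```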
Plasticity in the adult human auditory brainstem following short-term linguistic training
Song, Judy H.; Skoe, Erika; Wong, Patrick C. M.; Kraus, Nina
2009-01-01
Peripheral and central structures along the auditory pathway contribute to speech processing and learning. However, because speech requires the use of functionally and acoustically complex sounds, which impose high sensory and cognitive demands, long-term exposure to and experience with these sounds is often attributed to the neocortex, with little emphasis placed on subcortical structures. The present study examines changes in the auditory brainstem, specifically the frequency following response (FFR), as native English-speaking adults learn to incorporate foreign speech sounds (lexical pitch patterns) in word identification. The FFR presumably originates from the auditory midbrain and can be elicited pre-attentively. We measured FFRs to the trained pitch patterns before and after training. Measures of pitch-tracking were then derived from the FFR signals. We found increased pitch-tracking accuracy after training, including a decrease in the number of pitch-tracking errors and a refinement in the energy devoted to encoding pitch. Most interestingly, this change in pitch-tracking accuracy occurred only for the most acoustically complex pitch contour (the dipping contour), which is also the least familiar to our English-speaking subjects. These results demonstrate not only the contribution of the brainstem to language learning and its plasticity in adulthood, but also the specificity of this contribution (i.e., changes in encoding occur only for specific, least familiar stimuli, not all stimuli). Our findings complement existing data showing cortical changes after second-language learning, and are consistent with models suggesting that brainstem changes resulting from perceptual learning are most apparent when acuity in encoding is most needed. PMID:18370594
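Pitch-tracking measures of the kind derived from FFR signals can be approximated with a short autocorrelation tracker. The window, hop, and frequency-range values below are assumptions for illustration, not the study's analysis parameters:

```python
# Sketch: a simple autocorrelation pitch tracker, of the general kind used
# to derive pitch-tracking measures from periodic neural or acoustic signals.
import numpy as np

def track_f0(x, fs, frame=0.04, hop=0.01, fmin=80, fmax=400):
    f0s = []
    step, size = int(hop * fs), int(frame * fs)
    for start in range(0, len(x) - size, step):
        seg = x[start:start + size] * np.hanning(size)
        ac = np.correlate(seg, seg, "full")[size - 1:]  # nonnegative lags
        lo, hi = int(fs / fmax), int(fs / fmin)         # candidate lag range
        lag = lo + np.argmax(ac[lo:hi])                 # best periodicity
        f0s.append(fs / lag)
    return np.array(f0s)

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
tone = np.sin(2 * np.pi * (120 + 40 * t) * t)   # rising pitch contour
print(track_f0(tone, fs)[:5])
```

Comparing such a frame-by-frame track against the stimulus pitch contour yields error counts of the sort the study reports decreasing with training.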
Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech
Leong, Victoria; Goswami, Usha
2015-01-01
When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages. PMID:26641472
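The first stages of the pipeline (envelope extraction, then band-limiting at the three modulation timescales) can be sketched as follows. The band edges here are illustrative assumptions, since the S-AMPH model derives its bands from PCA of CDS corpora:

```python
# Sketch: amplitude-envelope extraction and S-AMPH-style modulation bands
# (assumed band edges; the model derives them from data).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 16000
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs * 2)        # stand-in for a CDS recording

envelope = np.abs(hilbert(speech))          # amplitude envelope
envelope = envelope[::160]                  # crude downsample to ~100 Hz
fs_env = 100

def bandpass(env, lo, hi, fs):
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, env)

stress_am = bandpass(envelope, 0.9, 2.5, fs_env)    # ~2 Hz: prosodic stress
syllable_am = bandpass(envelope, 2.5, 12, fs_env)   # ~5 Hz: syllables
phoneme_am = bandpass(envelope, 12, 40, fs_env)     # ~20 Hz: onset-rime
print(stress_am.size, syllable_am.size, phoneme_am.size)
```

Peaks in the slower bands then mark candidate stress feet and syllables, which is the hierarchical parse the abstract describes.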
Implicit prosody mining based on the human eye image capture technology
NASA Astrophysics Data System (ADS)
Gao, Pei-pei; Liu, Feng
2013-08-01
Eye tracking has become a principal method for analyzing recognition in human-computer interaction, and capturing images of the human eye is its central technical problem. We introduce a new human-computer interaction method that enriches speech synthesis: Implicit Prosody mining based on eye-image capture. Parameters extracted from images of the eyes during reading are used to control and drive prosody generation in speech synthesis, yielding a prosodic model with high simulation accuracy. The duration model is a key issue in prosody generation; we propose obtaining gaze durations during reading from eye-image capture and synchronizing them with pronunciation durations in the synthesized speech. Eye movement during reading is a multi-factor interactive process involving fixations, saccades, and regressions, so the appropriate information must be extracted from the eye images and the regularities of gaze obtained as a basis for modeling. Drawing on an analysis of three current models of eye-movement control and the characteristics of Implicit Prosody reading, we discuss the relative independence of the text-processing and eye-movement-control systems, and show that, at equal text familiarity, gaze duration during reading and the duration of inner-voice pronunciation are synchronous. A gaze-duration model based on the prosodic structure of Chinese is presented as an alternative to earlier machine-learning and probabilistic-forecasting methods; it captures readers' actual internal reading rhythm and synthesizes speech with a personalized rhythm. This research enriches the forms of human-computer interaction and has practical significance and application prospects for assistive speech interaction for people with disabilities. Experiments show that Implicit Prosody mining based on eye-image capture makes the synthesized speech more expressive.
Effects of Familiarity and Feeding on Newborn Speech-Voice Recognition
ERIC Educational Resources Information Center
Valiante, A. Grace; Barr, Ronald G.; Zelazo, Philip R.; Brant, Rollin; Young, Simon N.
2013-01-01
Newborn infants preferentially orient to familiar over unfamiliar speech sounds. They are also better at remembering unfamiliar speech sounds for short periods of time if learning and retention occur after a feed than before. It is unknown whether short-term memory for speech is enhanced when the sound is familiar (versus unfamiliar) and, if so,…
ERIC Educational Resources Information Center
Elliot, Linda; And Others
Designed to aid school districts, administrators, and teachers in meeting the Idaho Department of Education Speech Communication requirement, this pamphlet first defines the learning-teaching environment for the speech communication course, describes who should teach it, and justifies its inclusion in the school curriculum. The main part of the…
The Importance of Production Frequency in Therapy for Childhood Apraxia of Speech
ERIC Educational Resources Information Center
Edeal, Denice Michelle; Gildersleeve-Neumann, Christina Elke
2011-01-01
Purpose: This study explores the importance of production frequency during speech therapy to determine whether more practice of speech targets leads to increased performance within a treatment session, as well as to motor learning, in the form of generalization to untrained words. Method: Two children with childhood apraxia of speech were treated…
ERIC Educational Resources Information Center
Froyen, Dries; Willems, Gonny; Blomert, Leo
2011-01-01
The phonological deficit theory of dyslexia assumes that degraded speech sound representations might hamper the acquisition of stable letter-speech sound associations necessary for learning to read. However, there is only scarce and mainly indirect evidence for this assumed letter-speech sound association problem. The present study aimed at…
ERIC Educational Resources Information Center
Anderson, Karen L.; Goldstein, Howard
2004-01-01
Children typically learn in classroom environments that have background noise and reverberation that interfere with accurate speech perception. Amplification technology can enhance the speech perception of students who are hard of hearing. Purpose: This study used a single-subject alternating treatments design to compare the speech recognition…
Pitch-Learning Algorithm For Speech Encoders
NASA Technical Reports Server (NTRS)
Bhaskar, B. R. Udaya
1988-01-01
Adaptive algorithm detects and corrects errors in sequence of estimates of pitch period of speech. Algorithm operates in conjunction with techniques used to estimate pitch period. Used in such parametric and hybrid speech coders as linear predictive coders and adaptive predictive coders.
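One simple way to realize "detects and corrects errors in a sequence of pitch-period estimates" is median smoothing with outlier replacement. The sketch below is a generic stand-in, not the coder's actual adaptive algorithm:

```python
# Sketch: detecting and correcting isolated errors (e.g., octave jumps) in a
# sequence of pitch-period estimates via median smoothing.
import numpy as np
from scipy.signal import medfilt

periods = np.array([8.0, 8.1, 8.2, 16.4, 8.3, 8.2, 4.1, 8.1])  # ms, with errors
smoothed = medfilt(periods, kernel_size=3)        # local consensus estimate
outlier = np.abs(periods - smoothed) > 0.25 * smoothed
corrected = np.where(outlier, smoothed, periods)  # replace flagged estimates
print(corrected)   # the 16.4 and 4.1 octave errors are pulled back in line
```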
ERIC Educational Resources Information Center
Anderson, Carolyn
2011-01-01
This article presents research undertaken as part of a PhD by Carolyn Anderson who is a senior lecturer on the BSc (Hons) in Speech and Language Pathology at the University of Strathclyde. The study explores the professional learning experiences of 49 teachers working in eight schools and units for children with additional support needs in…
ERIC Educational Resources Information Center
Kempe, Vera; Brooks, Patricia J.
2005-01-01
This study investigated second-language (L2) learning to gain a better understanding of learning mechanisms that also operate in child first-language (L1) learners. It was inspired by research on the beneficial effects of child-directed speech (CDS). We tried to examine whether such benefits can be observed in the domain of inflectional…
McCormack, Jane; Easton, Catherine; Morkel-Kingsbury, Lenni
2014-01-01
The landscape of tertiary education is changing. Developments in information and communications technology have created new ways of engaging with subject material and supporting students on their learning journeys. Therefore, it is timely to reconsider and re-imagine the education of speech-language pathology (SLP) students within this new learning space. In this paper, we outline the design of a new Master of Speech Pathology course being offered by distance education at Charles Sturt University (CSU) in Australia. We discuss the catalyst for the course and the commitments of the SLP team at CSU, then describe the curriculum design process, focusing on the pedagogical approach and the learning and teaching strategies utilised in the course delivery. We explain how the learning and teaching strategies have been selected to support students' online learning experience and enable greater interaction between students and the subject material, with students and subject experts, and among student groups. Finally, we highlight some of the challenges in designing and delivering a distance education SLP program and identify future directions for educating students in an online world.
Rhythmic grouping biases constrain infant statistical learning
Hay, Jessica F.; Saffran, Jenny R.
2012-01-01
Linguistic stress and sequential statistical cues to word boundaries interact during speech segmentation in infancy. However, little is known about how the different acoustic components of stress constrain statistical learning. The current studies were designed to investigate whether intensity and duration each function independently as cues to initial prominence (trochaic-based hypothesis) or whether, as predicted by the Iambic-Trochaic Law (ITL), intensity and duration have characteristic and separable effects on rhythmic grouping (ITL-based hypothesis) in a statistical learning task. Infants were familiarized with an artificial language (Experiments 1 & 3) or a tone stream (Experiment 2) in which there was an alternation in either intensity or duration. In addition to potential acoustic cues, the familiarization sequences also contained statistical cues to word boundaries. In speech (Experiment 1) and non-speech (Experiment 2) conditions, 9-month-old infants demonstrated discrimination patterns consistent with an ITL-based hypothesis: intensity signaled initial prominence and duration signaled final prominence. The results of Experiment 3, in which 6.5-month-old infants were familiarized with the speech streams from Experiment 1, suggest that there is a developmental change in infants’ willingness to treat increased duration as a cue to word offsets in fluent speech. Infants’ perceptual systems interact with linguistic experience to constrain how infants learn from their auditory environment. PMID:23730217
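The sequential statistical cue at issue, syllable-to-syllable transitional probability, is easy to make concrete. The toy stream below is illustrative, not the experiments' stimuli:

```python
# Sketch: transitional probabilities over a syllable stream; dips in TP
# suggest word boundaries, the statistic infants are thought to track.
from collections import Counter

stream = "pa bi ku ti bu do pa bi ku go la tu ti bu do pa bi ku".split()
pairs = Counter(zip(stream, stream[1:]))   # counts of adjacent syllable pairs
firsts = Counter(stream[:-1])              # counts of first elements

for (a, b), n in sorted(pairs.items()):
    tp = n / firsts[a]                     # P(b | a)
    print(f"{a}->{b}: {tp:.2f}")
# Within-word transitions (pa->bi, ti->bu) approach 1.0; across-word
# transitions (ku->ti, ku->go) are lower, signalling likely boundaries.
```

In the experiments, an alternation in intensity or duration is overlaid on such a stream, letting the acoustic grouping cue either agree or conflict with the statistical boundaries.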
Wright, Beverly A.; Baese-Berk, Melissa M.; Marrone, Nicole; Bradlow, Ann R.
2015-01-01
Language acquisition typically involves periods when the learner speaks and listens to the new language, and others when the learner is exposed to the language without consciously speaking or listening to it. Adaptation to variants of a native language occurs under similar conditions. Here, speech learning by adults was assessed following a training regimen that mimicked this common situation of language immersion without continuous active language processing. Experiment 1 focused on the acquisition of a novel phonetic category along the voice-onset-time continuum, while Experiment 2 focused on adaptation to foreign-accented speech. The critical training regimens of each experiment involved alternation between periods of practice with the task of phonetic classification (Experiment 1) or sentence recognition (Experiment 2) and periods of stimulus exposure without practice. These practice and exposure periods yielded little to no improvement separately, but alternation between them generated as much or more improvement as did practicing during every period. Practice appears to serve as a catalyst that enables stimulus exposures encountered both during and outside of the practice periods to contribute to quite distinct cases of speech learning. It follows that practice-plus-exposure combinations may tap a general learning mechanism that facilitates language acquisition and speech processing. PMID:26328708
Varying irrelevant phonetic features hinders learning of the feature being trained.
Antoniou, Mark; Wong, Patrick C M
2016-01-01
Learning to distinguish nonnative words that differ in a critical phonetic feature can be difficult. Speech training studies typically employ methods that explicitly direct the learner's attention to the relevant nonnative feature to be learned. However, studies on vision have demonstrated that perceptual learning may occur implicitly, by exposing learners to stimulus features, even if they are irrelevant to the task, and it has recently been suggested that this task-irrelevant perceptual learning framework also applies to speech. In this study, subjects took part in a seven-day training regimen to learn to distinguish one of two nonnative features, namely, voice onset time or lexical tone, using explicit training methods consistent with most speech training studies. Critically, half of the subjects were exposed to stimuli that varied not only in the relevant feature, but in the irrelevant feature as well. The results showed that subjects who were trained with stimuli that varied in the relevant feature and held the irrelevant feature constant achieved the best learning outcomes. Varying both features hindered learning and generalization to new stimuli.
Deep Learning Based Binaural Speech Separation in Reverberant Environments.
Zhang, Xueliang; Wang, DeLiang
2017-05-01
Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal in reverberant conditions from binaural inputs. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask. Systematic evaluations and comparisons show that the proposed system achieves very good separation performance and substantially outperforms related algorithms in challenging multi-source and reverberant environments.
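The training target mentioned here, the ideal ratio mask, has a simple closed form: IRM(t, f) = sqrt(S^2 / (S^2 + N^2)). A minimal sketch with random stand-in magnitudes (the paper predicts this mask from binaural features with a deep network):

```python
# Sketch: the ideal ratio mask (IRM) as a training target, computed on toy
# time-frequency magnitudes.
import numpy as np

rng = np.random.default_rng(0)
S = rng.random((257, 100))        # |speech| magnitudes (freq bins x frames)
N = rng.random((257, 100))        # |noise + reverberation| magnitudes
irm = np.sqrt(S**2 / (S**2 + N**2 + 1e-12))   # values in [0, 1]

mixture = S + N                    # simplistic mixture magnitude
enhanced = irm * mixture           # masking retains speech-dominant units
print(irm.min(), irm.max())
```

At test time the network's estimated mask is applied to the mixture spectrogram the same way, attenuating time-frequency units dominated by noise or reverberant energy.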
Dimension-Based Statistical Learning Affects Both Speech Perception and Production
ERIC Educational Resources Information Center
Lehet, Matthew; Holt, Lori L.
2017-01-01
Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership…
Using the Pecha Kucha Speech to Analyze and Train Humor Skills
ERIC Educational Resources Information Center
Waisanen, Don
2018-01-01
Courses: Public speaking; communication courses requiring speeches. Objective: Students will learn how to apply humor principles to speeches through a slideshow method supportive of this goal, and to become more discerning about the possibilities and pitfalls of humorous communication.
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absence of length effects in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Learning across Languages: Bilingual Experience Supports Dual Language Statistical Word Segmentation
ERIC Educational Resources Information Center
Antovich, Dylan M.; Graf Estes, Katharine
2018-01-01
Bilingual acquisition presents learning challenges beyond those found in monolingual environments, including the need to segment speech in two languages. Infants may use statistical cues, such as syllable-level transitional probabilities, to segment words from fluent speech. In the present study we assessed monolingual and bilingual 14-month-olds'…
ERIC Educational Resources Information Center
Murphy, Harry; Higgins, Eleanor
This final report describes the activities and accomplishments of a 3-year study on the compensatory effectiveness of three assistive technologies: optical character recognition, speech synthesis, and speech recognition, on postsecondary students (N=140) with learning disabilities. These technologies were investigated relative to: (1) immediate…
Décodage de la chaîne parlée et apprentissage des langues (Speech Decoding and Language Learning).
ERIC Educational Resources Information Center
Companys, Emmanuel
This paper, written in French, presents a hypothesis concerning the decoding of speech in second-language learning. The theoretical background of the discussion consists of widely accepted linguistic concepts such as the phoneme, distinctive features, neutralization, linguistic levels, form and substance, expression and content, sounds, phonemes,…
Auditory-Visual Speech Integration by Adults with and without Language-Learning Disabilities
ERIC Educational Resources Information Center
Norrix, Linda W.; Plante, Elena; Vance, Rebecca
2006-01-01
Auditory and auditory-visual (AV) speech perception skills were examined in adults with and without language-learning disabilities (LLD). The AV stimuli consisted of congruent consonant-vowel syllables (auditory and visual syllables matched in terms of syllable being produced) and incongruent McGurk syllables (auditory syllable differed from…
Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study
ERIC Educational Resources Information Center
van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer
2016-01-01
The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…
Perceptual Learning of Interrupted Speech
Benard, Michel Ruben; Başkent, Deniz
2013-01-01
The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with the filler noise and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as a visual display of the sentence text. Training increased overall performance significantly; however, the restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were also generalizable, as both groups improved with the form of speech they had not been trained on, and retainable. Given the null results and the relatively small number of participants (10 per group), further research is needed before firm conclusions can be drawn. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to use top-down restoration more actively and efficiently. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired and elderly populations. PMID:23469266
ERIC Educational Resources Information Center
Kastner, Itamar; Adriaans, Frans
2018-01-01
Statistical learning is often taken to lie at the heart of many cognitive tasks, including the acquisition of language. One particular task in which probabilistic models have achieved considerable success is the segmentation of speech into words. However, these models have mostly been tested against English data, and as a result little is known…
Effects of Model Characteristics on Observational Learning of Inmates in a Pre-Release Center.
ERIC Educational Resources Information Center
Fliegel, Alan B.
Subjects were 138 inmates from the pre-release unit of a Southwestern prison system, randomly divided into three groups of 46 each. Each group viewed a video-taped model delivering a speech. The independent variable had three levels: (1) lecturer attired in a shirt and tie; (2) lecturer attired in a correctional officer's uniform; and (3) model…
Díaz, Begoña; Baus, Cristina; Escera, Carles; Costa, Albert; Sebastián-Gallés, Núria
2008-01-01
Human beings differ in their ability to master the sounds of their second language (L2). Phonetic training studies have proposed that differences in phonetic learning stem from differences in psychoacoustic abilities rather than speech-specific capabilities. We aimed at finding the origin of individual differences in L2 phonetic acquisition in natural learning contexts. We consider two alternative explanations: a general psychoacoustic origin vs. a speech-specific one. For this purpose, event-related potentials (ERPs) were recorded from two groups of early, proficient Spanish-Catalan bilinguals who differed in their mastery of the Catalan (L2) phonetic contrast /e-ε/. Brain activity in response to acoustic change detection was recorded in three different conditions involving tones of different length (duration condition), frequency (frequency condition), and presentation order (pattern condition). In addition, neural correlates of speech change detection were also assessed for both native (/o/-/e/) and nonnative (/o/-/ö/) phonetic contrasts (speech condition). Participants' discrimination accuracy, reflected electrically as a mismatch negativity (MMN), was similar between the two groups of participants in the three acoustic conditions. Conversely, the MMN was reduced in poor perceivers (PP) when they were presented with speech sounds. Therefore, our results support a speech-specific origin of individual variability in L2 phonetic mastery. PMID:18852470
Perceptual learning for speech in noise after application of binary time-frequency masks
Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.
2013-01-01
Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56 and 0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners with the greatest improvement observed for the same materials used in training. PMID:23464038
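A minimal numpy sketch, under invented shapes and an illustrative criterion value, of how an ideal binary TF mask of the kind described here could be constructed (the study's own threshold criteria and STFT settings are not reproduced):

import numpy as np

def ideal_binary_mask(target_mag, noise_mag, local_criterion_db=0.0):
    # A TF unit is kept (mask = 1) when the local target-to-noise ratio
    # exceeds the criterion; raising the criterion discards more units,
    # which is the knob varied across the training conditions.
    local_snr_db = 20.0 * np.log10(target_mag / (noise_mag + 1e-12) + 1e-12)
    return (local_snr_db > local_criterion_db).astype(float)

# Stand-in spectrogram magnitudes; a real pipeline would use the |STFT| of
# the separately known target and noise, then invert the masked mixture.
rng = np.random.default_rng(0)
target = rng.rayleigh(1.0, size=(257, 100))
noise = rng.rayleigh(1.0, size=(257, 100))
mask = ideal_binary_mask(target, noise, local_criterion_db=-6.0)
processed = mask * (target + noise)  # crude mixture approximation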
Brain dynamics that correlate with effects of learning on auditory distance perception.
Wisniewski, Matthew G; Mercado, Eduardo; Church, Barbara A; Gramann, Klaus; Makeig, Scott
2014-01-01
Accuracy in auditory distance perception can improve with practice and varies for sounds differing in familiarity. Here, listeners were trained to judge the distances of English, Bengali, and backwards speech sources pre-recorded at near (2-m) and far (30-m) distances. Listeners' accuracy was tested before and after training. Improvements from pre-test to post-test were greater for forward speech, demonstrating a learning advantage for forward speech sounds. Independent component (IC) processes identified in electroencephalographic (EEG) data collected during pre- and post-testing revealed three clusters of ICs across subjects with stimulus-locked spectral perturbations related to learning and accuracy. One cluster exhibited a transient stimulus-locked increase in 4-8 Hz power (theta event-related synchronization; ERS) that was smaller after training and largest for backwards speech. For a left temporal cluster, 8-12 Hz decreases in power (alpha event-related desynchronization; ERD) were greatest for English speech and less prominent after training. In contrast, a cluster of IC processes centered at or near anterior portions of the medial frontal cortex showed learning-related enhancement of sustained increases in 10-16 Hz power (upper-alpha/low-beta ERS). The degree of this enhancement was positively correlated with the degree of behavioral improvements. Results suggest that neural dynamics in non-auditory cortical areas support distance judgments. Further, frontal cortical networks associated with attentional and/or working memory processes appear to play a role in perceptual learning for source distance.
Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina
2014-12-01
In auditory-only conditions, for example when we listen to someone on the phone, it is essential to quickly and accurately recognize what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developing controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Nigam, Ravi; Schlosser, Ralf W; Lloyd, Lyle L
2006-09-01
Matrix strategies employing parts of speech arranged in systematic language matrices and milieu language teaching strategies have been successfully used to teach word combining skills to children who have cognitive disabilities and some functional speech. The present study investigated the acquisition and generalized production of two-term semantic relationships in a new population using new types of symbols. Three children with cognitive disabilities and little or no functional speech were taught to combine graphic symbols. The matrix strategy and the mand-model procedure were used concomitantly as intervention procedures. A multiple probe design across sets of action-object combinations with generalization probes of untrained combinations was used to teach the production of graphic symbol combinations. Results indicated that two of the three children learned the early syntactic-semantic rule of combining action-object symbols and demonstrated generalization to untrained action-object combinations and generalization across trainers. The results and future directions for research are discussed.
Modeling the Perceptual Learning of Novel Dialect Features
ERIC Educational Resources Information Center
Tatman, Rachael
2017-01-01
All language use reflects the user's social identity in systematic ways. While humans can easily adapt to this sociolinguistic variation, automatic speech recognition (ASR) systems continue to struggle with it. This dissertation makes three main contributions. The first is to provide evidence that modern state-of-the-art commercial ASR systems…
Facilitating Comprehension and Processing of Language in Classroom and Clinic.
ERIC Educational Resources Information Center
Lasky, Elaine Z.
A speech/language remediation-intervention model is proposed to enhance processing of auditory information in students with language or learning disabilities. Such children have difficulty attending to language signals (verbal and nonverbal responses ranging from facial expressions and gestures to those requiring the generation of complex…
Collaboration around Facilitating Emergent Literacy: Role of Occupational Therapy
ERIC Educational Resources Information Center
Asher, Asha; Nichols, Joy D.
2016-01-01
The article uses a case study to illustrate transdisciplinary perspectives on facilitating emergent literacy skills of Elsa, a primary grade student with autism. The study demonstrates how a professional learning community implemented motor, sensory, and speech/language components to generate a classroom model supporting emergent literacy skills.…
Achieving Developmental Synchrony in Young Children With Hearing Loss
Mellon, Nancy K.; Ouellette, Meredith; Greer, Tracy; Gates-Ulanet, Patricia
2009-01-01
Children with hearing loss, with early and appropriate amplification and intervention, demonstrate gains in speech, language, and literacy skills. Despite these improvements many children continue to exhibit disturbances in cognitive, behavioral, and emotional control, self-regulation, and aspects of executive function. Given the complexity of developmental learning, educational settings should provide services that foster the growth of skills across multiple dimensions. Transdisciplinary intervention services that target the domains of language, communication, psychosocial functioning, motor, and cognitive development can promote academic and social success. Educational programs must provide children with access to the full range of basic skills necessary for academic and social achievement. In addition to an integrated curriculum that nurtures speech, language, and literacy development, innovations in the areas of auditory perception, social emotional learning, motor development, and vestibular function can enhance student outcomes. Through ongoing evaluation and modification, clearly articulated curricular approaches can serve as a model for early intervention and special education programs. The purpose of this article is to propose an intervention model that combines best practices from a variety of disciplines that affect developmental outcomes for young children with hearing loss, along with specific strategies and approaches that may help to promote optimal development across domains. Access to typically developing peers who model age-appropriate skills in language and behavior, small class sizes, a co-teaching model, and a social constructivist perspective of teaching and learning, are among the key elements of the model. PMID:20150187
Agarwalla, Swapna; Sarma, Kandarpa Kumar
2016-06-01
Automatic Speaker Recognition (ASR) and related issues are continuously evolving as inseparable elements of Human Computer Interaction (HCI). With the assimilation of emerging concepts like big data and the Internet of Things (IoT) as extended elements of HCI, ASR techniques are passing through a paradigm shift. Of late, learning-based techniques have started to receive greater attention from research communities related to ASR, owing to the fact that the former possess a natural ability to mimic biological behavior and thereby aid ASR modeling and processing. Current learning-based ASR techniques are evolving further with the incorporation of big data and IoT-like concepts. In this paper, we report certain approaches based on machine learning (ML) used to extract relevant samples from a big data space and apply them to ASR, using certain soft computing techniques, for Assamese speech with dialectal variations. A class of ML techniques comprising the basic Artificial Neural Network (ANN) in feedforward (FF) and Deep Neural Network (DNN) forms, using raw speech, extracted features, and frequency-domain forms, is considered. The Multi Layer Perceptron (MLP) is configured with inputs in several forms to learn class information obtained using clustering and manual labeling. DNNs are also used to extract specific sentence types. Initially, relevant samples are selected and assimilated from a large storage. Next, a few conventional methods are used for feature extraction of a few selected types. The features comprise both spectral and prosodic types. These are applied to Recurrent Neural Network (RNN) and Fully Focused Time Delay Neural Network (FFTDNN) structures to evaluate their performance in recognizing mood, dialect, speaker, and gender variations in dialectal Assamese speech. The system is tested under several background noise conditions by considering the recognition rates (obtained using confusion matrices and manually) and computation time. It is found that the proposed ML-based sentence extraction techniques and the composite feature set used with the RNN as classifier outperform all other approaches. By using the ANN in FF form as a feature extractor, the performance of the system is evaluated and a comparison is made. Experimental results show that the application of big data samples has enhanced the learning of the ASR system. Further, the ANN-based sample and feature extraction techniques are found to be efficient enough to enable the application of ML techniques to big data aspects of ASR systems. Copyright © 2015 Elsevier Ltd. All rights reserved.
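A hedged sketch of the kind of pipeline this abstract outlines, spectral plus prosodic frame features classified by a recurrent network; the feature counts, class labels, and layer sizes below are invented for illustration and are not the authors' configuration:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

n_frames, n_feats, n_classes = 100, 14, 4   # e.g., 13 MFCCs + 1 pitch proxy
model = Sequential([
    Input(shape=(n_frames, n_feats)),
    SimpleRNN(64),                            # recurrent summary of the utterance
    Dense(n_classes, activation="softmax"),   # dialect (or speaker/mood) class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random stand-ins for feature matrices and labels, purely to show the shapes.
X = np.random.rand(32, n_frames, n_feats).astype("float32")
y = np.random.randint(0, n_classes, size=32)
model.fit(X, y, epochs=1, verbose=0)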
Yoo, Sejin; Chung, Jun-Young; Jeon, Hyeon-Ae; Lee, Kyoung-Min; Kim, Young-Bo; Cho, Zang-Hee
2012-07-01
Speech production is inextricably linked to speech perception, yet they are usually investigated in isolation. In this study, we employed a verbal-repetition task to identify the neural substrates of speech processing with both ends, perception and production, active simultaneously using functional MRI. Subjects verbally repeated auditory stimuli containing an ambiguous vowel sound that could be perceived as either a word or a pseudoword depending on the interpretation of the vowel. We found verbal repetition commonly activated the audition-articulation interface bilaterally at Sylvian fissures and superior temporal sulci. Contrasting word-versus-pseudoword trials revealed neural activities unique to word repetition in the left posterior middle temporal areas and activities unique to pseudoword repetition in the left inferior frontal gyrus. These findings imply that the tasks are carried out using different speech codes: an articulation-based code of pseudowords and an acoustic-phonetic code of words. It also supports the dual-stream model and imitative learning of vocabulary. Copyright © 2012 Elsevier Inc. All rights reserved.
Morrison, Amanda S.; Brozovich, Faith A.; Lee, Ihno A.; Jazaieri, Hooria; Goldin, Philippe R.; Heimberg, Richard G.; Gross, James J.
2016-01-01
The subjective experience of anxiety plays a central role in cognitive behavioral models of social anxiety disorder (SAD). However, much remains to be learned about the temporal dynamics of anxiety elicited by feared social situations. The aims of the current study were: 1) to compare anxiety trajectories during a speech task in individuals with SAD (n = 135) versus healthy controls (HCs; n = 47), and 2) to compare the effects of CBT on anxiety trajectories with a waitlist control condition. SAD was associated with higher levels of anxiety and greater increases in anticipatory anxiety compared to HCs, but not differential change in anxiety from pre- to post-speech. CBT was associated with decreases in anxiety from pre- to post-speech but not with changes in absolute levels of anticipatory anxiety or rates of change in anxiety during anticipation. The findings suggest that anticipatory experiences should be further incorporated into exposures. PMID:26760456
Orr, Elizabeth M J; Moscovitch, David A
2010-08-01
Video feedback (VF) with cognitive preparation (CP) has been widely integrated into cognitive-behavioral therapy (CBT) protocols for social anxiety disorder (SAD) due to its presumed efficacy in improving negative self-perception. However, previous experimental studies have demonstrated that improvements in negative self-perception via VF+CP do not typically facilitate anxiety reduction during subsequent social interactions - a troubling finding for proponents of cognitive models of social anxiety. We examined whether VF+CP could be optimized to enhance participants' processing of corrective self-related information through the addition of a post-VF cognitive review (CR). Sixty-eight socially anxious individuals were randomly assigned to perform two public speeches in one of the following conditions: a) exposure alone (EXP); b) CP+VF; and c) CP+VF+CR. Those in the CP+VF+CR condition demonstrated marginally significant reductions in anxiety from speech 1 to speech 2 relative to those who received EXP - an improvement not shown for those in the CP+VF condition. Furthermore, only those who received CP+VF+CR demonstrated significant improvements in self-perception and performance expectations relative to EXP. Decreases in anxiety among participants who received CP+VF+CR relative to EXP were fully mediated by improvements in self-perception. Implications are discussed in the context of cognitive models of social anxiety and mechanisms of exposure-based learning. Copyright 2010 Elsevier Ltd. All rights reserved.
McAllister, Sue; Lincoln, Michelle; Ferguson, Allison; McAllister, Lindy
2013-01-01
Valid assessment of health science students' ability to perform in the real world of workplace practice is critical for promoting quality learning and ultimately certifying students as fit to enter the world of professional practice. Current practice in performance assessment in the health sciences field has been hampered by multiple issues regarding assessment content and process. Evidence for the validity of scores derived from assessment tools is usually evaluated against traditional validity categories with reliability evidence privileged over validity, resulting in the paradoxical effect of compromising the assessment validity and learning processes the assessments seek to promote. Furthermore, the dominant statistical approaches used to validate scores from these assessments fall under the umbrella of classical test theory approaches. This paper reports on the successful national development and validation of measures derived from an assessment of Australian speech pathology students' performance in the workplace. Validation of these measures considered each of Messick's interrelated validity evidence categories and included using evidence generated through Rasch analyses to support score interpretation and related action. This research demonstrated that it is possible to develop an assessment of real, complex, work based performance of speech pathology students, that generates valid measures without compromising the learning processes the assessment seeks to promote. The process described provides a model for other health professional education programs to trial.
K-5 Educators' Perceptions of the Role of Speech Language Pathologists
ERIC Educational Resources Information Center
Hatcher, Karmon D.
2017-01-01
Rarely is a school-based speech language pathologist (SLP) thought of as an active contributor to the achievement of students or to the learning community in general. Researchers have found benefits for students when members of the learning community collaborate, and the SLP should be a part of this community collaboration. This qualitative case…
Personalized Messages That Promote Science Learning in Virtual Environments
ERIC Educational Resources Information Center
Moreno, Roxana; Mayer, Richard E.
2004-01-01
College students learned how to design the roots, stem, and leaves of plants to survive in five different virtual reality environments through an agent-based multimedia educational game. For each student, the agent used personalized speech (e.g., including I and you) or nonpersonalized speech (e.g., 3rd-person monologue), and the game was…
Effects of Attention Cueing on Learning Speech Organ Operation through Mobile Phones
ERIC Educational Resources Information Center
Yang, Hui-Yu
2017-01-01
Research on using a cross-sectional view of speech organs, enriched with attention cueing and written text, to probe learners' learning efficiency and behavior through mobile phones is scant. The purpose of this study was to examine whether the presence of attention cueing can benefit learners with different amounts of prior knowledge in…
Machine Learning Through Signature Trees. Applications to Human Speech.
ERIC Educational Resources Information Center
White, George M.
A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
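A minimal sketch of the signature-tree idea the record describes: a binary decision tree over pattern features. The features, thresholds, and class labels below are invented for illustration:

class Node:
    # Internal nodes test one feature against a threshold; leaves name a class.
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

    def classify(self, pattern):
        if self.label is not None:          # leaf: emit the class
            return self.label
        branch = self.left if pattern[self.feature] <= self.threshold else self.right
        return branch.classify(pattern)

# Toy tree for speech-like patterns: test voicing first, then spectral centroid.
tree = Node("voicing", 0.5,
            left=Node(label="unvoiced fricative"),
            right=Node("centroid_hz", 1500,
                       left=Node(label="back vowel"),
                       right=Node(label="front vowel")))
print(tree.classify({"voicing": 0.8, "centroid_hz": 2200}))  # front vowel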
ERIC Educational Resources Information Center
Liu, Ran; Holt, Lori L.
2011-01-01
Native language experience plays a critical role in shaping speech categorization, but the exact mechanisms by which it does so are not well understood. Investigating category learning of nonspeech sounds with which listeners have no prior experience allows their experience to be systematically controlled in a way that is impossible to achieve by…
Integrating Text-to-Speech Software into Pedagogically Sound Teaching and Learning Scenarios
ERIC Educational Resources Information Center
Rughooputh, S. D. D. V.; Santally, M. I.
2009-01-01
This paper presents a new technique of delivery of classes--an instructional technique which will no doubt revolutionize the teaching and learning, whether for on-campus, blended or online modules. This is based on the simple task of instructionally incorporating text-to-speech software embedded in the lecture slides that will simulate exactly the…
User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning
ERIC Educational Resources Information Center
Ahn, Tae youn; Lee, Sangmin-Michelle
2016-01-01
With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…
ERIC Educational Resources Information Center
Smalle, Eleonore H. M.; Muylle, Merel; Szmalec, Arnaud; Duyck, Wouter
2017-01-01
Speech errors typically respect the speaker's implicit knowledge of language-wide phonotactics (e.g., /t/ cannot be a syllable onset in the English language). Previous work demonstrated that adults can learn novel experimentally induced phonotactic constraints by producing syllable strings in which the allowable position of a phoneme depends on…
Learning L2 Pronunciation with a Mobile Speech Recognizer: French /y/
ERIC Educational Resources Information Center
Liakin, Denis; Cardoso, Walcir; Liakina, Natallia
2015-01-01
This study investigates the acquisition of the L2 French vowel /y/ in a mobile-assisted learning environment, via the use of automatic speech recognition (ASR). Particularly, it addresses the question of whether ASR-based pronunciation instruction using a mobile device can improve the production and perception of French /y/. Forty-two elementary…
Visual Speech Perception in Children with Language Learning Impairments
ERIC Educational Resources Information Center
Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart
2016-01-01
Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…
Karipidis, Iliana I.; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia
2017-02-01
Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we simultaneously acquired event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. In correspondence, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short (<30 min) letter-speech sound training initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.
1985-10-01
[Fragmentary OCR text from a scanned report; the recoverable running header identifies it as Liberman & Mattingly, "The Motor Theory of Speech Perception Revised," and the legible fragments concern speech perception and claims of learned equivalence.]
Differences in Talker Recognition by Preschoolers and Adults
ERIC Educational Resources Information Center
Creel, Sarah C.; Jimenez, Sofia R.
2012-01-01
Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from…
The Learning of Complex Speech Act Behaviour.
ERIC Educational Resources Information Center
Olshtain, Elite; Cohen, Andrew
1990-01-01
Pre- and posttraining measurement of adult English-as-a-Second-Language learners' (N=18) apology speech act behavior found no clear-cut quantitative improvement after training, although there was an obvious qualitative approximation of native-like speech act behavior in terms of types of intensification and downgrading, choice of strategy, and…
Developmental Differences in Speech Act Recognition: A Pragmatic Awareness Study
ERIC Educational Resources Information Center
Garcia, Paula
2004-01-01
With the growing acknowledgement of the importance of pragmatic competence in second language (L2) learning, language researchers have identified the comprehension of speech acts as they occur in natural conversation as essential to communicative competence (e.g. Bardovi-Harlig, 2001; Thomas, 1983). Nonconventional indirect speech acts are formed…
Reflections on Clinical Learning in Novice Speech-Language Therapy Students
ERIC Educational Resources Information Center
Hill, Anne E.; Davidson, Bronwyn J.; Theodoros, Deborah G.
2012-01-01
Background: Reflective practice is reported to enhance clinical reasoning and therefore to maximize client outcomes. The inclusion of targeted reflective practice in academic programmes in speech-language therapy has not been consistent, although providing opportunities for speech-language therapy students to reflect during their clinical practice…
The Neural Basis of Speech Parsing in Children and Adults
ERIC Educational Resources Information Center
McNealy, Kristin; Mazziotta, John C.; Dapretto, Mirella
2010-01-01
Word segmentation, detecting word boundaries in continuous speech, is a fundamental aspect of language learning that can occur solely by the computation of statistical and speech cues. Fifty-four children underwent functional magnetic resonance imaging (fMRI) while listening to three streams of concatenated syllables that contained either high…
Ultrasound visual feedback treatment and practice variability for residual speech sound errors
Preston, Jonathan L.; McCabe, Patricia; Rivera-Campos, Ahmed; Whittle, Jessica L.; Landry, Erik; Maas, Edwin
2014-01-01
Purpose: The goals were to (1) test the efficacy of a motor-learning based treatment that includes ultrasound visual feedback for individuals with residual speech sound errors, and (2) explore whether the addition of prosodic cueing facilitates speech sound learning. Method: A multiple baseline single subject design was used, replicated across 8 participants. For each participant, one sound context was treated with ultrasound plus prosodic cueing for 7 sessions, and another sound context was treated with ultrasound but without prosodic cueing for 7 sessions. Sessions included ultrasound visual feedback as well as non-ultrasound treatment. Word-level probes assessing untreated words were used to evaluate retention and generalization. Results: For most participants, increases in accuracy of target sound contexts at the word level were observed with the treatment program regardless of whether prosodic cueing was included. Generalization between onset singletons and clusters was observed, as well as generalization to sentence-level accuracy. There was evidence of retention during post-treatment probes, including at a two-month follow-up. Conclusions: A motor-based treatment program that includes ultrasound visual feedback can facilitate learning of speech sounds in individuals with residual speech sound errors. PMID:25087938
Loo, Jenny Hooi Yin; Bamiou, Doris-Eva; Campbell, Nicci; Luxon, Linda M
2010-08-01
This article reviews the evidence for computer-based auditory training (CBAT) in children with language, reading, and related learning difficulties, and evaluates the extent it can benefit children with auditory processing disorder (APD). Searches were confined to studies published between 2000 and 2008, and they are rated according to the level of evidence hierarchy proposed by the American Speech-Language Hearing Association (ASHA) in 2004. We identified 16 studies of two commercially available CBAT programs (13 studies of Fast ForWord (FFW) and three studies of Earobics) and five further outcome studies of other non-speech and simple speech sounds training, available for children with language, learning, and reading difficulties. The results suggest that, apart from the phonological awareness skills, the FFW and Earobics programs seem to have little effect on the language, spelling, and reading skills of children. Non-speech and simple speech sounds training may be effective in improving children's reading skills, but only if it is delivered by an audio-visual method. There is some initial evidence to suggest that CBAT may be of benefit for children with APD. Further research is necessary, however, to substantiate these preliminary findings.
Information from multiple modalities helps 5-month-olds learn abstract rules.
Frank, Michael C; Slemmer, Jonathan A; Marcus, Gary F; Johnson, Scott P
2009-07-01
By 7 months of age, infants are able to learn rules based on the abstract relationships between stimuli (Marcus et al., 1999), but they are better able to do so when exposed to speech than to some other classes of stimuli. In the current experiments we ask whether multimodal stimulus information will aid younger infants in identifying abstract rules. We habituated 5-month-olds to simple abstract patterns (ABA or ABB) instantiated in coordinated looming visual shapes and speech sounds (Experiment 1), shapes alone (Experiment 2), and speech sounds accompanied by uninformative but coordinated shapes (Experiment 3). Infants showed evidence of rule learning only in the presence of the informative multimodal cues. We hypothesize that the additional evidence present in these multimodal displays was responsible for the success of younger infants in learning rules, congruent with both a Bayesian account and with the Intersensory Redundancy Hypothesis.
Hill, Anne E; Davidson, Bronwyn J; Theodoros, Deborah G
2010-06-01
The use of standardized patients has been reported as a viable addition to traditional models of professional practice education in medicine, nursing and allied health programs. Educational programs rely on the inclusion of work-integrated learning components in order to graduate competent practitioners. Allied health programs world-wide have reported increasing difficulty in attaining sufficient traditional placements for students within the workplace. In response to this, allied health professionals are challenged to be innovative and problem-solving in the development and maintenance of clinical education placements and to consider potential alternative learning opportunities for students. Whilst there is a bank of literature describing the use of standardized patients in medicine and nursing, reports of its use in speech-language pathology clinical education are limited. Therefore, this paper aims to (1) provide a review of literature reporting on the use of standardized patients within medical and allied health professions with particular reference to use in speech-language pathology, (2) discuss methodological and practical issues involved in establishing and maintaining a standardized patient program and (3) identify future directions for research and clinical programs using standardized patients to build foundation clinical skills such as communication, interpersonal interaction and interviewing.
Phonetic diversity, statistical learning, and acquisition of phonology.
Pierrehumbert, Janet B
2003-01-01
In learning to perceive and produce speech, children master complex language-specific patterns. Daunting language-specific variation is found both in the segmental domain and in the domain of prosody and intonation. This article reviews the challenges posed by results in phonetic typology and sociolinguistics for the theory of language acquisition. It argues that categories are initiated bottom-up from statistical modes in use of the phonetic space, and sketches how exemplar theory can be used to model the updating of categories once they are initiated. It also argues that bottom-up initiation of categories is successful thanks to the perception-production loop operating in the speech community. The behavior of this loop means that the superficial statistical properties of speech available to the infant indirectly reflect the contrastiveness and discriminability of categories in the adult grammar. The article also argues that the developing system is refined using internal feedback from type statistics over the lexicon, once the lexicon is well-developed. The application of type statistics to a system initiated with surface statistics does not cause a fundamental reorganization of the system. Instead, it exploits confluences across levels of representation which characterize human language and make bootstrapping possible.
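A minimal sketch of the bottom-up initiation idea in this abstract, assuming an invented bimodal distribution of a phonetic cue (voice onset time): smooth the cue's distribution and take local maxima of the density as seed categories. All values are illustrative:

import numpy as np

rng = np.random.default_rng(2)
vot = np.concatenate([rng.normal(10, 5, 500),    # short-lag mode ("b"-like)
                      rng.normal(60, 10, 500)])  # long-lag mode ("p"-like)
grid = np.linspace(-20, 100, 400)
bandwidth = 4.0
# Gaussian kernel density estimate over the one-dimensional cue space.
density = np.exp(-0.5 * ((grid[:, None] - vot[None, :]) / bandwidth) ** 2).mean(axis=1)
# Modes = interior grid points higher than both neighbors; each seeds a category.
modes = grid[1:-1][(density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])]
print(modes)  # expected: two values, near 10 ms and 60 ms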
Directive Speech Act of Imamu in Katoba Discourse of Muna Ethnic
NASA Astrophysics Data System (ADS)
Ardianto, Ardianto; Hadirman, Hardiman
2018-05-01
One of the traditions of the Muna ethnic group is the katoba ritual, a tradition embodying local values whose existence has been maintained across generations to this day. Katoba is a ritual of becoming a Muslim and of repentance, marking the formation of the character of a child (male or female, aged 6-11 years) who is approaching adulthood, and it is conducted through directive speech. In the katoba ritual, the child being initiated is introduced to the teachings of Islam, to customs, and to proper conduct toward parents, siblings, and others, which the child is expected to carry into daily life. This study aims to describe and explain the directive speech acts of the imamu in the katoba discourse of the Muna ethnic group. The research uses a qualitative approach. Data were collected in a natural setting, namely katoba speech discourses, and consist of two types: (a) speech data and (b) field-note data. Data were analyzed using an interactive model with four stages: (1) data collection, (2) data reduction, (3) data display, and (4) conclusion and verification. The results show, first, that the directive speech acts take declarative and imperative forms; second, that they serve the functions of teaching, explaining, suggesting, and expecting; and third, that they employ both direct and indirect strategies. These results could inform the development of character-learning materials at schools, including local-content (mulok) curricula.
SchNet - A deep learning architecture for molecules and materials
NASA Astrophysics Data System (ADS)
Schütt, K. T.; Sauceda, H. E.; Kindermans, P.-J.; Tkatchenko, A.; Müller, K.-R.
2018-06-01
Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suited for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
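A minimal numpy sketch of the continuous-filter convolution at the core of SchNet, with invented sizes and the filter-generating network reduced to a single tanh layer (SchNet itself uses shifted-softplus nonlinearities and learned atom-type embeddings):

import numpy as np

def gaussian_rbf(distances, centers, gamma=10.0):
    # Expand scalar interatomic distances in radial basis functions,
    # the input to the filter-generating network.
    return np.exp(-gamma * (distances[..., None] - centers) ** 2)

def cfconv(features, positions, w1, w2, centers):
    # Each atom aggregates its neighbors' features, weighted by filters
    # generated continuously from the interatomic distances.
    # w1: (n_rbf, f), w2: (f, f) are the filter network's weights.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                       # (n, n)
    filters = np.tanh(gaussian_rbf(dist, centers) @ w1) @ w2   # (n, n, f)
    return (filters * features[None, :, :]).sum(axis=1)       # (n, f)

# Toy "molecule": 3 atoms, 4 feature channels, 8 RBF centers (all illustrative).
rng = np.random.default_rng(1)
x = rng.normal(size=(3, 4))
pos = rng.normal(size=(3, 3))
centers = np.linspace(0.0, 3.0, 8)
out = cfconv(x, pos, rng.normal(size=(8, 4)), rng.normal(size=(4, 4)), centers)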
An empirical generative framework for computational modeling of language acquisition.
Waterfall, Heidi R; Sandbank, Ben; Onnis, Luca; Edelman, Shimon
2010-06-01
This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of generative grammars from raw CHILDES data and give an account of the generative performance of the acquired grammars. Next, we summarize findings from recent longitudinal and experimental work that suggests how certain statistically prominent structural properties of child-directed speech may facilitate language acquisition. We then present a series of new analyses of CHILDES data indicating that the desired properties are indeed present in realistic child-directed speech corpora. Finally, we suggest how our computational results, behavioral findings, and corpus-based insights can be integrated into a next-generation model aimed at meeting the four requirements of our modeling framework.
2000-01-01
[Fragmentary table-of-contents text from a conference proceedings on flight-test data, modeling, and simulation; recoverable entry titles include "A Grammatical Model and Parser for Air Traffic Controller's Commands," "A Speech-Controlled Interactive Virtual Environment for Ship Familiarization," and "Modeling and Simulation in the 21st Century."]
François, Clément; Schön, Daniele
2014-02-01
There is increasing evidence that humans and other nonhuman mammals are sensitive to the statistical structure of auditory input. Indeed, neural sensitivity to statistical regularities seems to be a fundamental biological property underlying auditory learning. In the case of speech, statistical regularities play a crucial role in the acquisition of several linguistic features, from phonotactic to more complex rules such as morphosyntactic rules. Interestingly, a similar sensitivity has been shown with non-speech streams: sequences of sounds changing in frequency or timbre can be segmented on the sole basis of conditional probabilities between adjacent sounds. We recently ran a set of cross-sectional and longitudinal experiments showing that merging music and speech information in song facilitates stream segmentation and, further, that musical practice enhances sensitivity to statistical regularities in speech at both neural and behavioral levels. Based on recent findings showing the involvement of a fronto-temporal network in speech segmentation, we defend the idea that enhanced auditory learning observed in musicians originates via at least three distinct pathways: enhanced low-level auditory processing, enhanced phono-articulatory mapping via the left Inferior Frontal Gyrus and Pre-Motor cortex and increased functional connectivity within the audio-motor network. Finally, we discuss how these data predict a beneficial use of music for optimizing speech acquisition in both normal and impaired populations. Copyright © 2013 Elsevier B.V. All rights reserved.
Articulation of sounds in Serbian language in patients who learned esophageal speech successfully.
Vekić, Maja; Veselinović, Mila; Mumović, Gordana; Mitrović, Slobodan M
2014-01-01
Articulation of pronounced sounds during the training and subsequent use of esophageal speech is very important because it contributes significantly to the intelligibility and aesthetics of spoken words and sentences, as well as of speech and language itself. The aim of this research was to determine the quality of articulation of the sounds of the Serbian language, by groups of sounds, in patients who had learned esophageal speech successfully, as well as the effect of age and tooth loss on the quality of articulation. This retrospective-prospective study included 16 patients who had undergone total laryngectomy. Having completed speech rehabilitation, these patients used esophageal voice and speech. The quality of articulation was tested with the "Global test of articulation." Esophageal speech was rated grade 5 in 62.5% of the patients, grade 4 in 31.3%, and grade 3 in one patient. Serbian was the native language of all the patients. The study covered 30 sounds of the Serbian language in 16 subjects (480 sounds in total). Only two patients (12.5%) articulated all sounds properly, whereas 87.5% had incorrect articulation. Articulation of the affricates and fricatives, especially the sound /h/ among the fricatives, was found to be the worst in the patients who had successfully mastered esophageal speech. The age and tooth loss of patients who have mastered esophageal speech do not affect the articulation of sounds in the Serbian language.
Doubé, Wendy; Carding, Paul; Flanagan, Kieran; Kaufman, Jordy; Armitage, Hannah
2018-01-01
Children with speech sound disorders benefit from feedback about the accuracy of sounds they make. Home practice can reinforce feedback received from speech pathologists. Games in mobile device applications could encourage home practice, but those currently available are of limited value because they are unlikely to elaborate “Correct”/”Incorrect” feedback with information that can assist in improving the accuracy of the sound. This protocol proposes a “Wizard of Oz” experiment that aims to provide evidence for the provision of effective multimedia feedback for speech sound development. Children with two common speech sound disorders will play a game on a mobile device and make speech sounds when prompted by the game. A human “Wizard” will provide feedback on the accuracy of the sound but the children will perceive the feedback as coming from the game. Groups of 30 young children will be randomly allocated to one of five conditions: four types of feedback and a control which does not play the game. The results of this experiment will inform not only speech sound therapy, but also other types of language learning, both in general, and in multimedia applications. This experiment is a cost-effective precursor to the development of a mobile application that employs pedagogically and clinically sound processes for speech development in young children. PMID:29674986
Selective attention: PSI performance in children with learning disabilities.
Garcia, Vera Lúcia; Pereira, Liliane Desgualdo; Fukuda, Yotaka
2007-01-01
Selective attention is essential for learning how to write and read. The objective of this study was to examine the process of selective auditory attention in children with learning disabilities. Group I included 40 subjects aged between 9 years 6 months and 10 years 11 months who had a low risk of altered hearing, language, and learning development. Group II included 20 subjects aged between 9 years 5 months and 11 years 10 months who presented learning disabilities. A prospective study was done using the Pediatric Speech Intelligibility (PSI) test. Right-ear PSI with an ipsilateral competing message at speech/noise ratios of 0 and -10 was sufficient to differentiate Group I from Group II. Special attention should be given to the performance of Group II on the first tested ear, which may reveal important signs of improvement in performance and rehabilitation. The PSI-MCI of the right ear at speech/noise ratios of 0 and -10 was appropriate for differentiating Groups I and II. There was an association with the group that presented learning disabilities: this group showed problems in selective attention.
Gradient language dominance affects talker learning.
Bregman, Micah R; Creel, Sarah C
2014-01-01
Traditional conceptions of spoken language assume that speech recognition and talker identification are computed separately. Neuropsychological and neuroimaging studies imply some separation between the two faculties, but recent perceptual studies suggest better talker recognition in familiar languages than unfamiliar languages. A familiar-language benefit in talker recognition potentially implies strong ties between the two domains. However, little is known about the nature of this language familiarity effect. The current study investigated the relationship between speech and talker processing by assessing bilingual and monolingual listeners' ability to learn voices as a function of language familiarity and age of acquisition. Two effects emerged. First, bilinguals learned to recognize talkers in their first language (Korean) more rapidly than they learned to recognize talkers in their second language (English), while English-speaking participants showed the opposite pattern (learning English talkers faster than Korean talkers). Second, bilinguals' learning rate for talkers in their second language (English) correlated with age of English acquisition. Taken together, these results suggest that language background materially affects talker encoding, implying a tight relationship between speech and talker representations. Copyright © 2013 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Huang, Y-M.; Liu, C-J.; Shadiev, Rustam; Shen, M-H.; Hwang, W-Y.
2015-01-01
One major drawback of previous research on speech-to-text recognition (STR) is that most findings showing the effectiveness of STR for learning were based upon subjective evidence. Very few studies have used eye-tracking techniques to investigate visual attention of students on STR-generated text. Furthermore, not much attention was paid to…
ERIC Educational Resources Information Center
Higgins, Eleanor L.; Raskind, Marshall H.
1997-01-01
Thirty-seven college students with learning disabilities were given a reading comprehension task under the following conditions: (1) using an optical character recognition/speech synthesis system; (2) having the text read aloud by a human reader; or (3) reading silently without assistance. Findings indicated that the greater the disability, the…
ERIC Educational Resources Information Center
O'Brien, Mary Grantham
2014-01-01
In early stages of classroom language learning, many adult second language (L2) learners communicate primarily with one another, yet we know little about which speech stream characteristics learners tune into or the extent to which they understand this lingua franca communication. In the current study, 25 native English speakers learning German as…
ERIC Educational Resources Information Center
Vogt, Paul; Mastin, J. Douglas; Schots, Diede M. A.
2015-01-01
This article compares the communicative intentions observed in the speech addressed to children of 1;1 and 1;6 years old from three cultural communities: the Netherlands, rural Mozambique, and urban Mozambique. These communities represent two prototypical learning environments and a third hybrid: Western, urban, middle-class families; non-Western,…
ERIC Educational Resources Information Center
Saito, Kazuya
2017-01-01
This study examines the relationship between different types of language learning aptitude (measured via the LLAMA test) and adult second language (L2) learners' attainment in speech production in English-as-a-foreign-language (EFL) classrooms. Picture descriptions elicited from 50 Japanese EFL learners from varied proficiency levels were analyzed…
ERIC Educational Resources Information Center
Mersad, Karima; Nazzi, Thierry
2012-01-01
Transitional Probability (TP) computations are regarded as a powerful learning mechanism that is functional early in development and has been proposed as an initial bootstrapping device for speech segmentation. However, a recent study casts doubt on the robustness of early statistical word-learning. Johnson and Tyler (2010) showed that when…
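The transitional-probability computation this literature relies on is simple enough to state directly. A minimal sketch in Python; the two "words" and the familiarization stream are hypothetical, in the style of artificial-language stimuli, not the study's materials:

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TPs: TP(a -> b) = count(a followed by b) / count(a as a first element)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical stream built from two "words": within-word TPs come out high and
# TPs spanning word boundaries come out low, which is the segmentation cue that
# statistical-learning accounts attribute to infants.
words = [("bi", "da", "ku"), ("pa", "do", "ti")]
random.seed(0)
stream = [s for w in random.choices(words, k=200) for s in w]
tps = transitional_probabilities(stream)
print(tps[("bi", "da")])   # within-word: 1.0
print(tps[("ku", "pa")])   # across a word boundary: roughly 0.5
```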
Generalization of Perceptual Learning of Degraded Speech across Talkers
ERIC Educational Resources Information Center
Huyck, Julia Jones; Smith, Rachel H.; Hawkins, Sarah; Johnsrude, Ingrid S.
2017-01-01
Purpose: We investigated whether perceptual learning of noise-vocoded (NV) speech is specific to a particular talker or accent. Method: Four groups of listeners (n = 18 per group) were first trained by listening to 20 NV sentences that had been recorded by a talker with either the same native accent as the listeners or a different regional accent.…
ERIC Educational Resources Information Center
Blau, Vera; Reithler, Joel; van Atteveldt, Nienke; Seitz, Jochen; Gerretsen, Patty; Goebel, Rainer; Blomert, Leo
2010-01-01
Learning to associate auditory information of speech sounds with visual information of letters is a first and critical step for becoming a skilled reader in alphabetic languages. Nevertheless, it remains largely unknown which brain areas subserve the learning and automation of such associations. Here, we employ functional magnetic resonance…
ERIC Educational Resources Information Center
Richardson, Tanya; Murray, Jane
2017-01-01
Within English early childhood education, there is emphasis on improving speech and language development as well as a drive for outdoor learning. This paper synthesises both aspects to consider whether or not links exist between the environment and the quality of young children's utterances as part of their speech and language development and if…
ERIC Educational Resources Information Center
Shadiev, Rustam; Huang, Yueh-Min; Hwang, Jan-Pan
2017-01-01
In this study, the effectiveness of the application of speech-to-text recognition (STR) technology in enhancing learning and concentration in a calm state of mind, hereafter referred to as meditation (an intentional and self-regulated focusing of attention in order to relax and calm the mind), was investigated. This effectiveness was further…
Lai, Ying-Hui; Tsao, Yu; Lu, Xugang; Chen, Fei; Su, Yu-Ting; Chen, Kuang-Chao; Chen, Yu-Hsuan; Chen, Li-Ching; Po-Hung Li, Lieber; Lee, Chin-Hui
2018-01-20
We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach under noisy conditions with challenging noise types at low signal-to-noise ratio (SNR) levels for Mandarin-speaking cochlear implant (CI) recipients. The deep learning-based NR approach used in this study consists of two modules, a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is thus termed NC + DDAE. In a series of comprehensive experiments, we conduct qualitative and quantitative analyses on the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches by the amplitude envelope and spectrogram plots of the processed utterances. Quantitative objective measures include (1) the normalized covariance measure, to test the intelligibility of the utterances processed by each of the NR approaches; and (2) speech recognition tests conducted by nine Mandarin-speaking CI recipients. These nine CI recipients use their own clinical speech processors during testing. The experimental results of the objective evaluation and listening tests indicate that under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two compared classical NR techniques, under both matched and mismatched training-testing conditions. When compared to the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach has superior noise suppression capabilities and gives less distortion of the key speech envelope information, thus improving speech recognition more effectively for Mandarin CI recipients. The results suggest that the proposed deep learning-based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.
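For readers unfamiliar with the DDAE component, a minimal sketch of the idea in Python/PyTorch follows. The layer sizes, feature dimensionality, and training step are illustrative assumptions, not the configuration reported in the paper:

```python
import torch
import torch.nn as nn

class DDAE(nn.Module):
    """Deep denoising autoencoder: regresses clean log-spectral frames from
    noisy ones. In the NC + DDAE scheme, a noise classifier routes each noisy
    utterance to a DDAE suited to that noise type (sizes here are illustrative)."""
    def __init__(self, n_bins=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins),   # reconstructed clean frame
        )

    def forward(self, noisy):
        return self.net(noisy)

model = DDAE()
noisy = torch.randn(32, 257)     # hypothetical batch of noisy spectral frames
clean = torch.randn(32, 257)     # paired clean targets (stand-ins)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()                  # one standard regression training step
```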
Electropalatographic Therapy for Children and Young People with Down's Syndrome
ERIC Educational Resources Information Center
Cleland, Joanne; Timmins, Claire; Wood, Sara E.; Hardcastle, William J.; Wishart, Jennifer G.
2009-01-01
Articulation disorders in Down's syndrome (DS) are prevalent and often intractable. Individuals with DS generally prefer visual to auditory methods of learning and may therefore find it beneficial to be given a visual model during speech intervention, such as that provided by electropalatography (EPG). In this study, participants with Down's…
Gestures as Semiotic Resources in the Mathematics Classroom
ERIC Educational Resources Information Center
Arzarello, Ferdinando; Paola, Domingo; Robutti, Ornella; Sabena, Cristina
2009-01-01
In this paper, we consider gestures as part of the resources activated in the mathematics classroom: speech, inscriptions, artifacts, etc. As such, gestures are seen as one of the semiotic tools used by students and teacher in mathematics teaching-learning. To analyze them, we introduce a suitable model, the "semiotic bundle." It allows focusing…
ERIC Educational Resources Information Center
Munoz-Strizver, Nancy
Conversational Spanish is taught to hearing-impaired adolescents at the Model Secondary School for the Deaf (MSSD) through the use of cued speech. This paper provides an explanation of this mode of instruction and a description of the Spanish program at MSSD. The students learn the four skills of listening, speaking, reading and writing. Cued…
Improving Tone Recognition with Nucleus Modeling and Sequential Learning
ERIC Educational Resources Information Center
Wang, Siwei
2010-01-01
Mandarin Chinese and many other tonal languages use tones, defined as specific pitch patterns, to distinguish otherwise ambiguous syllables. It has been shown that tones carry at least as much information as vowels in Mandarin Chinese [Surendran et al., 2005]. Surprisingly, though, many speech recognition systems for Mandarin Chinese have…
Subcortical Contributions to Motor Speech: Phylogenetic, Developmental, Clinical.
Ziegler, W; Ackermann, H
2017-08-01
Vocal learning is an exclusively human trait among primates. However, songbirds demonstrate behavioral features resembling human speech learning. Two circuits have a preeminent role in this human behavior, namely the corticostriatal and the cerebrocerebellar motor loops. While the striatal contribution can be traced back to the avian anterior forebrain pathway (AFP), the sensorimotor adaptation functions of the cerebellum appear to be human-specific in acoustic communication. This review contributes to an ongoing discussion on how birdsong translates into human speech. While earlier approaches focused on higher linguistic functions, we place the motor aspects of speaking at center stage. Genetic data are brought together with clinical and developmental evidence to outline the role of cerebrocerebellar and corticostriatal interactions in human speech. Copyright © 2017 Elsevier Ltd. All rights reserved.
Erfanian Saeedi, Nafise; Blamey, Peter J.; Burkitt, Anthony N.; Grayden, David B.
2016-01-01
Pitch perception is important for understanding speech prosody, music perception, recognizing tones in tonal languages, and perceiving speech in noisy environments. The two principal pitch perception theories consider the place of maximum neural excitation along the auditory nerve and the temporal pattern of the auditory neurons’ action potentials (spikes) as pitch cues. This paper describes a biophysical mechanism by which fine-structure temporal information can be extracted from the spikes generated at the auditory periphery. Deriving meaningful pitch-related information from spike times requires neural structures specialized in capturing synchronous or correlated activity from amongst neural events. The emergence of such pitch-processing neural mechanisms is described through a computational model of auditory processing. Simulation results show that a correlation-based, unsupervised, spike-based form of Hebbian learning can explain the development of neural structures required for recognizing the pitch of simple and complex tones, with or without the fundamental frequency. The temporal code is robust to variations in the spectral shape of the signal and thus can explain the phenomenon of pitch constancy. PMID:27049657
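The correlation-based Hebbian rule the model relies on can be sketched compactly. The network sizes, firing threshold, and decay term below are illustrative assumptions rather than the authors' parameters:

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.01):
    """Correlation-based, unsupervised Hebbian rule: potentiate weights where
    pre- and post-synaptic spikes coincide; a mild decay keeps weights bounded."""
    w = w + lr * np.outer(post, pre)   # strengthen coincident activity
    w = w * (1.0 - 0.001)              # passive decay (illustrative)
    return np.clip(w, 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.random((8, 16)) * 0.5          # 16 auditory-nerve inputs -> 8 coincidence units
for _ in range(200):
    pre = (rng.random(16) < 0.2).astype(float)   # stochastic input spikes
    post = (w @ pre > 0.5).astype(float)         # a unit fires on synchronous drive
    w = hebbian_update(w, pre, post)
```

Units that repeatedly receive synchronous input end up with strengthened weights onto exactly those inputs, which is the sense in which the structure needed for pitch extraction "emerges" from correlated spiking.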
Competitive Speech and Debate: How Play Influenced American Educational Practice
ERIC Educational Resources Information Center
Bartanen, Michael D.; Littlefield, Robert S.
2015-01-01
The authors identify competitive speech and debate as a form of play that helped democratize American citizenship for the poor, who used what they learned through the practice to advance their personal social and economic goals. In addition, this competitive activity led to the development of speech communication as an academic discipline and…
Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.
ERIC Educational Resources Information Center
Horn, Carin E.; Scott, Brian L.
A new voice-based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…
The Motor Core of Speech: A Comparison of Serial Organization Patterns in Infants and Languages.
ERIC Educational Resources Information Center
MacNeilage, Peter F.; Davis, Barbara L.; Kinney, Ashlynn; Matyear, Christine L.
2000-01-01
Presents evidence for four major design features of serial organization of speech arising from comparison of babbling and early speech with patterns in ten languages. Maintains that no explanation for the design features is available from Universal Grammar; except for intercyclical consonant repetition development, perceptual-motor learning seems…
Using Web Speech Technology with Language Learning Applications
ERIC Educational Resources Information Center
Daniels, Paul
2015-01-01
In this article, the author presents the history of human-to-computer interaction based upon the design of sophisticated computerized speech recognition algorithms. Advancements such as the arrival of cloud-based computing and software like Google's Web Speech API allow anyone with an Internet connection and a Chrome browser to take advantage of…
The Relationship between Speech Impairment, Phonological Awareness and Early Literacy Development
ERIC Educational Resources Information Center
Harris, Judy; Botting, Nicola; Myers, Lucy; Dodd, Barbara
2011-01-01
Although children with speech impairment are at increased risk for impaired literacy, many learn to read and spell without difficulty. Around half the children with speech impairment have delayed acquisition, making errors typical of a normally developing younger child (e.g. reducing consonant clusters so that "spoon" is pronounced as…
ERIC Educational Resources Information Center
Hariri, Ruaa Osama
2016-01-01
Children with Attention-Deficiency/Hyperactive Disorder (ADHD) often have co-existing learning disabilities and developmental weaknesses or delays in some areas including speech (Rief, 2005). Seeing that phonological disorders include articulation errors and other forms of speech disorders, studies pertaining to children with ADHD symptoms who…
Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal
2017-09-01
Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.
Statistical Learning is Related to Early Literacy-Related Skills
Spencer, Mercedes; Kaschak, Michael P.; Jones, John L.; Lonigan, Christopher J.
2015-01-01
It has been demonstrated that statistical learning, or the ability to use statistical information to learn the structure of one’s environment, plays a role in young children’s acquisition of linguistic knowledge. Although most research on statistical learning has focused on language acquisition processes, such as the segmentation of words from fluent speech and the learning of syntactic structure, some recent studies have explored the extent to which individual differences in statistical learning are related to literacy-relevant knowledge and skills. The present study extends on this literature by investigating the relations between two measures of statistical learning and multiple measures of skills that are critical to the development of literacy—oral language, vocabulary knowledge, and phonological processing—within a single model. Our sample included a total of 553 typically developing children from prekindergarten through second grade. Structural equation modeling revealed that statistical learning accounted for a unique portion of the variance in these literacy-related skills. Practical implications for instruction and assessment are discussed. PMID:26478658
Oratoria Online: The Use of Technology Enhanced Learning to Improve Students' Oral Skills
NASA Astrophysics Data System (ADS)
Dornaleteche, Jon
New ICTs have proven to be useful tools for implementing innovative didactic and pedagogical formulas oriented toward enhancing students' and teachers' creativity. The up-and-coming massive e-learning and blended learning projects are clear examples of this phenomenon. The teaching of oral communication offers a perfect scenario in which to experiment with these formulas. Since the traditional face-to-face approach to teaching 'Speech techniques' does not keep up with the new digital environment that surrounds students, it is necessary to move towards an 'Online oratory' model focused on using TEL to improve oral skills.
Auditory-neurophysiological responses to speech during early childhood: Effects of background noise
White-Schwoch, Travis; Davies, Evan C.; Thompson, Elaine C.; Carr, Kali Woodruff; Nicol, Trent; Bradlow, Ann R.; Kraus, Nina
2015-01-01
Early childhood is a critical period of auditory learning, during which children are constantly mapping sounds to meaning. But learning rarely occurs under ideal listening conditions—children are forced to listen against a relentless din. This background noise degrades the neural coding of these critical sounds, in turn interfering with auditory learning. Despite the importance of robust and reliable auditory processing during early childhood, little is known about the neurophysiology underlying speech processing in children so young. To better understand the physiological constraints these adverse listening scenarios impose on speech sound coding during early childhood, auditory-neurophysiological responses were elicited to a consonant-vowel syllable in quiet and background noise in a cohort of typically-developing preschoolers (ages 3–5 yr). Overall, responses were degraded in noise: they were smaller, less stable across trials, slower, and there was poorer coding of spectral content and the temporal envelope. These effects were exacerbated in response to the consonant transition relative to the vowel, suggesting that the neural coding of spectrotemporally-dynamic speech features is more tenuous in noise than the coding of static features—even in children this young. Neural coding of speech temporal fine structure, however, was more resilient to the addition of background noise than coding of temporal envelope information. Taken together, these results demonstrate that noise places a neurophysiological constraint on speech processing during early childhood by causing a breakdown in neural processing of speech acoustics. These results may explain why some listeners have inordinate difficulties understanding speech in noise. Speech-elicited auditory-neurophysiological responses offer objective insight into listening skills during early childhood by reflecting the integrity of neural coding in quiet and noise; this paper documents typical response properties in this age group. These normative metrics may be useful clinically to evaluate auditory processing difficulties during early childhood. PMID:26113025
Towards parameter-free classification of sound effects in movies
NASA Astrophysics Data System (ADS)
Chu, Selina; Narayanan, Shrikanth; Kuo, C.-C. J.
2005-08-01
The problem of identifying intense events via multimedia data mining in films is investigated in this work. Movies are mainly characterized by dialog, music, and sound effects. We begin our investigation by detecting interesting events through sound effects. Sound effects are neither speech nor music, but are closely associated with interesting events such as car chases and gunshots. We utilize low-level audio features, including MFCCs and energy, to identify sound effects. It was shown in previous work that the hidden Markov model (HMM) works well for speech/audio signals. However, this technique requires care in designing the model and choosing the correct parameters. Here, we introduce a framework that avoids this necessity and works well with semi-parametric and non-parametric learning algorithms.
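The low-level features named here are readily computed with standard tooling. A minimal sketch using librosa; the file name is hypothetical and the parameter choices are assumptions, not the authors' settings:

```python
import librosa
import numpy as np

# "clip.wav" is a hypothetical file; any mono audio clip works.
y, sr = librosa.load("clip.wav", sr=16000)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames) cepstral features
energy = librosa.feature.rms(y=y)                   # (1, n_frames) short-time energy
features = np.vstack([mfcc, energy])                # per-frame feature vectors
print(features.shape)
```

Frame-level matrices like this are what an HMM (or the semi-/non-parametric alternatives the paper favors) would consume as observation sequences.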
HTM Spatial Pooler With Memristor Crossbar Circuits for Sparse Biometric Recognition.
James, Alex Pappachen; Fedorova, Irina; Ibrayev, Timur; Kudithipudi, Dhireesha
2017-06-01
Hierarchical Temporal Memory (HTM) is an online machine learning algorithm that emulates the neocortex. The development of a scalable on-chip HTM architecture is an open research area. The two core substructures of HTM are the spatial pooler and temporal memory. In this work, we propose a new Spatial Pooler circuit design with parallel memristive crossbar arrays for the 2D columns. The proposed design was validated on two different benchmark tasks: face recognition and speech recognition. The circuits are simulated and analyzed using a practical memristor device model and a 0.18 μm IBM CMOS technology model. The AR, YALE, ORL, and UFI databases are used to test the performance of the design on face recognition; the TIMIT dataset is used for speech recognition.
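A toy software analogue of the spatial pooler's overlap-and-inhibition step may help: the memristor crossbar carries out the overlap computation in the analog domain, but the logic is the same. Sizes and sparsities below are illustrative assumptions:

```python
import numpy as np

def spatial_pooler_step(input_bits, connected, n_active=40):
    """Simplified Spatial Pooler step: each column's overlap is its count of
    connected synapses on active input bits; global inhibition keeps the top-k
    columns, yielding a sparse distributed representation (SDR)."""
    overlaps = connected @ input_bits           # per-column overlap scores
    winners = np.argsort(overlaps)[-n_active:]  # k-winners-take-all inhibition
    sdr = np.zeros(connected.shape[0], dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(1)
connected = (rng.random((2048, 1024)) < 0.02).astype(int)  # sparse synapse matrix
x = (rng.random(1024) < 0.05).astype(int)                  # encoded input bits
print(spatial_pooler_step(x, connected).sum())             # always n_active == 40
```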
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Najnin, Shamima; Banerjee, Bonny
2018-01-01
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The “novel words to novel objects” language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models manifest superior performance to Attentional ones for the task of word-learning. The Attentional-prosodic DQN outperforms existing word-learning models for the same task. PMID:29441027
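A minimal table-based sketch in the spirit of the Q-learning variant follows. The gold pairings, the bandit-style one-step update, and the epsilon-greedy choice are simplifying assumptions standing in for the paper's cross-situational, socially cued reward computation:

```python
import random
from collections import defaultdict

# Tabular Q-learning over word-object pairs, a bandit-style simplification.
GOLD = {"ball": "round_toy", "cup": "drinking_vessel", "dog": "animal"}
OBJECTS = list(GOLD.values())
Q = defaultdict(float)
alpha, epsilon = 0.1, 0.2
random.seed(0)

for _ in range(500):
    word = random.choice(list(GOLD))
    if random.random() < epsilon:
        obj = random.choice(OBJECTS)                     # explore
    else:
        obj = max(OBJECTS, key=lambda o: Q[(word, o)])   # exploit
    reward = 1.0 if obj == GOLD[word] else 0.0           # stand-in reward signal
    Q[(word, obj)] += alpha * (reward - Q[(word, obj)])  # incremental value update

print(max(OBJECTS, key=lambda o: Q[("ball", o)]))        # typically "round_toy"
```

The paper's neural-network variants (Q-NN, NFQ, DQN) replace the table with a function approximator to generalize across words and objects.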
Specific acoustic models for spontaneous and dictated style in indonesian speech recognition
NASA Astrophysics Data System (ADS)
Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.
2018-03-01
The performance of an automatic speech recognition system is affected by differences in speech style between the data the model was originally trained upon and the incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments: the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognize incoming speech and decides upon a final result by calculating a confidence score for each decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage over a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
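The two decoding strategies can be contrasted in a few lines. The decoder and classifier below are stand-in stubs (assumptions), since the paper's GMM-HMM recognizers are external systems:

```python
from dataclasses import dataclass
import random

@dataclass
class Hypothesis:
    text: str
    confidence: float

def decode(utterance, model_name):
    """Stand-in for GMM-HMM decoding; a real system would call the recognizer
    and return its best hypothesis with a decoding confidence score."""
    return Hypothesis(text=f"<{model_name} transcript>", confidence=random.random())

def classify_style(utterance):
    """Stand-in for the speech-style classifier (spontaneous vs. dictated)."""
    return "spontaneous" if len(utterance) % 2 else "dictated"

MODELS = {"spontaneous": "AM-spontaneous", "dictated": "AM-dictated"}

def system1(utterance):
    # System 1: route to the acoustic model matching the predicted style.
    return decode(utterance, MODELS[classify_style(utterance)])

def system2(utterance):
    # System 2: decode with both models, keep the higher-confidence hypothesis.
    return max((decode(utterance, m) for m in MODELS.values()),
               key=lambda h: h.confidence)
```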
ERIC Educational Resources Information Center
Bornman, Juan; Donohue, Dana K.
2013-01-01
This study examined teachers' attitudes toward learners with two types of barriers to learning: a learner with attention-deficit and hyperactivity disorder (ADHD), and a learner with little or no functional speech (LNFS). The results indicated that although teachers reported that the learner with ADHD would be more disruptive in class and have a…
Huda, Shamsul; Yearwood, John; Togneri, Roberto
2009-02-01
This paper attempts to overcome the tendency of the expectation-maximization (EM) algorithm to locate a local rather than a global maximum when applied to estimate the hidden Markov model (HMM) parameters in speech signal modeling. We propose a hybrid algorithm for estimation of the HMM in automatic speech recognition (ASR) using a constraint-based evolutionary algorithm (EA) and EM, the CEL-EM. The novelty of our hybrid algorithm (CEL-EM) is that it is applicable to the estimation of constraint-based models that use EM and have many constraints and large numbers of parameters, such as the HMM. Two constraint-based versions of the CEL-EM with different fusion strategies have been proposed, using a constraint-based EA and EM for better estimation of the HMM in ASR. The first one uses a traditional constraint-handling mechanism of EA. The other version transforms a constrained optimization problem into an unconstrained problem using Lagrange multipliers. Fusion strategies for the CEL-EM use a staged-fusion approach in which EM is plugged into the EA periodically, after the EA has executed for a specific period of time, to maintain the global sampling capabilities of the EA in the hybrid algorithm. A variable initialization approach (VIA) using variable segmentation has been proposed to provide a better initialization for the EA in the CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM obtains higher recognition accuracies than the traditional EM algorithm as well as a top-standard EM (VIA-EM, constructed by applying the VIA to EM).
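The staged-fusion idea is easy to express as a skeleton. The toy 1-D objective and the hill-climbing stand-in for an EM step below are assumptions for illustration; in the paper the parameters are HMM parameters and the refinement is a true EM (Baum-Welch) update:

```python
import random

def cel_em(population, ea_step, em_refine, fitness, n_stages=10, ea_iters=20):
    """Staged fusion: run the evolutionary algorithm (EA) for a period to keep
    global sampling, then plug in EM-style local refinement, and repeat."""
    for _ in range(n_stages):
        for _ in range(ea_iters):
            population = ea_step(population)                 # global exploration
        population = [em_refine(p) for p in population]      # local refinement
    return max(population, key=fitness)

random.seed(0)
f = lambda x: -(x - 3.0) ** 2                                # toy likelihood surface
def ea_step(pop):
    parents = sorted(pop, key=f, reverse=True)[: len(pop) // 2]   # select
    return parents + [p + random.gauss(0, 0.5) for p in parents]  # mutate
climb = lambda p: max((p, p - 0.05, p + 0.05), key=f)        # EM stand-in
print(cel_em([random.uniform(-10, 10) for _ in range(20)], ea_step, climb, f))
```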
McKnight, Lindsay M; O'Malley-Keighran, Mary-Pat; Carroll, Clare
2016-11-01
There is evidence indicating that parent training programmes including interaction coaching of parents of children with autism spectrum disorders (ASD) can increase parental responsiveness and promote language development and social interaction skills in children with ASD. However, there is a lack of research exploring precisely how healthcare professionals use language in interaction coaching. This study aimed to identify the speech acts of healthcare professionals during individual video-recorded interaction coaching sessions of a Hanen-influenced parent training programme with parents of children with ASD. This retrospective study used speech act analysis. Healthcare professional participants included two speech-language therapists and one occupational therapist. Sixteen videos were transcribed and a speech act analysis was conducted to identify the form and functions of the language used by the healthcare professionals. Descriptive statistics provided frequencies and percentages for the different speech acts used across the 16 videos. Six types of speech acts used by the healthcare professionals during coaching sessions were identified. These speech acts were, in order of frequency: Instructing, Modelling, Suggesting, Commanding, Commending and Affirming. The healthcare professionals were found to tailor their interaction coaching to the learning needs of the parents. A pattern was observed in which more direct speech acts were used in instances where indirect speech acts did not achieve the intended response. The study provides an insight into the nature of interaction coaching provided by healthcare professionals during a parent training programme. It identifies the types of language used during interaction coaching. It also highlights additional important aspects of interaction coaching, such as the ability of healthcare professionals to adjust the directness of the coaching in order to achieve the intended parental response to the child's interaction. The findings may be used to increase the awareness of healthcare professionals about the types of speech acts used during interaction coaching as well as the manner in which coaching sessions are conducted. © 2016 Royal College of Speech and Language Therapists.
Musical training during early childhood enhances the neural encoding of speech in noise
Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina
2012-01-01
For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child’s access to a target signal in noise. Given adult musicians’ perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and subcortical processing of speech in noise and related cognitive abilities in musician and nonmusician children that were matched for a variety of overarching factors. Outcomes reveal that musicians’ advantages for processing speech in noise are present during pivotal developmental years. Supported by correlations between auditory working memory and attention and auditory brainstem response properties, we propose that musicians’ perceptual and neural enhancements are driven in a top-down manner by strengthened cognitive abilities with training. Our results may be considered by professionals involved in the remediation of language-based learning deficits, which are often characterized by poor speech perception in noise. PMID:23102977
Constraints on the Transfer of Perceptual Learning in Accented Speech
Eisner, Frank; Melinger, Alissa; Weber, Andrea
2013-01-01
The perception of speech sounds can be re-tuned through a mechanism of lexically driven perceptual learning after exposure to instances of atypical speech production. This study asked whether this re-tuning is sensitive to the position of the atypical sound within the word. We investigated perceptual learning using English voiced stop consonants, which are commonly devoiced in word-final position by Dutch learners of English. After exposure to a Dutch learner’s productions of devoiced stops in word-final position (but not in any other positions), British English (BE) listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with devoiced final stops (e.g., “seed”, pronounced [si:th]), facilitated recognition of visual targets with voiced final stops (e.g., SEED). In Experiment 1, this learning effect generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as “town” facilitated recognition of visual targets like DOWN. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The generalization to word-initial position did not occur when participants had also heard correctly voiced, word-initial stops during exposure (Experiment 2), and when the speaker was a native BE speaker who mimicked the word-final devoicing (Experiment 3). The readiness of the perceptual system to generalize a previously learned adjustment to other positions within the word thus appears to be modulated by distributional properties of the speech input, as well as by the perceived sociophonetic characteristics of the speaker. The results suggest that the transfer of pre-lexical perceptual adjustments that occur through lexically driven learning can be affected by a combination of acoustic, phonological, and sociophonetic factors. PMID:23554598
Statistical Learning in a Natural Language by 8-Month-Old Infants
Pelucchi, Bruna; Hay, Jessica F.; Saffran, Jenny R.
2013-01-01
Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants’ ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition. PMID:19489896
ERIC Educational Resources Information Center
Sperbeck, Mieko
2010-01-01
The primary aim of this dissertation was to investigate the relationship between speech perception and speech production difficulties among Japanese second language (L2) learners of English, in their learning complex syllable structures. Japanese L2 learners and American English controls were tested in a categorical ABX discrimination task of…
Two Different Communication Genres and Implications for Vocabulary Development and Learning to Read
ERIC Educational Resources Information Center
Massaro, Dominic W.
2015-01-01
This study examined potential differences in vocabulary found in picture books and adult's speech to children and to other adults. Using a small sample of various sources of speech and print, Hayes observed that print had a more extensive vocabulary than speech. The current analyses of two different spoken language databases and an assembled…
ERIC Educational Resources Information Center
Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.
2012-01-01
Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…
The Influence of Child-Directed Speech on Word Learning and Comprehension
ERIC Educational Resources Information Center
Foursha-Stevenson, Cassandra; Schembri, Taylor; Nicoladis, Elena; Eriksen, Cody
2017-01-01
This paper describes an investigation into the function of child-directed speech (CDS) across development. In the first experiment, 10-21-month-olds were presented with familiar words in CDS and trained on novel words in CDS or adult-directed speech (ADS). All children preferred the matching display for familiar words. However, only older toddlers…
The First Amendment: Free Speech & a Free Press. A Curriculum Guide for High School Teachers.
ERIC Educational Resources Information Center
Eveslage, Thomas
This curriculum guide is intended to encourage students to learn how everyone benefits when young people, other citizens, and the media exercise the constitutional rights of free speech and free press. Background information on free speech issues is provided, along with classroom activities, discussion questions, and student worksheets. There are…
ERIC Educational Resources Information Center
Holm, Alison; Farrier, Faith; Dodd, Barbara
2008-01-01
Background: Although children with speech disorder are at increased risk of literacy impairments, many learn to read and spell without difficulty. They are also a heterogeneous population in terms of the number and type of speech errors and their identified speech processing deficits. One problem lies in determining which preschool children with…
The Development of the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test
ERIC Educational Resources Information Center
Mealings, Kiri T.; Demuth, Katherine; Buchholz, Jörg; Dillon, Harvey
2015-01-01
Purpose: Open-plan classroom styles are increasingly being adopted in Australia despite evidence that their high intrusive noise levels adversely affect learning. The aim of this study was to develop a new Australian speech perception task (the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test) and use it in an open-plan…
Language-learning disabilities: Paradigms for the nineties.
Wiig, E H
1991-01-01
We are beginning a decade, during which many traditional paradigms in education, special education, and speech-language pathology will undergo change. Among paradigms considered promising for speech-language pathology in the schools are collaborative language intervention and strategy training for language and communication. This presentation introduces management models for developing a collaborative language intervention process, among them the Deming Management Method for Total Quality (TQ) (Deming 1986). Implementation models for language assessment and IEP planning and multicultural issues are also introduced (Damico and Nye 1990; Secord and Wiig in press). While attention to processes involved in developing and implementing collaborative language intervention is paramount, content should not be neglected. To this end, strategy training for language and communication is introduced as a viable paradigm. Macro- and micro-level process models for strategy training are featured and general issues are discussed (Ellis, Deshler, and Schumaker 1989; Swanson 1989; Wiig 1989).
End-to-End ASR-Free Keyword Search From Speech
NASA Astrophysics Data System (ADS)
Audhkhasi, Kartik; Rosenberg, Andrew; Sethy, Abhinav; Ramabhadran, Bhuvana; Kingsbury, Brian
2017-12-01
End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three sub-systems. The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation. The second sub-system is a character-level RNN language model using embeddings learned from a convolutional neural network. Since the acoustic and text query embeddings occupy different representation spaces, they are input to a third feed-forward neural network that predicts whether the query occurs in the acoustic utterance or not. This E2E ASR-free KWS system performs respectably despite lacking a conventional ASR system and trains much faster.
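The third sub-system, the decider over the two embedding spaces, is the simplest to sketch. A minimal PyTorch version follows; the embedding dimensions and hidden size are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class KWSDecider(nn.Module):
    """Feed-forward decider over the concatenated utterance embedding (from the
    acoustic auto-encoder) and query embedding (from the character-level RNN).
    Dimensions here are illustrative."""
    def __init__(self, d_audio=256, d_query=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_audio + d_query, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, audio_emb, query_emb):
        x = torch.cat([audio_emb, query_emb], dim=-1)
        return torch.sigmoid(self.net(x))    # P(query occurs in the utterance)

decider = KWSDecider()
p = decider(torch.randn(4, 256), torch.randn(4, 128))  # batch of 4 (utterance, query) pairs
```

Because the two embeddings live in different representation spaces, some learned bridge of this kind is needed; the network above is trained with binary occurs/does-not-occur labels.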
Comprehension: an overlooked component in augmented language development.
Sevcik, Rose A
2006-02-15
Despite the importance of children's receptive skills as a foundation for later productive word use, the role of receptive language traditionally has received very limited attention, since the focus in linguistic development has centered on language production. For children with significant developmental disabilities and communication impairments, augmented language systems have been devised as a tool for both language input and output. The role of both speech and symbol comprehension skills is emphasized in this paper. Data collected from two longitudinal studies of children and youth with severe disabilities and limited speech serve as illustrations. The acquisition and use of the System for Augmenting Language (SAL) was studied in home and school settings. Communication behaviors of the children and youth and their communication partners were observed, and language assessment measures were collected. Two patterns of symbol learning and achievement, beginning and advanced, were observed. Extant speech comprehension skills brought to the augmented language learning task impacted the participants' patterns of symbol learning and use. Though often overlooked, the importance of speech and symbol comprehension skills was underscored in the studies described. Future areas for research are identified.
Cross-Cultural Learning: The Language Connection.
ERIC Educational Resources Information Center
Axelrod, Joseph
1981-01-01
If foreign language acquisition is disconnected from the cultural life of the foreign speech community, the learning yield is low. Integration of affective learning, cultural learning, and foreign language learning are essential to a successful cross-cultural experience. (MSE)
Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D
2015-11-01
The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning. Copyright © 2015 Elsevier Inc. All rights reserved.
A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia.
Keetels, Mirjam; Bonte, Milene; Vroomen, Jean
2018-01-01
Upon hearing an ambiguous speech sound, listeners may adjust their perceptual interpretation of the speech input in accordance with contextual information, like accompanying text or lipread speech (i.e., phonetic recalibration; Bertelson et al., 2003). As developmental dyslexia (DD) has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce this type of audiovisual learning. Adults with DD and normal readers were exposed to ambiguous consonants halfway between /aba/ and /ada/ together with text or lipread speech. After this audiovisual exposure phase, they categorized auditory-only ambiguous test sounds. Results showed that individuals with DD, unlike normal readers, did not use text to recalibrate their phoneme categories, whereas their recalibration by lipread speech was spared. Individuals with DD demonstrated similar deficits when ambiguous vowels (halfway between /wIt/ and /wet/) were recalibrated by text. These findings indicate that DD is related to a specific letter-speech sound association deficit that extends over phoneme classes (vowels and consonants), but, as lipreading was spared, does not extend to a more general audio-visual integration deficit. In particular, these results highlight diminished reading-related audiovisual learning in addition to the commonly reported phonological problems in developmental dyslexia.
ERIC Educational Resources Information Center
Treurniet, William
A study applied artificial neural networks, trained with the back-propagation learning algorithm, to modelling phonemes extracted from the DARPA TIMIT multi-speaker, continuous speech database. A number of proposed network architectures were applied to the phoneme classification task, ranging from the simple feedforward multilayer network to more…
ERIC Educational Resources Information Center
Riede, Tobias; Goller, Franz
2010-01-01
Song production in songbirds is a model system for studying learned vocal behavior. As in humans, bird phonation involves three main motor systems (respiration, vocal organ and vocal tract). The avian respiratory mechanism uses pressure regulation in air sacs to ventilate a rigid lung. In songbirds sound is generated with two independently…
Teaching Anatomy and Physiology Using Computer-Based, Stereoscopic Images
ERIC Educational Resources Information Center
Perry, Jamie; Kuehn, David; Langlois, Rick
2007-01-01
Learning real three-dimensional (3D) anatomy for the first time can be challenging. Two-dimensional drawings and plastic models tend to over-simplify the complexity of anatomy. The approach described uses stereoscopy to create 3D images of the process of cadaver dissection and to demonstrate the underlying anatomy related to the speech mechanisms.…
Lim, Sung-joo; Holt, Lori L.
2011-01-01
Although speech categories are defined by multiple acoustic dimensions, some are perceptually-weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information and players’ responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5 hours across 5 days exhibited improvements in /r/-/l/ perception on par with 2–4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. PMID:21827533
ERIC Educational Resources Information Center
Grünke, Matthias; Janning, Andriana Maria; Sperling, Marko
2016-01-01
The purpose of this single-case study was to evaluate the effects of a peer-tutoring intervention on the text production skills of three third graders with severe learning and speech difficulties. All tutees were initially able to produce only very short stories. During the course of the treatment, higher performing classmates taught them how to…
ERIC Educational Resources Information Center
Stansfield, Jois
2012-01-01
The speech and language therapy (SLT) service in an area of northern England receives referrals of parents who have learning disabilities. The aim of this study was to identify current referral patterns and quantify the level of demand upon the SLT service from this relatively new referral population, to enable the service to meet the needs of these…
Investigating speech motor practice and learning in people who stutter.
Namasivayam, Aravind Kumar; van Lieshout, Pascal
2008-03-01
In this exploratory study, we investigated whether or not people who stutter (PWS) show motor practice and learning changes similar to those of people who do not stutter (PNS). To this end, five PWS and five PNS repeated a set of non-words at two different rates (normal and fast) across three test sessions (T1, T2 on the same day and T3 on a separate day, at least 1 week apart). The results indicated that PWS and PNS may resemble each other on a number of performance variables (such as movement amplitude and duration), but they differ in terms of practice and learning on variables that relate to movement stability and strength of coordination patterns. These findings are interpreted in support of recent claims about speech motor skill limitations in PWS. The reader will be able to: (1) define oral articulatory changes associated with motor practice and learning and their measurement; (2) summarize findings from previous studies examining motor practice and learning in PWS; and (3) discuss hypotheses that could account for the present findings that suggest PWS and PNS differ in their speech motor learning abilities.
An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants
Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry
2016-01-01
Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low, pairs (strong-weak/trochaic pattern). This perceptual bias, called the Iambic-Trochaic Law (ITL), has been claimed to be a universal property of the auditory system, applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887
ERIC Educational Resources Information Center
Noguchi, Masaki; Hudson Kam, Carla L.
2018-01-01
In human languages, different speech sounds can be contextual variants of a single phoneme, called allophones. Learning which sounds are allophones is an integral part of the acquisition of phonemes. Whether given sounds are separate phonemes or allophones in a listener's language affects speech perception. Listeners tend to be less sensitive to…
ERIC Educational Resources Information Center
McAllen, Audrey E.
This book gives teachers an understanding of speech training through specially selected exercises. The book's exercises aim to help develop clear speaking in the classroom. Methodically and perceptively used, the book will assist those concerned with the creative powers of speech as a teaching art. In Part 1, there are sections on the links…
ERIC Educational Resources Information Center
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
2013-01-01
Blind people can learn to understand speech at ultra-high syllable rates (ca. 20 syllables/s), a capability associated with hemodynamic activation of the central-visual system. To further elucidate the neural mechanisms underlying this skill, magnetoencephalographic (MEG) measurements during listening to sentence utterances were cross-correlated…
Is Discussion an Exchange of Ideas? On Education, Money, and Speech
ERIC Educational Resources Information Center
Backer, David I.
2017-01-01
How do we learn the link between speech and money? What is the process of formation that legitimates the logic whereby speech is equivalent to money? What are the experiences, events, and subjectivities that render the connection between currency and speaking/listening intuitive? As educators and researchers, what do we do and say to shore up this…
Reading Aloud to Children: Benefits and Implications for Acquiring Literacy Before Schooling Begins.
Massaro, Dominic W
2017-01-01
Extensive experience in written language might provide children the opportunity to learn to read in the same manner they learn spoken language. One potential type of written language immersion is reading aloud to children, which is additionally valuable because the vocabulary in picture books is richer and more extensive than that found in child-directed speech. This study continues a comparison between these 2 communication media by evaluating their relative linguistic and cognitive complexity. Although reading grade level has been used only to assess the complexity of written language, it was also applied here to both child-directed and adult-directed speech. Five measures of reading grade level gave an average grade level of 4.2 for picture books, 1.9 for child-directed speech, and 3.0 for adult-directed speech. The language in picture books is more challenging than that found in both child-directed and adult-directed speech. It is proposed that this difference between written and spoken language reflects the formal versus informal genre of their occurrence rather than their written or oral medium. Reading books aloud therefore exposes children to a linguistic and cognitive complexity not typically found in speech to children.
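For reference, one widely used grade-level formula of the kind averaged in such analyses is the Flesch-Kincaid grade level; the study's five specific measures are not reproduced here, and the counts below are hypothetical:

```python
def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level, a standard readability measure computed from
    average sentence length and average syllables per word."""
    return (0.39 * (n_words / n_sentences)
            + 11.8 * (n_syllables / n_words)
            - 15.59)

# Hypothetical counts: a picture-book passage vs. child-directed speech.
print(round(flesch_kincaid_grade(120, 12, 160), 1))  # longer sentences, richer words: ~4.0
print(round(flesch_kincaid_grade(120, 20, 140), 1))  # shorter, simpler utterances: ~0.5
```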
Sayeski, Kristin L; Earle, Gentry A; Eslinger, R Paige; Whitenton, Jessy N
2017-04-01
Matching phonemes (speech sounds) to graphemes (letters and letter combinations) is an important aspect of decoding (translating print to speech) and encoding (translating speech to print). Yet, many teacher candidates do not receive explicit training in phoneme-grapheme correspondence. Difficulty with accurate phoneme production and/or lack of understanding of sound-symbol correspondence can make it challenging for teachers to (a) identify student errors on common assessments and (b) serve as a model for students when teaching beginning reading or providing remedial reading instruction. For students with dyslexia, lack of teacher proficiency in this area is particularly problematic. This study examined differences between two learning conditions (massed and distributed practice) on teacher candidates' development of phoneme-grapheme correspondence knowledge and skills. An experimental, pretest-posttest-delayed test design was employed with teacher candidates (n = 52) to compare a massed practice condition (one, 60-min session) to a distributed practice condition (four, 15-min sessions distributed over 4 weeks) for learning phonemes associated with letters and letter combinations. Participants in the distributed practice condition significantly outperformed participants in the massed practice condition on their ability to correctly produce phonemes associated with different letters and letter combinations. Implications for teacher preparation are discussed.
Adam, I; Mendoza, E; Kobalz, U; Wohlgemuth, S; Scharff, C
2017-07-01
Mutations of FOXP2 are associated with altered brain structure, including the striatal part of the basal ganglia, and cause a severe speech and language disorder. Songbirds serve as a tractable neurobiological model for speech and language research. Experimental downregulation of FoxP2 in zebra finch Area X, a nucleus of the striatal song control circuitry, affects synaptic transmission and spine densities. It also renders song learning and production inaccurate and imprecise, similar to the speech impairment of patients carrying FOXP2 mutations. Here we show that experimental downregulation of FoxP2 in Area X using lentiviral vectors leads to reduced expression of CNTNAP2, a FOXP2 target gene in humans. In addition, natural downregulation of FoxP2 by age or by singing also downregulated CNTNAP2 expression. Furthermore, we report that FoxP2 binds to and activates the avian CNTNAP2 promoter in vitro. Taken together these data establish CNTNAP2 as a direct FoxP2 target gene in songbirds, likely affecting synaptic function relevant for song learning and song maintenance. © 2017 The Authors. Genes, Brain and Behavior published by International Behavioural and Neural Genetics Society and John Wiley & Sons Ltd.
A new model of sensorimotor coupling in the development of speech.
Westermann, Gert; Reck Miranda, Eduardo
2004-05-01
We present a computational model that learns a coupling between motor parameters and their sensory consequences in vocal production during a babbling phase. Based on the coupling, preferred motor parameters and prototypically perceived sounds develop concurrently. Exposure to an ambient language modifies perception to coincide with the sounds from the language. The model develops motor mirror neurons that are active when an external sound is perceived. An extension to visual mirror neurons for oral gestures is suggested.
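The abstract above gives no implementation details; as a rough illustration of the kind of sensorimotor coupling it describes, the Python sketch below (with an invented one-dimensional motor space and an arbitrary articulatory mapping, not the authors' model) associates random motor commands with their auditory consequences through a Hebbian rule, after which a heard sound activates the motor pattern that would produce it, mirror-neuron style.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOTOR, N_AUDITORY = 50, 50          # units tiling each (hypothetical) 1-D space
W = np.zeros((N_AUDITORY, N_MOTOR))   # Hebbian coupling weights

def articulate(m):
    """Fixed 'vocal tract': maps a motor parameter in [0, 1] to a sound in [0, 1]."""
    return 0.5 * (1 + np.sin(2 * np.pi * m)) * 0.9  # arbitrary nonlinear mapping

def activity(x, n, width=0.05):
    """Gaussian population code for a scalar x over n units."""
    centers = np.linspace(0, 1, n)
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

# Babbling phase: random motor commands are associated with their own sounds.
for _ in range(5000):
    m = rng.random()
    motor = activity(m, N_MOTOR)
    auditory = activity(articulate(m), N_AUDITORY)
    W += 0.01 * np.outer(auditory, motor)   # co-active units strengthen coupling

# 'Mirror' response: an external sound now evokes a motor pattern that produces it.
heard = activity(articulate(0.3), N_AUDITORY)
motor_response = W.T @ heard
print("motor value most activated by the heard sound:",
      np.argmax(motor_response) / (N_MOTOR - 1))
```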
Infant Directed Speech Enhances Statistical Learning in Newborn Infants: An ERP Study
Teinonen, Tuomas; Tervaniemi, Mari; Huotilainen, Minna
2016-01-01
Statistical learning and the social contexts of language addressed to infants are hypothesized to play important roles in early language development. Previous behavioral work has found that the exaggerated prosodic contours of infant-directed speech (IDS) facilitate statistical learning in 8-month-old infants. Here we examined the neural processes involved in on-line statistical learning and investigated whether the use of IDS facilitates statistical learning in sleeping newborns. Event-related potentials (ERPs) were recorded while newborns were exposed to 12 pseudo-words, six spoken with the exaggerated pitch contours of IDS and six spoken without exaggerated pitch contours (ADS), in ten alternating blocks. We examined whether ERP amplitudes for syllable position within a pseudo-word (word-initial vs. word-medial vs. word-final, indicating statistical word learning) and speech register (ADS vs. IDS) would interact. The ADS and IDS registers elicited similar ERP patterns for syllable position in an early 0–100 ms component but elicited different ERP effects in both the polarity and topographical distribution at 200–400 ms and 450–650 ms. These results provide the first evidence that the exaggerated pitch contours of IDS result in differences in brain activity linked to on-line statistical learning in sleeping newborns. PMID:27617967
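The word-boundary statistic that such studies rely on is the transitional probability between adjacent syllables: high within a word, low across a boundary. A minimal sketch, using made-up trisyllabic 'words' in the style of classic statistical-learning designs rather than the study's actual stimuli:

```python
import random
from collections import Counter

random.seed(1)
# Hypothetical stream: three 'words' concatenated in random order.
words = ["tupiro", "golabu", "bidaku"]
stream = [random.choice(words) for _ in range(300)]
syllables = [w[i:i + 2] for w in stream for i in range(0, 6, 2)]

pairs = Counter(zip(syllables, syllables[1:]))
firsts = Counter(syllables[:-1])

def tp(a, b):
    """Transitional probability P(next=b | current=a) estimated from the stream."""
    return pairs[(a, b)] / firsts[a]

print(f"within word   P(pi|tu) = {tp('tu', 'pi'):.2f}")   # -> 1.00
print(f"across words  P(go|ro) = {tp('ro', 'go'):.2f}")   # -> about 1/3
```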
Fast phonetic learning occurs already in 2-to-3-month old infants: an ERP study
Wanrooij, Karin; Boersma, Paul; van Zuijen, Titia L.
2014-01-01
An important mechanism for learning speech sounds in the first year of life is “distributional learning,” i.e., learning by simply listening to the frequency distributions of the speech sounds in the environment. In the lab, fast distributional learning has been reported for infants in the second half of the first year; the present study examined whether it can also be demonstrated at a much younger age, long before the onset of language-specific speech perception (which roughly emerges between 6 and 12 months). To investigate this, Dutch infants aged 2 to 3 months were presented with either a unimodal or a bimodal vowel distribution based on the English /æ/~/ε/ contrast, for only 12 minutes. Subsequently, mismatch responses (MMRs) were measured in an oddball paradigm, where one half of the infants in each group heard a representative [æ] as the standard and a representative [ε] as the deviant, and the other half heard the same reversed. The results (from the combined MMRs during wakefulness and active sleep) disclosed a larger MMR, implying better discrimination of [æ] and [ε], for bimodally than unimodally trained infants, thus extending an effect of distributional training found in previous behavioral research to a much younger age when speech perception is still universal rather than language-specific, and to a new method (using event-related potentials). Moreover, the analysis revealed a robust interaction between the distribution (unimodal vs. bimodal) and the identity of the standard stimulus ([æ] vs. [ε]), which provides evidence for an interplay between a perceptual asymmetry and distributional learning. The outcomes show that distributional learning can affect vowel perception already in the first months of life. PMID:24701203
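A minimal sketch of the distributional-learning idea behind the study above: an idealized learner compares one- and two-category fits to the input distribution, and only a bimodal training distribution supports two categories. The F1 values and the model-comparison machinery below are illustrative assumptions, not the study's stimuli or analysis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical F1 values (Hz) along an /ae/-/eh/ continuum (invented numbers).
unimodal = rng.normal(650, 40, size=400).reshape(-1, 1)
bimodal = np.concatenate([rng.normal(580, 25, 200),
                          rng.normal(720, 25, 200)]).reshape(-1, 1)

# An idealized distributional learner compares one- vs two-category fits of
# the input; BIC prefers two categories only for the bimodal input.
for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
    bics = [GaussianMixture(k, random_state=0).fit(data).bic(data) for k in (1, 2)]
    print(name, "-> prefers", 1 + int(bics[1] < bics[0]), "category(ies)")
```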
NASA Astrophysics Data System (ADS)
Peng, Zhao Ellen
The current classroom acoustics standard (ANSI S12.60-2010) recommends that core learning spaces not exceed a background noise level (BNL) of 35 dBA and a reverberation time (RT) of 0.6 second, based on speech intelligibility performance mainly by the native English-speaking population. Existing literature has not correlated these recommended values well with student learning outcomes. With a growing population of non-native English speakers in American classrooms, the special needs for perceiving degraded speech among non-native listeners, whether due to realistic room acoustics or talker foreign accent, have not been addressed in the current standard. This research seeks to investigate the effects of BNL and RT on the comprehension of English speech from native English and native Mandarin Chinese talkers as perceived by native and non-native English listeners, and to provide acoustic design guidelines to supplement the existing standard. This dissertation presents two studies on the effects of RT and BNL on more realistic classroom learning experiences. How do native and non-native English-speaking listeners perform on speech comprehension tasks under adverse acoustic conditions, if the English speech is produced by talkers of native English (Study 1) versus native Mandarin Chinese (Study 2)? Speech comprehension materials were played back in a listening chamber to individual listeners: native and non-native English-speaking in Study 1; native English, native Mandarin Chinese, and other non-native English-speaking in Study 2. Each listener was screened for baseline English proficiency level, and completed dual tasks simultaneously involving speech comprehension and adaptive dot-tracing under 15 acoustic conditions, comprising three BNL conditions (RC-30, 40, and 50) and five RT scenarios (0.4 to 1.2 seconds). The results show that BNL and RT negatively affect both objective performance and subjective perception of speech comprehension, more severely for non-native listeners than for native listeners. While the presence of a foreign accent is generally detrimental, an interlanguage benefit was identified on both speech comprehension and the self-reported frustration and perceived-performance ratings, specifically for non-native listeners who shared the talker's foreign accent. Suggested design guidelines for BNL and RT are identified for attaining optimal speech comprehension performance, to improve classroom acoustics for the non-native English-speaking population.
Noise levels in an urban Asian school environment
Chan, Karen M.K.; Li, Chi Mei; Ma, Estella P.M.; Yiu, Edwin M.L.; McPherson, Bradley
2015-01-01
Background noise is known to adversely affect speech perception and speech recognition. High levels of background noise in school classrooms may affect student learning, especially for those pupils who are learning in a second language. The current study aimed to determine the noise level and teacher speech-to-noise ratio (SNR) in Hong Kong classrooms. Noise level was measured in 146 occupied classrooms in 37 schools, including kindergartens, primary schools, secondary schools and special schools, in Hong Kong. The mean noise levels in occupied kindergarten, primary school, secondary school and special school classrooms all exceeded recommended maximum noise levels, and noise reduction measures were seldom used in classrooms. The measured SNRs were not optimal and could have adverse implications for student learning and teachers’ vocal health. Schools in urban Asian environments are advised to consider noise reduction measures in classrooms to better comply with recommended maximum noise levels for classrooms. PMID:25599758
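The speech-to-noise ratio reported in surveys like the one above is simply the level difference, in dB, between the teacher's speech and the background noise. A small self-contained sketch with synthetic signals (the amplitudes and the +15 dB target mentioned in the comment are illustrative assumptions):

```python
import numpy as np

def level_db(x, ref=1.0):
    """RMS level of a signal in dB relative to `ref`."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(rms / ref)

rng = np.random.default_rng(0)
speech = 0.1 * rng.standard_normal(16000)   # stand-ins for 1 s of audio at 16 kHz
noise = 0.05 * rng.standard_normal(16000)   # (illustrative amplitudes only)

snr = level_db(speech) - level_db(noise)    # SNR in dB is the level difference
print(f"teacher speech-to-noise ratio: {snr:.1f} dB")
# Classroom guidance often targets SNRs around +15 dB; this toy example gives ~6 dB.
```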
Automated Depression Analysis Using Convolutional Neural Networks from Speech.
He, Lang; Cao, Cui
2018-05-28
To help clinicians efficiently diagnose the severity of a person's depression, the affective computing community and the artificial intelligence field have shown a growing interest in designing automated systems. Speech features carry useful information for the diagnosis of depression. However, manual design and domain knowledge are still required for feature selection, which makes the process labor-intensive and subjective. In recent years, deep-learned features based on neural networks have shown superior performance to hand-crafted features in various areas. In this paper, to overcome the difficulties mentioned above, we propose a combination of hand-crafted and deep-learned features which can effectively measure the severity of depression from speech. In the proposed method, Deep Convolutional Neural Networks (DCNN) are first built to learn deep-learned features from spectrograms and raw speech waveforms. Then we manually extract the state-of-the-art texture descriptors named median robust extended local binary patterns (MRELBP) from spectrograms. To capture the complementary information within the hand-crafted and deep-learned features, we propose joint fine-tuning layers that combine the raw-waveform and spectrogram DCNNs to boost depression recognition performance. Moreover, to address the problem of small samples, a data augmentation method is proposed. Experiments conducted on the AVEC2013 and AVEC2014 depression databases show that our approach is robust and effective for the diagnosis of depression when compared to state-of-the-art audio-based methods. Copyright © 2018. Published by Elsevier Inc.
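As a rough illustration of the spectrogram branch of such a system, the PyTorch sketch below maps a spectrogram patch to a scalar severity score with a small CNN; the layer sizes and input dimensions are invented, and this is not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Toy stand-in for a spectrogram DCNN branch (not the paper's architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 64), nn.ReLU(),
            nn.Linear(64, 1),            # regressed depression-severity score
        )

    def forward(self, x):
        return self.head(self.features(x))

model = SpectrogramCNN()
batch = torch.randn(4, 1, 128, 128)     # 4 fake log-mel spectrogram patches
print(model(batch).shape)               # -> torch.Size([4, 1])
```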
Carey, Daniel; McGettigan, Carolyn
2017-04-01
The human vocal system is highly plastic, allowing for the flexible expression of language, mood and intentions. However, this plasticity is not stable throughout the life span, and it is well documented that adult learners encounter greater difficulty than children in acquiring the sounds of foreign languages. Researchers have used magnetic resonance imaging (MRI) to interrogate the neural substrates of vocal imitation and learning, and the correlates of individual differences in phonetic "talent". In parallel, a growing body of work using MR technology to directly image the vocal tract in real time during speech has offered primarily descriptive accounts of phonetic variation within and across languages. In this paper, we review the contribution of neural MRI to our understanding of vocal learning, and give an overview of vocal tract imaging and its potential to inform the field. We propose methods by which our understanding of speech production and learning could be advanced through the combined measurement of articulation and brain activity using MRI - specifically, we describe a novel paradigm, developed in our laboratory, that uses both MRI techniques to map directly, for the first time, between neural, articulatory and acoustic data in the investigation of vocalisation. This non-invasive, multimodal imaging method could be used to track central and peripheral correlates of spoken language learning, and speech recovery in clinical settings, as well as provide insights into potential sites for targeted neural interventions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Feng, Gangyi; Ingvalson, Erin M; Grieco-Calub, Tina M; Roberts, Megan Y; Ryan, Maura E; Birmingham, Patrick; Burrowes, Delilah; Young, Nancy M; Wong, Patrick C M
2018-01-30
Although cochlear implantation enables some children to attain age-appropriate speech and language development, communicative delays persist in others, and outcomes are quite variable and difficult to predict, even for children implanted early in life. To understand the neurobiological basis of this variability, we used presurgical neural morphological data obtained from MRI of individual pediatric cochlear implant (CI) candidates implanted younger than 3.5 years to predict variability of their speech-perception improvement after surgery. We first compared neuroanatomical density and spatial pattern similarity of CI candidates to that of age-matched children with normal hearing, which allowed us to detail neuroanatomical networks that were either affected or unaffected by auditory deprivation. This information enables us to build machine-learning models to predict the individual children's speech development following CI. We found that regions of the brain that were unaffected by auditory deprivation, in particular the auditory association and cognitive brain regions, produced the highest accuracy, specificity, and sensitivity in patient classification and the most precise prediction results. These findings suggest that brain areas unaffected by auditory deprivation are critical to developing closer to typical speech outcomes. Moreover, the findings suggest that determination of the type of neural reorganization caused by auditory deprivation before implantation is valuable for predicting post-CI language outcomes for young children.
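A minimal sketch of the general recipe in the study above, predicting a post-implant outcome from presurgical regional features with a cross-validated linear classifier. The data, feature count, and labels are invented stand-ins, and the model choice is an assumption, not the paper's method.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Invented stand-in data: 40 children x 20 regional gray-matter density features.
X = rng.standard_normal((40, 20))
y = (X[:, 0] + 0.5 * rng.standard_normal(40)) > 0   # fake 'good outcome' label

# Scaling + linear SVM, evaluated with cross-validation, mirrors the generic
# recipe of predicting post-CI outcomes from presurgical morphology.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```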
A model of serial order problems in fluent, stuttered and agrammatic speech.
Howell, Peter
2007-10-01
Many models of speech production have attempted to explain dysfluent speech. Most models assume that the disruptions that occur when speech is dysfluent arise because the speakers make errors while planning an utterance. In this contribution, a model of the serial order of speech is described that does not make this assumption. It involves the coordination or 'interlocking' of linguistic planning and execution stages at the language-speech interface. The model is examined to determine whether it can distinguish two forms of dysfluent speech (stuttered and agrammatic speech) that are characterized by iteration and omission of whole words and parts of words.
López-Barroso, Diana; Ripollés, Pablo; Marco-Pallarés, Josep; Mohammadi, Bahram; Münte, Thomas F; Bachoud-Lévi, Anne-Catherine; Rodriguez-Fornells, Antoni; de Diego-Balaguer, Ruth
2015-04-15
Although neuroimaging studies using standard subtraction-based analysis from functional magnetic resonance imaging (fMRI) have suggested that frontal and temporal regions are involved in word learning from fluent speech, the possible contribution of different brain networks during this type of learning is still largely unknown. Indeed, univariate fMRI analyses cannot identify the full extent of distributed networks that are engaged by a complex task such as word learning. Here we used Independent Component Analysis (ICA) to characterize the different brain networks subserving word learning from an artificial language speech stream. Results were replicated in a second cohort of participants with a different linguistic background. Four spatially independent networks were associated with the task in both cohorts: (i) a dorsal Auditory-Premotor network; (ii) a dorsal Sensory-Motor network; (iii) a dorsal Fronto-Parietal network; and (iv) a ventral Fronto-Temporal network. The level of engagement of these networks varied through the learning period with only the dorsal Auditory-Premotor network being engaged across all blocks. In addition, the connectivity strength of this network in the second block of the learning phase correlated with the individual variability in word learning performance. These findings suggest that: (i) word learning relies on segregated connectivity patterns involving dorsal and ventral networks; and (ii) specifically, the dorsal auditory-premotor network connectivity strength is directly correlated with word learning performance. Copyright © 2015 Elsevier Inc. All rights reserved.
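The core operation in the study above, ICA, recovers statistically independent component time courses from linearly mixed observations. A toy sketch with scikit-learn's FastICA on two invented 'network' time courses (not fMRI data; the signals and mixing are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two invented 'network' time courses, linearly mixed into four observed
# signals, loosely analogous to voxel time series mixing underlying networks.
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
mixing = rng.standard_normal((2, 4))
observed = sources @ mixing

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observed)        # estimated independent components

# Up to order and sign, each recovered component should match one source.
corr = np.corrcoef(recovered.T, sources.T)[:2, 2:]
print(np.round(np.abs(corr), 2))
```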
Bernstein, Lynne E.; Eberhardt, Silvio P.; Auer, Edward T.
2014-01-01
Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training. PMID:25206344
Fava, Eswen; Hull, Rachel; Bortfeld, Heather
2014-01-01
Initially, infants are capable of discriminating phonetic contrasts across the world's languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left-lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572
Carrillo, Facundo; Sigman, Mariano; Fernández Slezak, Diego; Ashton, Philip; Fitzgerald, Lily; Stroud, Jack; Nutt, David J; Carhart-Harris, Robin L
2018-04-01
Natural speech analytics has seen some improvements over recent years, and this has opened a window for objective and quantitative diagnosis in psychiatry. Here, we used a machine learning algorithm applied to natural speech to ask whether language properties measured before psilocybin treatment for treatment-resistant depression can predict for which patients it will be effective and for which it will not. A baseline autobiographical memory interview was conducted and transcribed. Patients with treatment-resistant depression received 2 doses of psilocybin, 10 mg and 25 mg, 7 days apart. Psychological support was provided before, during and after all dosing sessions. Quantitative speech measures were applied to the interview data from 17 patients and 18 untreated age-matched healthy control subjects. A machine learning algorithm was used to classify between controls and patients and to predict treatment response. Speech analytics and machine learning successfully differentiated depressed patients from healthy controls and identified treatment responders from non-responders with 85% accuracy (75% precision). Automatic natural language analysis was used to predict effective response to treatment with psilocybin, suggesting that these tools offer a highly cost-effective facility for screening individuals for treatment suitability and sensitivity. The sample size was small and replication is required to strengthen inferences on these results. Copyright © 2018 Elsevier B.V. All rights reserved.
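A minimal sketch of the generic pipeline for classifying speakers from transcripts: bag-of-words features plus a linear classifier. The texts, labels, and feature choice below are invented for illustration; this is not the paper's algorithm or its speech measures.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented transcripts; real studies use full autobiographical interviews.
texts = ["I remember feeling hopeful that summer",
         "everything felt heavy and I stayed inside",
         "we laughed a lot on that trip",
         "I could not see the point of getting up"]
labels = [0, 1, 0, 1]   # 0 = control-like, 1 = patient-like (illustrative only)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["that morning everything felt heavy"]))   # expected: [1]
```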
ERIC Educational Resources Information Center
Van Laere, E.; Braak, J.
2017-01-01
Text-to-speech technology can act as an important support tool in computer-based learning environments (CBLEs) as it provides auditory input, next to on-screen text. Particularly for students who use a language at home other than the language of instruction (LOI) applied at school, text-to-speech can be useful. The CBLE E-Validiv offers content in…
ERIC Educational Resources Information Center
Horgan, David James
2010-01-01
This dissertation study explored the efficacy of the SpeechEasy[R] device for individuals who are gainfully employed stutterers and who participated in workplace education learning activities. This study attempted to fill a gap in the literature regarding efficacy of the SpeechEasy[R] device. It employed a qualitative multiple unit case study…
ERIC Educational Resources Information Center
Mackler, Tobi; Savard, Theresa
Taking advantage of the opportunity to heighten cultural awareness and create an intercultural exchange, this paper presents two articles that provide a summary of the rationale, methodology, and assignments used to teach the linked courses of an introductory speech communication course and an English-as-a-Second-Language Oral Skills course. The…
ERIC Educational Resources Information Center
Perkell, Joseph S.
2013-01-01
Purpose: The author presents a view of research in speech motor control over the past 5 decades, as observed from within Ken Stevens's Speech Communication Group (SCG) in the Research Laboratory of Electronics at MIT. Method: The author presents a limited overview of some important developments and discoveries. The perspective is based…
ERIC Educational Resources Information Center
McCartney, Elspeth; Muir, Margaret
2017-01-01
School-leaving for pupils with long-term speech, language, swallowing or communication difficulties requires careful management. Speech and language therapists (SLTs) support communication, secure assistive technology and manage swallowing difficulties post-school. UK SLTs are employed by health services, with child SLT teams based in schools.…
ERIC Educational Resources Information Center
Ramírez-Esparza, Nairán; García-Sierra, Adrián; Kuhl, Patricia K.
2014-01-01
Language input is necessary for language learning, yet little is known about whether, in natural environments, the speech style and social context of language input to children impacts language development. In the present study we investigated the relationship between language input and language development, examining both the style of parental…
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This document, containing 15 articles and 2 abstracts, is a report on the current status and progress of speech research. The following topics are investigated: phonological fusion, phonetic prerequisites for first-language learning, auditory and phonetic levels of processing, auditory short-term memory in vowel perception, hemispheric…
The Effect of Hearing Loss on Novel Word Learning in Infant- and Adult-Directed Speech.
Robertson, V Susie; von Hapsburg, Deborah; Hay, Jessica S
Relatively little is known about how young children with hearing impairment (HI) learn novel words in infant-directed speech (IDS) and adult-directed speech (ADS). IDS supports word learning in typically developing infants relative to ADS. This study examined how children with normal hearing (NH) and children with HI learn novel words in IDS and ADS. It was predicted that IDS would support novel word learning in both groups of children. In addition, children with HI were expected to be less proficient word learners as compared with their NH peers. A looking-while-listening paradigm was used to measure novel word learning in 16 children with sensorineural HI (age range 23.2 to 42.1 months) who wore either bilateral hearing aids (n = 10) or bilateral cochlear implants (n = 6) and 16 children with NH (age range 23.1 to 42.1 months) who were matched for gender, chronological age, and maternal education level. Two measures of word learning were assessed (accuracy and reaction time). Each child participated in two experiments approximately 1 week apart, one in IDS and one in ADS. Both groups successfully learned the novel words in both speech type conditions, as evidenced by children looking at the correct picture significantly above chance. As a group, children with NH outperformed children with HI in the novel word learning task; however, there were no significant differences between performance on IDS versus ADS. More fine-grained time course analyses revealed that children with HI, and particularly children who use hearing aids, had more difficulty learning novel words in ADS, compared with children with NH. The pattern of results observed in the children with HI suggests that they may need extended support from clinicians and caregivers, through the use of IDS, during novel word learning. Future research should continue to focus on understanding the factors (e.g., device type and use, age of intervention, audibility, acoustic characteristics of input, etc.) that may influence word learning in children with HI in both IDS and ADS.
Neurophysiological Influence of Musical Training on Speech Perception
Shahin, Antoine J.
2011-01-01
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
Infants with Williams syndrome detect statistical regularities in continuous speech.
Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B
2016-09-01
Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays. Copyright © 2016 Elsevier B.V. All rights reserved.
Severity-Based Adaptation with Limited Data for ASR to Aid Dysarthric Speakers
Mustafa, Mumtaz Begum; Salim, Siti Salwah; Mohamed, Noraini; Al-Qatab, Bassam; Siong, Chng Eng
2014-01-01
Automatic speech recognition (ASR) is currently used in many assistive technologies, such as helping individuals with speech impairment in their communication ability. One challenge in ASR for speech-impaired individuals is the difficulty in obtaining a good speech database of impaired speakers for building an effective speech acoustic model. Because there are very few existing databases of impaired speech, which are also limited in size, the obvious solution for building a speech acoustic model of impaired speech is to employ adaptation techniques. However, two issues have not been addressed in existing studies in the area of adaptation for speech impairment: (1) identifying the most effective adaptation technique for impaired speech; and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates these two issues for dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model with well-known adaptation techniques, namely maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired-speech acoustic model is measured in terms of word error rate (WER), with further assessments including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. The C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech, based on statistical analysis of the WER. Phoneme substitution was the biggest contributing factor to WER in dysarthric speech at all levels of severity. The results show that speech acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data. PMID:24466004
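WER, the accuracy measure used in the study above, is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the reference length. A self-contained sketch using the standard Levenshtein dynamic program (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(ref)

# 2 substitutions + 1 deletion over a 5-word reference -> WER = 0.6
print(word_error_rate("please call the nurse now", "please all the nurses"))
```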
Quantum-chemical insights from deep tensor neural networks
Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan; Müller, Klaus R.; Tkatchenko, Alexandre
2017-01-01
Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol−1) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems. PMID:28067221
GESTURE'S ROLE IN CREATING AND LEARNING LANGUAGE.
Goldin-Meadow, Susan
2010-09-22
Imagine a child who has never seen or heard language. Would such a child be able to invent a language? Despite what one might guess, the answer is "yes". This chapter describes children who are congenitally deaf and cannot learn the spoken language that surrounds them. In addition, the children have not been exposed to sign language, either by their hearing parents or their oral schools. Nevertheless, the children use their hands to communicate--they gesture--and those gestures take on many of the forms and functions of language (Goldin-Meadow 2003a). The properties of language that we find in these gestures are just those properties that do not need to be handed down from generation to generation, but can be reinvented by a child de novo. They are the resilient properties of language, properties that all children, deaf or hearing, come to language-learning ready to develop. In contrast to these deaf children who are inventing language with their hands, hearing children are learning language from a linguistic model. But they too produce gestures, as do all hearing speakers (Feyereisen and de Lannoy 1991; Goldin-Meadow 2003b; Kendon 1980; McNeill 1992). Indeed, young hearing children often use gesture to communicate before they use words. Interestingly, changes in a child's gestures not only predate but also predict changes in the child's early language, suggesting that gesture may be playing a role in the language-learning process. This chapter begins with a description of the gestures the deaf child produces without speech. These gestures assume the full burden of communication and take on a language-like form--they are language. This phenomenon stands in contrast to the gestures hearing speakers produce with speech. These gestures share the burden of communication with speech and do not take on a language-like form--they are part of language.
Incremental Lexical Learning in Speech Production: A Computational Model and Empirical Evaluation
ERIC Educational Resources Information Center
Oppenheim, Gary Michael
2011-01-01
Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the way that words are activated and selected for production, and some have…
Grounded theory as a method for research in speech and language therapy.
Skeat, J; Perry, A
2008-01-01
The use of qualitative methodologies in speech and language therapy has grown over the past two decades, and there is now a body of literature, both generally describing qualitative research, and detailing its applicability to health practice(s). However, there has been only limited profession-specific discussion of qualitative methodologies and their potential application to speech and language therapy. To describe the methodology of grounded theory, and to explain how it might usefully be applied to areas of speech and language research where theoretical frameworks or models are lacking. Grounded theory as a methodology for inductive theory-building from qualitative data is explained and discussed. Some differences between 'modes' of grounded theory are clarified and areas of controversy within the literature are highlighted. The past application of grounded theory to speech and language therapy, and its potential for informing research and clinical practice, are examined. This paper provides an in-depth critique of a qualitative research methodology, including an overview of the main difference between two major 'modes'. The article supports the application of a theory-building approach in the profession, which is sometimes complex to learn and apply, but worthwhile in its results. Grounded theory as a methodology has much to offer speech and language therapists and researchers. Although the majority of research and discussion around this methodology has rested within sociology and nursing, grounded theory can be applied by researchers in any field, including speech and language therapists. The benefit of the grounded theory method to researchers and practitioners lies in its application to social processes and human interactions. The resulting theory may support further research in the speech and language therapy profession.
Auditory Perceptual Learning in Adults with and without Age-Related Hearing Loss
Karawani, Hanin; Bitan, Tali; Attias, Joseph; Banai, Karen
2016-01-01
Introduction: Speech recognition in adverse listening conditions becomes more difficult as we age, particularly for individuals with age-related hearing loss (ARHL). Whether these difficulties can be eased with training remains debated, because it is not clear whether the outcomes are sufficiently general to be of use outside of the training context. The aim of the current study was to compare training-induced learning and generalization between normal-hearing older adults and those with ARHL. Methods: Fifty-six listeners (60–72 years old) participated: 35 with ARHL and 21 with normal hearing. The study used a crossover design with three groups (immediate-training, delayed-training, and no-training). Trained participants received 13 sessions of home-based auditory training over the course of 4 weeks. Three adverse listening conditions were targeted: (1) speech-in-noise, (2) time-compressed speech, and (3) competing speakers, and the outcomes of training were compared between the normal-hearing and ARHL groups. Pre- and post-test sessions were completed by all participants. Outcome measures included tests on all of the trained conditions as well as on a series of untrained conditions designed to assess the transfer of learning to other speech and non-speech conditions. Results: Significant improvements on all trained conditions were observed in both the ARHL and normal-hearing groups over the course of training. Normal-hearing participants learned more than participants with ARHL in the speech-in-noise condition, but showed similar patterns of learning in the other conditions. Greater pre- to post-test changes were observed in trained than in untrained listeners on all trained conditions. In addition, the ability of trained listeners from the ARHL group to discriminate minimally different pseudowords in noise also improved with training. Conclusions: ARHL did not preclude auditory perceptual learning, but there was little generalization to untrained conditions. We suggest that most training-related changes occurred at higher-level, task-specific cognitive processes in both groups. However, these were enhanced by high-quality perceptual representations in the normal-hearing group. In contrast, some training-related changes also occurred at the level of phonemic representations in the ARHL group, consistent with an interaction between bottom-up and top-down processes. PMID:26869944
Poor phonetic perceivers are affected by cognitive load when resolving talker variability
Antoniou, Mark; Wong, Patrick C. M.
2015-01-01
Speech training paradigms aim to maximise learning outcomes by manipulating external factors such as talker variability. However, not all individuals may benefit from such manipulations because subject-external factors interact with subject-internal ones (e.g., aptitude) to determine speech perception and/or learning success. In a previous tone learning study, high-aptitude individuals benefitted from talker variability, whereas low-aptitude individuals were impaired. Because increases in cognitive load have been shown to hinder speech perception in mixed-talker conditions, it has been proposed that resolving talker variability requires cognitive resources. This proposal leads to the hypothesis that low-aptitude individuals do not use their cognitive resources as efficiently as those with high aptitude. Here, high- and low-aptitude subjects identified pitch contours spoken by multiple talkers under high and low cognitive load conditions established by a secondary task. While high-aptitude listeners outperformed low-aptitude listeners across load conditions, only low-aptitude listeners were impaired by increased cognitive load. The findings suggest that low-aptitude listeners either have fewer available cognitive resources or are poorer at allocating attention to the signal. Therefore, cognitive load is an important factor when considering individual differences in speech perception and training paradigms. PMID:26328675
Learning words and learning sounds: Advances in language development.
Vihman, Marilyn M
2017-02-01
Phonological development is sometimes seen as a process of learning sounds, or forming phonological categories, and then combining sounds to build words, with the evidence taken largely from studies demonstrating 'perceptual narrowing' in infant speech perception over the first year of life. In contrast, studies of early word production have long provided evidence that holistic word learning may precede the formation of phonological categories. In that account, children begin by matching their existing vocal patterns to adult words, with knowledge of the phonological system emerging from the network of related word forms. Here I review evidence from production and then consider how the implicit and explicit learning mechanisms assumed by the complementary memory systems model might be understood as reconciling the two approaches. © 2016 The British Psychological Society.
ERIC Educational Resources Information Center
Padula, Janice
2016-01-01
According to the latest news about declining standards in mathematics learning in Australia, boys, and girls, in particular, need to be more engaged in mathematics learning. Only 30% of mathematics students at university level in Australia are female. Proofs are made up of words and mathematical symbols. One can assume the words would assist…
Massaro, Dominic W
2012-01-01
I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally.
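The fuzzy logical model of perception named above combines the support each information source gives an alternative multiplicatively, then normalizes by the total support across alternatives. A two-alternative sketch of that integration rule; the support values are illustrative assumptions, not fitted parameters.

```python
def flmp_two_alternative(a, v):
    """Fuzzy logical model of perception, two-alternative case:
    multiplicative integration of auditory (a) and visual (v) support
    for one alternative, followed by relative-goodness normalization."""
    return (a * v) / (a * v + (1 - a) * (1 - v))

# Illustrative support values for perceiving /ba/ rather than /da/:
print(flmp_two_alternative(0.8, 0.9))   # both sources favor /ba/ -> ~0.97
print(flmp_two_alternative(0.8, 0.2))   # conflicting sources -> 0.5
```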
How Autism Affects Speech Understanding in Multitalker Environments
2013-10-01
...difficult than will typically-developing children. Knowing whether toddlers with ASD have difficulties processing speech in the presence of acoustic... to separate the speech of different talkers than do their typically-developing peers. We also predict that they will fail to exploit visual cues on... learn language from many settings in which children are typically placed. In addition, one of the cues that typically-developing listeners use to...
Deep bottleneck features for spoken language identification.
Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong
2014-01-01
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
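A rough PyTorch sketch of the bottleneck idea in the abstract above: a classifier network with one deliberately narrow hidden layer is trained on a frame-labeling task, and the narrow layer's activations are then kept as compact features. The layer sizes, input dimensionality, and output targets are invented, not the paper's topology.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Toy bottleneck network: the narrow layer's activations are the DBF
    (layer sizes invented, not the paper's topology)."""
    def __init__(self, n_in=39, n_hidden=512, n_bottleneck=40, n_out=1000):
        super().__init__()
        self.pre = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid(),
                                 nn.Linear(n_hidden, n_hidden), nn.Sigmoid())
        self.bottleneck = nn.Linear(n_hidden, n_bottleneck)
        self.post = nn.Sequential(nn.Sigmoid(),
                                  nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
                                  nn.Linear(n_hidden, n_out))   # e.g. frame labels

    def forward(self, x):
        return self.post(self.bottleneck(self.pre(x)))

    def extract_dbf(self, x):
        # After training on the labeling task, discard `post` and keep these
        # low-dimensional activations as frame-level features for the utterance.
        return self.bottleneck(self.pre(x))

net = BottleneckDNN()
frames = torch.randn(100, 39)           # 100 fake MFCC frames
print(net.extract_dbf(frames).shape)    # -> torch.Size([100, 40])
```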
Graves, Judy
2007-03-01
The working context for speech and language therapists (SLTs) delivering interventions to adults who have a learning disability has changed following the reorganization of care provision from hospitals to the community. Consequently, SLTs often deliver their care within a social model of disability through indirect intervention in collaboration with carers. However, there has been little research into how this approach works in practice. To gain insight into the working context by identifying the key factors that influence indirect SLT interventions as perceived by SLTs and by paid carers from a range of service providers. To explore the implications of the results for the delivery of indirect SLT interventions and provide direction for further research. Semi-structured interviews were used to collect data from an opportunistic sample of five SLTs working in Community Learning Disability Teams (CLDTs) and 12 carers from residential and day care services who had had experience of working with SLTs. The data were analysed inductively using a grounded theory framework. Two broad themes emerged for SLTs: roles and expectations, and changing carer behaviour through training. The key themes for carers were roles and values, awareness of communication needs, and motivation and opportunity to implement interventions. Four broad factors are suggested as having the potential to influence indirect interventions: diversity in the working context; possible conflict between the guiding values of SLTs and carers, particularly residential carers; collaboration and support for implementation; and SLT doubts about the effectiveness of formal carer communication training. The results add to the evidence that the delivery of indirect speech and language therapy interventions to people with learning disabilities is a complex activity demanding specialist skills from SLTs. The findings suggest that these should include expertise in professional collaborative and relational skills, and training methods and strategies. Action research is needed to test the validity of the findings and document their impact on indirect interventions in day-to-day practice. More research is needed on the effectiveness of modelling or demonstration as a training technique with carers.
Genetics Home Reference: 48,XXYY syndrome
... degree of difficulty with speech and language development. Learning disabilities, especially those that are language-based, are very ...
Vertical transmission of learned signatures in a wild parrot
Berg, Karl S.; Delgado, Soraya; Cortopassi, Kathryn A.; Beissinger, Steven R.; Bradbury, Jack W.
2012-01-01
Learned birdsong is a widely used animal model for understanding the acquisition of human speech. Male songbirds often learn songs from adult males during sensitive periods early in life, and sing to attract mates and defend territories. In presumably all of the 350+ parrot species, individuals of both sexes commonly learn vocal signals throughout life to satisfy a wide variety of social functions. Despite intriguing parallels with humans, there have been no experimental studies demonstrating learned vocal production in wild parrots. We studied contact call learning in video-rigged nests of a well-known marked population of green-rumped parrotlets (Forpus passerinus) in Venezuela. Both sexes of naive nestlings developed individually unique contact calls in the nest, and we demonstrate experimentally that signature attributes are learned from both primary care-givers. This represents the first experimental evidence for the mechanisms underlying the transmission of a socially acquired trait in a wild parrot population. PMID:21752824
Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu
2017-11-01
Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
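The dictionary idea underlying NMF-based VC can be illustrated with a generic exemplar-based sketch (this is the baseline family of methods, not the paper's JD-NMF implementation; all dimensions are assumptions): paired source and target spectral dictionaries share one activation matrix, so activations estimated on source frames can resynthesize target-style frames.

```python
# Generic exemplar-based NMF voice-conversion sketch (numpy, toy data).
import numpy as np

rng = np.random.default_rng(0)
D, K, T = 257, 50, 100                        # spectral bins, exemplars, frames
W_src = np.abs(rng.standard_normal((D, K)))   # source-speaker exemplar dictionary
W_tgt = np.abs(rng.standard_normal((D, K)))   # frame-aligned target exemplars
X = np.abs(rng.standard_normal((D, T)))       # new source magnitude spectra

H = np.abs(rng.standard_normal((K, T)))       # shared non-negative activations
for _ in range(200):                          # multiplicative updates for ||X - W_src H||
    H *= (W_src.T @ X) / (W_src.T @ W_src @ H + 1e-9)

X_converted = W_tgt @ H                       # target-style spectra from shared activations
```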
Generalized Adaptation to Dysarthric Speech
ERIC Educational Resources Information Center
Borrie, Stephanie A.; Lansford, Kaitlin L.; Barrett, Tyson S.
2017-01-01
Purpose: Generalization of perceptual learning has received limited attention in listener adaptation studies with dysarthric speech. This study investigated whether adaptation to a talker with dysarthria could be predicted by the nature of the listener's prior familiarization experience, specifically similarity of perceptual features, and level of…
ERIC Educational Resources Information Center
2000
This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…
ERIC Educational Resources Information Center
Leaby, Margaret M.; Walsh, Irene P.
2008-01-01
The importance of learning about and applying clinical discourse analysis to enhance the talk in interaction in the speech-language pathology clinic is discussed. The benefits of analyzing clinical discourse to explicate therapy dynamics are described.
Do perceived context pictures automatically activate their phonological code?
Jescheniak, Jörg D; Oppermann, Frank; Hantsch, Ansgar; Wagner, Valentin; Mädebach, Andreas; Schriefers, Herbert
2009-01-01
Morsella and Miozzo (Morsella, E., & Miozzo, M. (2002). Evidence for a cascade model of lexical access in speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 555-563) have reported that the to-be-ignored context pictures become phonologically activated when participants name a target picture, and took this finding as support for cascaded models of lexical retrieval in speech production. In a replication and extension of their experiment in German, we failed to obtain priming effects from context pictures phonologically related to a to-be-named target picture. By contrast, corresponding context words (i.e., the names of the respective pictures) and the same context pictures, when used in an identity condition, did reliably facilitate the naming process. This pattern calls into question the generality of the claim advanced by Morsella and Miozzo that perceptual processing of pictures in the context of a naming task automatically leads to the activation of corresponding lexical-phonological codes.
Outcomes of cochlear implantation in deaf children of deaf parents: comparative study.
Hassanzadeh, S
2012-10-01
This retrospective study compared the cochlear implantation outcomes of first- and second-generation deaf children. The study group consisted of seven deaf, cochlear-implanted children with deaf parents. An equal number of deaf children with normal-hearing parents were selected by matched sampling as a reference group. Participants were matched based on onset and severity of deafness, duration of deafness, age at cochlear implantation, duration of cochlear implantation, gender, and cochlear implant model. We used the Persian Auditory Perception Test for the Hearing Impaired, the Speech Intelligibility Rating scale, and the Sentence Imitation Test, in order to measure participants' speech perception, speech production and language development, respectively. Both groups of children showed auditory and speech development. However, the second-generation deaf children (i.e. deaf children of deaf parents) exceeded the cochlear implantation performance of the deaf children with hearing parents. This study confirms that second-generation deaf children exceed deaf children of hearing parents in terms of cochlear implantation performance. Encouraging deaf children to communicate in sign language from a very early age, before cochlear implantation, appears to improve their ability to learn spoken language after cochlear implantation.
Kakouros, Sofoklis; Räsänen, Okko
2016-09-01
Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we introduce a novel perspective to the study of sentential stress and put forward the hypothesis that perceived sentence stress in speech is related to the unpredictability of prosodic features, thereby capturing the attention of the listener. As predictability is based on the statistical structure of the speech input, the hypothesis also suggests that stress perception is a result of general statistical learning mechanisms. To study this idea, computational simulations are performed where temporal prosodic trajectories are modeled with an n-gram model. Probabilities of the feature trajectories are subsequently evaluated on a set of novel utterances and compared to human perception of stress. The results show that the low-probability regions of F0 and energy trajectories are strongly correlated with stress perception, giving support to the idea that attention and unpredictability of sensory stimulus are mutually connected. Copyright © 2015 Cognitive Science Society, Inc.
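The core computation can be sketched as follows (the quantization levels, bigram order, smoothing, and threshold are illustrative assumptions, not the paper's settings): discretize a prosodic trajectory such as F0, train an n-gram model on it, and flag low-probability (high-surprisal) regions as candidate stress locations.

```python
# Sketch: bigram surprisal over a discretized F0 trajectory (toy data).
import numpy as np
from collections import Counter

def discretize(f0, n_levels=8):
    edges = np.linspace(f0.min(), f0.max(), n_levels + 1)[1:-1]
    return np.digitize(f0, edges)             # symbols 0..n_levels-1

def train_bigram(sequences):
    uni, bi = Counter(), Counter()
    for seq in sequences:
        uni.update(seq)
        bi.update(zip(seq[:-1], seq[1:]))
    return uni, bi

def surprisal(seq, uni, bi, vocab=8):
    # add-one smoothed -log P(next | prev); high values = unpredictable
    return np.array([
        -np.log((bi[(a, b)] + 1) / (uni[a] + vocab))
        for a, b in zip(seq[:-1], seq[1:])
    ])

rng = np.random.default_rng(1)
train = [discretize(rng.standard_normal(100).cumsum()) for _ in range(50)]
uni, bi = train_bigram(train)
test = discretize(rng.standard_normal(100).cumsum())
s = surprisal(test, uni, bi)
predicted_stress = np.where(s > s.mean() + s.std())[0]  # low-probability regions
```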
Loebach, Jeremy L.; Pisoni, David B.; Svirsky, Mario A.
2009-01-01
Objective The objective of this study was to assess whether training on speech processed with an 8-channel noise vocoder to simulate the output of a cochlear implant would produce transfer of auditory perceptual learning to the recognition of non-speech environmental sounds, the identification of speaker gender, and the discrimination of talkers by voice. Design Twenty-four normal hearing subjects were trained to transcribe meaningful English sentences processed with a noise vocoder simulation of a cochlear implant. An additional twenty-four subjects served as an untrained control group and transcribed the same sentences in their unprocessed form. All subjects completed pre- and posttest sessions in which they transcribed vocoded sentences to provide an assessment of training efficacy. Transfer of perceptual learning was assessed using a series of closed-set, nonlinguistic tasks: subjects identified talker gender, discriminated the identity of pairs of talkers, and identified ecologically significant environmental sounds from a closed set of alternatives. Results Although both groups of subjects showed significant pre- to posttest improvements, subjects who transcribed vocoded sentences during training performed significantly better at posttest than subjects in the control group. Both groups performed equally well on gender identification and talker discrimination. Subjects who received explicit training on the vocoded sentences, however, performed significantly better on environmental sound identification than the untrained subjects. Moreover, across both groups, pretest speech performance, and to a higher degree posttest speech performance, were significantly correlated with environmental sound identification. For both groups, environmental sounds that were characterized as having more salient temporal information were identified more often than environmental sounds that were characterized as having more salient spectral information. Conclusions Listeners trained to identify noise-vocoded sentences showed evidence of transfer of perceptual learning to the identification of environmental sounds. In addition, the correlation between environmental sound identification and sentence transcription indicates that subjects who were better able to utilize the degraded acoustic information to identify the environmental sounds were also better able to transcribe the linguistic content of novel sentences. Both trained and untrained groups performed equally well (~75% correct) on the gender identification task, indicating that training did not have an effect on the ability to identify the gender of talkers. Although better than chance, performance on the talker discrimination task was poor overall (~55%), suggesting that either explicit training is required to reliably discriminate talkers’ voices, or that additional information (perhaps spectral in nature) not present in the vocoded speech is required to excel in such tasks. Taken together, the results suggest that while transfer of auditory perceptual learning with spectrally degraded speech does occur, explicit task-specific training may be necessary for tasks that cannot rely on temporal information alone. PMID:19773659
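For reference, a noise vocoder of the kind used to simulate CI processing can be sketched as follows (the filter orders, band edges, and envelope cutoff are assumptions, not the study's exact parameters): band-pass the speech into channels, extract each band's envelope, and use it to modulate band-limited noise.

```python
# Sketch of an 8-channel noise vocoder (scipy; illustrative parameters).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(speech, fs, n_channels=8, lo=100.0, hi=7000.0):
    rng = np.random.default_rng(0)
    edges = np.geomspace(lo, hi, n_channels + 1)          # log-spaced band edges
    out = np.zeros_like(speech)
    env_sos = butter(2, 160, "low", fs=fs, output="sos")  # envelope smoother
    for f1, f2 in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [f1, f2], "bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)
        env = sosfiltfilt(env_sos, np.abs(band))          # rectify + low-pass
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
        out += np.clip(env, 0, None) * carrier            # envelope-modulated noise
    return out

fs = 16000
speech = np.random.default_rng(1).standard_normal(fs)     # stand-in for a sentence
vocoded = noise_vocoder(speech, fs)
```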
McNeil, M.R.; Katz, W.F.; Fossett, T.R.D.; Garst, D.M.; Szuminsky, N.J.; Carter, G.; Lim, K.Y.
2010-01-01
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded, and online perceptual judgments of accuracy (including segment and intersegment distortions) were made. Inter- and intra-judge reliability for perceptual accuracy was high. Results measured by visual inspection and effect size revealed positive acquisition and generalization effects for both participants. Generalization occurred across vowel contexts and to untreated probes. Results of the frequency manipulation were confounded by presentation order. Maintenance of learned and generalized effects was demonstrated for 1 participant. These data provide support for the role of augmented feedback in treating speech movements that result in perceptually accurate speech production. Future investigations will explore the independent contributions of each feedback type (i.e. kinematic and perceptual) in producing efficient and effective training of SMTs in persons with AOS. PMID:20424468
Auditory processing disorders and problems with hearing-aid fitting in old age.
Antonelli, A R
1978-01-01
The hearing handicap experienced by elderly subjects depends only partially on end-organ impairment. Not only does neural unit loss along the central auditory pathways contribute to decreased speech discrimination; learning processes are also slowed down. Diotic listening in elderly people seems to hasten learning of discrimination in critical conditions, as in the case of sensitized speech. This fact, together with the binaural gain from the binaural release from masking, stresses the superiority, on theoretical grounds, of binaural over monaural hearing-aid fitting.
Incidental learning of sound categories is impaired in developmental dyslexia.
Gabay, Yafit; Holt, Lori L
2015-12-01
Developmental dyslexia is commonly thought to arise from specific phonological impairments. However, recent evidence is consistent with the possibility that phonological impairments arise as symptoms of an underlying dysfunction of procedural learning. The nature of the link between impaired procedural learning and phonological dysfunction is unresolved. Motivated by the observation that speech processing involves the acquisition of procedural category knowledge, the present study investigates the possibility that procedural learning impairment may affect phonological processing by interfering with the typical course of phonetic category learning. The present study tests this hypothesis while controlling for linguistic experience and possible speech-specific deficits by comparing auditory category learning across artificial, nonlinguistic sounds among dyslexic adults and matched controls in a specialized first-person shooter videogame that has been shown to engage procedural learning. Nonspeech auditory category learning was assessed online via within-game measures and also with a post-training task involving overt categorization of familiar and novel sound exemplars. Each measure reveals that dyslexic participants do not acquire procedural category knowledge as effectively as age- and cognitive-ability matched controls. This difference cannot be explained by differences in perceptual acuity for the sounds. Moreover, poor nonspeech category learning is associated with slower phonological processing. Whereas phonological processing impairments have been emphasized as the cause of dyslexia, the current results suggest that impaired auditory category learning, general in nature and not specific to speech signals, could contribute to phonological deficits in dyslexia with subsequent negative effects on language acquisition and reading. Implications for the neuro-cognitive mechanisms of developmental dyslexia are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kumar, Veena; Croxson, Paula L; Simonyan, Kristina
2016-04-13
The laryngeal motor cortex (LMC) is essential for the production of learned vocal behaviors because bilateral damage to this area renders humans unable to speak but has no apparent effect on innate vocalizations such as human laughing and crying or monkey calls. Several hypotheses have been put forward attempting to explain the evolutionary changes from monkeys to humans that potentially led to enhanced LMC functionality for finer motor control of speech production. These views, however, remain limited to the position of the larynx area within the motor cortex, as well as its connections with the phonatory brainstem regions responsible for the direct control of laryngeal muscles. Using probabilistic diffusion tractography in healthy humans and rhesus monkeys, we show that, whereas the LMC structural network is largely comparable in both species, the LMC establishes nearly 7-fold stronger connectivity with the somatosensory and inferior parietal cortices in humans than in macaques. These findings suggest that important "hard-wired" components of the human LMC network controlling the laryngeal component of speech motor output evolved from an already existing, similar network in nonhuman primates. However, the evolution of enhanced LMC-parietal connections likely allowed for more complex synchrony of higher-order sensorimotor coordination, proprioceptive and tactile feedback, and modulation of learned voice for speech production. The role of the primary motor cortex in the formation of a comprehensive network controlling speech and language has been long underestimated and poorly studied. Here, we provide comparative and quantitative evidence for the significance of this region in the control of a highly learned and uniquely human behavior: speech production. From the viewpoint of structural network organization, we discuss potential evolutionary advances of enhanced temporoparietal cortical connections with the laryngeal motor cortex in humans compared with nonhuman primates that may have contributed to the development of finer vocal motor control necessary for speech production. Copyright © 2016 the authors.
ERIC Educational Resources Information Center
Oppenheim, Gary M.; Dell, Gary S.; Schwartz, Myrna F.
2010-01-01
Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the way that words are activated and selected for production, and some have…
Weir, Kristy A
2008-01-01
Speech pathology students readily identify the importance of a sound understanding of anatomical structures central to their intended profession. In contrast, they often do not recognize the relevance of a broader understanding of structure and function. This study aimed to explore students' perceptions of the relevance of anatomy to speech pathology. The effect of two learning activities on students' perceptions was also evaluated. First, a written assignment required students to illustrate the relevance of anatomy to speech pathology by using an example selected from one of the four alternative structures. The second approach was the introduction of brief "scenarios" with directed questions into the practical class. The effects of these activities were assessed via two surveys designed to evaluate students' perceptions of the relevance of anatomy before and during the course experience. A focus group was conducted to clarify and extend discussion of issues arising from the survey data. The results showed that the students perceived some course material as irrelevant to speech pathology. The importance of relevance to the students' "state" motivation was well supported by the data. Although the students believed that the learning activities helped their understanding of the relevance of anatomy, some structures were considered less relevant at the end of the course. It is likely that the perceived amount of content and surface approach to learning may have prevented students from "thinking outside the box" regarding which anatomical structures are relevant to the profession.
NASA Astrophysics Data System (ADS)
Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi
2005-04-01
Assessing the sound environment of classrooms for the aged is an important issue, because classrooms can be used by the aged for lifelong learning, especially in an aging society. Hearing loss due to aging is therefore a considerable factor for classrooms. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum speech level required to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged were obtained at speech levels of 70 to 80 dBA.
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical implications. Manuscripts and extended reports cover the following topics: (1) "On Learning a New Contrast," (2) "Letter Confusions and Reversals of Sequence in the Beginning Reader:…
Five Lectures on Artificial Intelligence
1974-09-01
large systems. The current projects on speech understanding (which I will describe later) are an exception to this, dealing explicitly with the problem...learns that "Fred lives in Sydney", we must find some new fact to resolve the tension — perhaps he lives in a zoo. It is...possible Speech Understanding Systems. Most of the problems described above might be characterized as relating to the chunking of knowledge. Such ideas are
The Struggle with Hate Speech. Teaching Strategy.
ERIC Educational Resources Information Center
Bloom, Jennifer
1995-01-01
Discusses the issue of hate-motivated violence and special laws aimed at deterrence. Presents a secondary school lesson to help students define hate speech and understand constitutional issues related to the topic. Includes three student handouts, student learning objectives, instructional procedures, and a discussion guide. (CFR)
Discharge experiences of speech-language pathologists working in Cyprus and Greece.
Kambanaros, Maria
2010-08-01
Post-termination relationships are complex because the client may need additional services and it may be difficult to determine when the speech-language pathologist-client relationship is truly terminated. In my contribution to this scientific forum, discharge experiences from speech-language pathologists working in Cyprus and Greece will be explored in search of commonalities and differences in the way in which pathologists end therapy from different cultural perspectives. Within this context, the personal impact of the discharge process on speech-language pathologists will be highlighted. Inherent in this process is how speech-language pathologists learn to hold their feelings, anxieties and reactions when communicating discharge to clients. Overall, speech-language pathologists working in Cyprus and Greece experience similar emotional responses to positive and negative therapy endings as speech-language pathologists working in Australia. The major difference is that Cypriot and Greek therapists face serious limitations in moving their clients on after therapy has ended.
Massed versus distributed practice in learned improvement of speech intelligibility.
DOT National Transportation Integrated Search
1976-03-01
Student pilots or new air traffic controllers have two ways to learn to understand the noisy and distorted communications common to aircraft operations: they may learn as they are working on other aspects of the activity, or they may learn by devotin...
All words are not created equal: Expectations about word length guide infant statistical learning
Lew-Williams, Casey; Saffran, Jenny R.
2011-01-01
Infants have been described as ‘statistical learners’ capable of extracting structure (such as words) from patterned input (such as language). Here, we investigated whether prior knowledge influences how infants track transitional probabilities in word segmentation tasks. Are infants biased by prior experience when engaging in sequential statistical learning? In a laboratory simulation of learning across time, we exposed 9- and 10-month-old infants to a list of either bisyllabic or trisyllabic nonsense words, followed by a pause-free speech stream composed of a different set of bisyllabic or trisyllabic nonsense words. Listening times revealed successful segmentation of words from fluent speech only when words were uniformly bisyllabic or trisyllabic throughout both phases of the experiment. Hearing trisyllabic words during the pre-exposure phase derailed infants’ abilities to segment speech into bisyllabic words, and vice versa. We conclude that prior knowledge about word length equips infants with perceptual expectations that facilitate efficient processing of subsequent language input. PMID:22088408
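A toy computation makes the transitional-probability logic concrete (the syllable inventory, word list, and threshold are invented for illustration): TPs are high within words and dip at word boundaries, so a learner can posit boundaries at low-TP syllable pairs.

```python
# Toy transitional-probability segmentation of a pause-free syllable stream.
import random
from collections import Counter

random.seed(0)
words = ["bida", "kupa", "dolu"]               # hypothetical bisyllabic words
stream = [random.choice(words) for _ in range(200)]
syls = [w[i:i + 2] for w in stream for i in (0, 2)]

pair_counts = Counter(zip(syls[:-1], syls[1:]))
syl_counts = Counter(syls[:-1])
tp = {pair: n / syl_counts[pair[0]] for pair, n in pair_counts.items()}

# Within-word TPs are ~1.0; across-word TPs are ~1/3, so a 0.5 cut works here.
boundaries = [i + 1 for i, pair in enumerate(zip(syls[:-1], syls[1:]))
              if tp[pair] < 0.5]
```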
Deep Learning: A Primer for Radiologists.
Chartrand, Gabriel; Cheng, Phillip M; Vorontsov, Eugene; Drozdzal, Michal; Turcotte, Simon; Pal, Christopher J; Kadoury, Samuel; Tang, An
2017-01-01
Deep learning is a class of machine learning methods that are gaining success and attracting interest in many domains, including computer vision, speech recognition, natural language processing, and playing games. Deep learning methods produce a mapping from raw inputs to desired outputs (eg, image classes). Unlike traditional machine learning methods, which require hand-engineered feature extraction from inputs, deep learning methods learn these features directly from data. With the advent of large datasets and increased computing power, these methods can produce models with exceptional performance. These models are multilayer artificial neural networks, loosely inspired by biologic neural systems. Weighted connections between nodes (neurons) in the network are iteratively adjusted based on example pairs of inputs and target outputs by back-propagating a corrective error signal through the network. For computer vision tasks, convolutional neural networks (CNNs) have proven to be effective. Recently, several clinical applications of CNNs have been proposed and studied in radiology for classification, detection, and segmentation tasks. This article reviews the key concepts of deep learning for clinical radiologists, discusses technical requirements, describes emerging applications in clinical radiology, and outlines limitations and future directions in this field. Radiologists should become familiar with the principles and potential applications of deep learning in medical imaging. © RSNA, 2017.
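The training loop described above can be made concrete with a minimal numpy sketch (a tiny two-layer network on toy data, not a clinical-grade model): a forward pass, an error signal at the output, and back-propagated weight corrections.

```python
# Minimal backpropagation sketch: two-layer network, toy binary task.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))              # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary targets

W1 = rng.standard_normal((4, 8)) * 0.1         # input-to-hidden weights
W2 = rng.standard_normal((8, 1)) * 0.1         # hidden-to-output weights
lr = 0.5

for _ in range(500):
    h = np.tanh(X @ W1)                        # hidden layer
    p = 1 / (1 + np.exp(-(h @ W2)))            # output probability
    err = p - y                                # corrective error signal
    gW2 = h.T @ err / len(X)
    gh = (err @ W2.T) * (1 - h ** 2)           # error propagated back through tanh
    gW1 = X.T @ gh / len(X)
    W2 -= lr * gW2                             # iterative weight adjustment
    W1 -= lr * gW1
```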
Perceptual Learning and Auditory Training in Cochlear Implant Recipients
Fu, Qian-Jie; Galvin, John J.
2007-01-01
Learning electrically stimulated speech patterns can be a new and difficult experience for cochlear implant (CI) recipients. Recent studies have shown that most implant recipients at least partially adapt to these new patterns via passive, daily-listening experiences. Gradually introducing a speech processor parameter (eg, the degree of spectral mismatch) may provide for more complete and less stressful adaptation. Although the implant device restores hearing sensation and the continued use of the implant provides some degree of adaptation, active auditory rehabilitation may be necessary to maximize the benefit of implantation for CI recipients. Currently, there are scant resources for auditory rehabilitation for adult, postlingually deafened CI recipients. We recently developed a computer-assisted speech-training program to provide the means to conduct auditory rehabilitation at home. The training software targets important acoustic contrasts among speech stimuli, provides auditory and visual feedback, and incorporates progressive training techniques, thereby maintaining recipients’ interest during the auditory training exercises. Our recent studies demonstrate the effectiveness of targeted auditory training in improving CI recipients’ speech and music perception. Provided with an inexpensive and effective auditory training program, CI recipients may find the motivation and momentum to get the most from the implant device. PMID:17709574
Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children
Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha
2012-01-01
Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726
Thinking About Better Speech: Mental Practice for Stroke-Induced Motor Speech Impairments
Page, Stephen J.; Harnish, Stacy
2012-01-01
Background Mental practice (MP) is a mind-body technique in which physical movements are cognitively rehearsed. It has shown efficacy in reducing the severity of a number of neurological impairments. Aims In the present review, we highlight recent developments in MP research, and the basis for MP use after stroke-induced motor speech disorders. Main Contribution In this review, we: (a) propose a novel conceptual model regarding the development of learned nonuse in people with motor speech impairments; (b) review the rationale and efficacy of MP for reducing the severity of stroke-induced impairments; (c) review evidence demonstrating muscular and neural activations during and following MP use; (d) review evidence showing that MP increases skill acquisition, use, and function in stroke; (e) review literature regarding neuroplasticity after stroke, including MP-induced neuroplasticity and the neural substrates underlying motor and language reacquisition; and (f) based on the above, review the rationale and clinical application of MP for stroke-induced motor speech impairments. Conclusions Support for MP use includes decades of MP neurobiological and behavioral efficacy data in a number of populations. Most recently, these data have expanded to the application of MP in neurological populations. Given increasingly demanding managed care environments, efficacious strategies that can be easily administered are needed. We also encounter clinicians who aspire to use MP, but their protocols do not contain several of the elements shown to be fundamental to effective MP implementation. Given shortfalls of some conventional aphasia and motor speech rehabilitative techniques, and uncertainty regarding optimal MP implementation, this paper introduces the neurophysiologic bases for MP, the evidence for MP use in stroke rehabilitation, and discusses its applications and considerations in patients with stroke-induced motor speech impairments. PMID:22308050
Learning atoms for materials discovery.
Zhou, Quan; Tang, Peizhe; Liu, Shenxiu; Pan, Jinbo; Yan, Qimin; Zhang, Shou-Cheng
2018-06-26
Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques have brought human-competitive performance in tasks across many fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, the AI player has already beaten human world champions convincingly, both with and without learning from human play. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from an extensive database of known compounds and materials. These learned properties are represented in terms of high-dimensional vectors, and clustering of atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which demonstrate significant accuracy. Copyright © 2018 the Author(s). Published by PNAS.
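The general recipe can be sketched under loose assumptions (this is not the authors' Atom2Vec code; the compound list and the SVD compression step are illustrative): describe each atom by its co-occurrence pattern across known compounds, then compress that description into a dense vector.

```python
# Sketch: atom vectors from an atom-compound co-occurrence matrix (toy data).
import numpy as np

atoms = ["H", "O", "Na", "Cl", "K", "F"]
compounds = [{"H": 2, "O": 1}, {"Na": 1, "Cl": 1}, {"K": 1, "Cl": 1},
             {"Na": 1, "F": 1}, {"H": 1, "Cl": 1}, {"K": 1, "F": 1}]

idx = {a: i for i, a in enumerate(atoms)}
M = np.zeros((len(atoms), len(compounds)))     # atom-in-compound count matrix
for j, comp in enumerate(compounds):
    for a, n in comp.items():
        M[idx[a], j] = n

U, S, _ = np.linalg.svd(M, full_matrices=False)
atom_vectors = U[:, :3] * S[:3]                # 3-d embedding per atom
# Chemically analogous atoms (e.g. Na and K here) should get similar vectors.
```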
Phonological Awareness and Speech Comprehensibility: An Exploratory Study
ERIC Educational Resources Information Center
Venkatagiri, H. S.; Levis, John M.
2007-01-01
This study examined whether differences in phonological awareness were related to differences in speech comprehensibility. Seventeen adults who learned English as a foreign language (EFL) in academic settings completed 14 tests of phonological awareness that measured their explicit knowledge of English phonological structures, and three tests of…
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.
Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang
2017-01-01
Human emotions can now be recognized from speech signals using machine learning methods; however, such methods are challenged by low recognition accuracies in real applications because they lack rich representational ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is provided to a DBN to yield higher-level features, which serve as the input of a classifier that outputs an emotion label. All outputted emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
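The ensemble logic can be sketched schematically; here sklearn's MLPClassifier stands in for a true DBN purely for illustration, and the features, labels, subspace size, and ensemble size are all assumptions rather than the paper's settings.

```python
# Sketch: random-subspace ensemble with majority voting (MLP as DBN stand-in).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 60))             # stand-in low-level speech features
y = (X[:, :5].sum(axis=1) > 0).astype(int)     # toy emotion labels

models, subspaces = [], []
for _ in range(9):                             # one base learner per random subspace
    dims = rng.choice(X.shape[1], size=20, replace=False)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X[:, dims], y)
    models.append(clf)
    subspaces.append(dims)

votes = np.stack([m.predict(X[:, d]) for m, d in zip(models, subspaces)])
prediction = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
```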
Articulatory Control in Childhood Apraxia of Speech in a Novel Word-Learning Task.
Case, Julie; Grigos, Maria I
2016-12-01
Articulatory control and speech production accuracy were examined in children with childhood apraxia of speech (CAS) and typically developing (TD) controls within a novel word-learning task to better understand the influence of planning and programming deficits in the production of unfamiliar words. Participants included 16 children between the ages of 5 and 6 years (8 CAS, 8 TD). Short- and long-term changes in lip and jaw movement, consonant and vowel accuracy, and token-to-token consistency were measured for 2 novel words that differed in articulatory complexity. Children with CAS displayed short- and long-term changes in consonant accuracy and consistency. Lip and jaw movements did not change over time. Jaw movement duration was longer in children with CAS than in TD controls. Movement stability differed between low- and high-complexity words in both groups. Children with CAS displayed a learning effect for consonant accuracy and consistency. Lack of change in movement stability may indicate that children with CAS require additional practice to demonstrate changes in speech motor control, even within production of novel word targets with greater consonant and vowel accuracy and consistency. The longer movement duration observed in children with CAS is believed to give children additional time to plan and program movements within a novel skill.
A systematic review of treatment intensity in speech disorders.
Kaipa, Ramesh; Peterson, Abigail Marie
2016-12-01
Treatment intensity (sometimes referred to as "practice amount") has been well investigated in learning non-speech tasks, but its role in treating speech disorders has received little systematic analysis. This study reviewed the literature regarding treatment intensity in speech disorders. A systematic search was conducted in four databases using appropriate search terms. Seven articles from a total of 580 met the inclusion criteria. The speech disorders investigated included speech sound disorders, dysarthria, acquired apraxia of speech and childhood apraxia of speech. All seven studies were evaluated for their methodological quality, research phase and evidence level. Evidence levels of the reviewed studies ranged from moderate to strong. With regard to research phase, only one study was considered phase III research, which corresponds to the controlled trial phase. The remaining studies were considered phase II research, the phase in which the magnitude of the therapeutic effect is assessed. Results suggested that higher treatment intensity was favourable over lower treatment intensity for specific treatment technique(s) in treating childhood apraxia of speech and speech sound (phonological) disorders. Future research should incorporate randomised controlled designs to establish the optimal treatment intensity specific to each speech disorder.
Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger
2018-01-01
The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.
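The SRT-estimation step at the heart of a FADE-style pipeline can be sketched as follows (the recognition scores below are fabricated stand-ins for a recognizer's output; the ASR stage itself is not reproduced): fit a psychometric function to recognition scores across SNRs and read off the 50%-correct point.

```python
# Sketch: estimate an SRT from simulated recognition scores vs. SNR.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, srt, slope):
    # logistic recognition-rate curve; srt is the 50%-correct SNR
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

snrs = np.array([-15.0, -12.0, -9.0, -6.0, -3.0, 0.0])
scores = np.array([0.05, 0.15, 0.40, 0.70, 0.90, 0.97])  # fraction words correct

(srt, slope), _ = curve_fit(psychometric, snrs, scores, p0=[-7.0, 1.0])
print(f"estimated SRT: {srt:.1f} dB SNR")
```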
Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.
2011-01-01
Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059
Jørgensen, Søren; Dau, Torsten
2011-09-01
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
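A heavily simplified sketch of the envelope-power comparison behind the SNR(env) metric follows (one modulation band, Hilbert envelopes, and synthetic signals; the paper's modulation filterbank and ideal-observer stage are not reproduced here).

```python
# Sketch: envelope power SNR in one modulation band (toy signals).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_power(x, fs, f_lo, f_hi):
    env = np.abs(hilbert(x))                   # temporal envelope
    env = env - env.mean()                     # keep the AC component
    sos = butter(2, [f_lo, f_hi], "bandpass", fs=fs, output="sos")
    return np.mean(sosfiltfilt(sos, env) ** 2)

fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 4 * t) * np.sin(2 * np.pi * 1000 * t)  # 4 Hz modulated tone
noise = 0.5 * np.random.default_rng(0).standard_normal(fs)

p_mix = envelope_power(speech + noise, fs, 2, 8)   # modulation band around 4 Hz
p_noise = envelope_power(noise, fs, 2, 8)
snr_env = 10 * np.log10(max(p_mix - p_noise, 1e-12) / p_noise)
```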
D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette
2016-08-01
Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
Visual speech segmentation: using facial cues to locate word boundaries in continuous speech
Mitchel, Aaron D.; Weiss, Daniel J.
2014-01-01
Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition. PMID:25018577
Shiue, Ivy
2016-12-01
Rarely do we know how people with existing health conditions perceive their neighbourhoods. Therefore, the aim of the present study was to understand perceptions of neighbourhoods among adults with a series of existing health conditions in a country-wide, population-based setting. Data were retrieved from and analysed in the Scottish Household Survey, 2007-2008. Information on demographics, self-reported health conditions and perceptions of neighbourhoods and the surrounding facilities was obtained by household interview. Analyses including chi-square tests, t tests and logistic regression modelling were performed. Of 19,150 Scottish adults (aged 16-80) included in the study cohort, 1079 (7.7 %) were dissatisfied with their living areas, particularly those who had experienced harassment (15.4 %), did not recycle, or had dyslexia or chest, digestive, mental or musculoskeletal problems. Twenty to forty per cent reported common neighbourhood problems including noise, rubbish, disputes, graffiti, harassment and drug misuse. People with heart or digestive problems were more dissatisfied with the existing parks and open space. People with arthritis, chest or hearing problems were more dissatisfied with the waste management condition. People with dyslexia were more dissatisfied with the existing public transportation. People with heart problems were more dissatisfied with the current street cleaning condition. People with hearing, vision, speech or learning problems or dyslexia were also more dissatisfied with sports and recreational facilities. People with heart, chest, skin, digestive, musculoskeletal, vision, learning, speech and mental disorders and dyslexia were more dissatisfied with their current neighbourhood environments. Upgrading neighbourhood planning to tackle social environmental injustice and to prioritize pleasant life experiences is suggested. Graphical abstract: interrelations of individual health and neighbourhood health.
The predictive roles of neural oscillations in speech motor adaptability.
Sengupta, Ranit; Nasir, Sazzad M
2016-06-01
The human speech system exhibits remarkable flexibility by adapting to alterations in speaking environments. While it is believed that speech motor adaptation under altered sensory feedback involves rapid reorganization of speech motor networks, the mechanisms by which different brain regions communicate and coordinate their activity to mediate adaptation remain unknown, and explanations of outcome differences in adaptation remain largely elusive. In this study, under the paradigm of altered auditory feedback with continuous EEG recordings, the differential roles of oscillatory neural processes in speech motor adaptability were investigated. The predictive capacities of different EEG frequency bands were assessed, and it was found that theta-, beta-, and gamma-band activities during speech planning and production contained significant and reliable information about speech motor adaptability. It was further observed that these bands do not work independently but interact with each other, suggesting an underlying brain network operating across hierarchically organized frequency bands to support speech motor adaptation. These results provide novel insights into both learning and disorders of speech using time-frequency analysis of neural oscillations. Copyright © 2016 the American Physiological Society.
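As a rough illustration of this kind of analysis, the sketch below extracts theta-, beta-, and gamma-band power from synthetic "planning" epochs with Welch's method and regresses an adaptation score on those powers. The sampling rate, band edges, epoch length, and all data are assumptions; the study's actual pipeline may differ.

```python
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LinearRegression

FS = 250                                         # sampling rate in Hz (assumed)
BANDS = {"theta": (4, 8), "beta": (13, 30), "gamma": (30, 45)}  # assumed edges


def band_powers(epoch, fs=FS):
    """Mean Welch power in each frequency band for a 1-D EEG epoch."""
    freqs, psd = welch(epoch, fs=fs, nperseg=fs)
    return [psd[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in BANDS.values()]


rng = np.random.default_rng(1)
n_subjects = 20
epochs = rng.standard_normal((n_subjects, 2 * FS))   # synthetic 2-s planning epochs
X = np.array([band_powers(ep) for ep in epochs])     # theta/beta/gamma power
y = rng.normal(size=n_subjects)                      # synthetic adaptation scores

# How much of the across-subject variance in adaptability do band powers explain?
reg = LinearRegression().fit(X, y)
print(dict(zip(BANDS, reg.coef_)), "R^2 =", reg.score(X, y))
```

With real data, a reliably positive cross-validated R^2 from pre-speech band power would constitute the kind of predictive relationship the abstract describes; interactions between bands could be probed by adding product terms.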
Interpersonal Learning Systems for National Speech-Communication.
ERIC Educational Resources Information Center
Heinberg, Paul
A consensus has prevailed among educators that Americans of varying ethnic, social, cultural, and linguistic backgrounds who must communicate with each other in social, academic, and occupational situations might achieve a greater degree of rapport if the dialect of the English mutually spoken and the speech mannerisms used were standardized.…
Sociolinguistics and Language Acquisition.
ERIC Educational Resources Information Center
Wolfson, Nessa, Ed.; Judd, Elliot, Ed.
The following are included in this collection of essays on patterns of rules of speaking, and sociolinguistics and second language learning and teaching: "How to Tell When Someone Is Saying 'No' Revisited" (Joan Rubin); "Apology: A Speech-Act Set" (Elite Olshtain and Andrew Cohen); "Interpreting and Performing Speech Acts in a Second Language: A…
Quantitative Endoscopic Phototransducer Investigation of Normal Velopharyngeal Physiology
ERIC Educational Resources Information Center
Karnell, Michael P.; Moon, Jerald B.; Nakajima, Kengo; Kacmarynski, Deborah S.
2016-01-01
Purpose: The purpose of this research was to learn the extent to which healthy individuals vary in their ability to achieve velopharyngeal closure for speech. Method: Twenty healthy adult volunteers (10 women, 10 men) were tested using an endoscopic phototransducer system that tracks variations in velopharyngeal closure during speech production.…
Prenatal Alcohol and Cocaine Exposure: Influences on Cognition, Speech, Language, and Hearing
ERIC Educational Resources Information Center
Cone-Wesson, B.
2005-01-01
This paper reviews research on the consequences of prenatal exposure to alcohol and cocaine on children's speech, language, hearing, and cognitive development. The review shows that cognitive impairment, learning disabilities, and behavioral disorders are the central nervous system manifestations of fetal alcohol syndrome (FAS), and cranio-facial…
Learning Resources for the Secondary Speech Communication Classroom.
ERIC Educational Resources Information Center
Wolvin, Andrew D.
1974-01-01
New print and nonprint resources for secondary level classroom use are available in the field of speech communication, which has become process oriented with continual interaction between speaker and listener. Of five specific books, three provide valuable resource material for teachers, focusing on practical teaching suggestions and the necessity…
Status Report on Speech Research. January-June 1990.
ERIC Educational Resources Information Center
Studdert-Kennedy, Michael, Ed.
This collection of articles is one of a series of semiannual reports on the status of speech research at Haskins Laboratories. The titles of the 18 articles and their authors are as follows: "The Alphabetic Principle and Learning to Read" (Isabelle Y. Liberman and others); "Language Development from an Evolutionary…
La parole, vue et prise par les étudiants (Speech as Seen and Understood by Students).
ERIC Educational Resources Information Center
Gajo, Laurent, Ed.; Jeanneret, Fabrice, Ed.
1998-01-01
Articles on speech and second language learning include: "Les séquences de correction en classe de langue seconde: évitement du 'non' explicite" ("Error Correction Sequences in Second Language Class: Avoidance of the Explicit 'No'") (Anne-Lise de Bosset); "Analyse hiérarchique et fonctionnelle du discours: conversations…