speech samples produced: Topics by Science.gov

Sample records for speech samples produced

Methodological Choices in Rating Speech Samples

ERIC Educational Resources Information Center

O'Brien, Mary Grantham

2016-01-01

Much pronunciation research critically relies upon listeners' judgments of speech samples, but researchers have rarely examined the impact of methodological choices. In the current study, 30 German native listeners and 42 German L2 learners (L1 English) rated speech samples produced by English-German L2 learners along three continua: accentedness,…
Stuttering on function words in bilingual children who stutter: A preliminary study.

PubMed

Gkalitsiou, Zoi; Byrd, Courtney T; Bedore, Lisa M; Taliancich-Klinger, Casey L

2017-01-01

Evidence suggests young monolingual children who stutter (CWS) are more disfluent on function than content words, particularly when produced in the initial utterance position. The purpose of the present preliminary study was to investigate whether young bilingual CWS present with this same pattern. The narrative and conversational samples of four bilingual Spanish- and English-speaking CWS were analysed. All four bilingual participants produced significantly more stuttering on function words compared to content words, irrespective of their position in the utterance, in their Spanish narrative and conversational speech samples. Three of the four participants also demonstrated more stuttering on function compared to content words in their narrative speech samples in English, but only one participant produced more stuttering on function than content words in her English conversational sample. These preliminary findings are discussed relative to linguistic planning and language proficiency and their potential contribution to stuttered speech.
Asynchronous sampling of speech with some vocoder experimental results

NASA Technical Reports Server (NTRS)

Babcock, M. L.

1972-01-01

The method of asynchronously sampling speech is based upon the derivatives of the acoustical speech signal. The following results are apparent from experiments to date: (1) It is possible to represent speech by a string of pulses of uniform amplitude, where the only information contained in the string is the spacing of the pulses in time; (2) the string of pulses may be produced in a simple analog manner; (3) the first derivative of the original speech waveform is the most important for the encoding process; (4) the resulting pulse train can be utilized to control an acoustical signal production system to regenerate the intelligence of the original speech.
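The pulse-spacing idea in results (1) and (3) can be illustrated with a minimal sketch. The abstract does not state the exact rule used to place pulses, so the rule below is an assumption: one uniform pulse is emitted at each sign change of the first difference of the sampled waveform, which stands in for the first derivative. The function name `pulse_times` and the toy sine input are hypothetical.

```python
import math

def pulse_times(samples, sample_rate):
    """Return times (in seconds) where the first difference changes sign.

    Assumed rule (not taken from the report): one uniform-amplitude pulse
    per zero crossing of the first derivative, i.e. at each local extremum.
    Only the pulse times are kept; amplitude carries no information.
    """
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    times = []
    for i in range(1, len(diffs)):
        if diffs[i - 1] * diffs[i] < 0:  # strict sign change in the derivative
            times.append(i / sample_rate)
    return times

# Toy input: a 100 Hz sine sampled at 8 kHz for 20 ms (two full periods).
rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / rate) for n in range(160)]
pulses = pulse_times(signal, rate)
intervals = [t2 - t1 for t1, t2 in zip(pulses, pulses[1:])]
```

For this toy tone the pulses fall half a period apart, so the spacing alone encodes the frequency. Real speech would give irregular spacing, and regenerating a signal from the pulse train is outside this sketch.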
The Effectiveness of SpeechEasy during Situations of Daily Living

ERIC Educational Resources Information Center

O'Donnell, Jennifer J.; Armson, Joy; Kiefte, Michael

2008-01-01

A multiple single-subject design was used to examine the effects of SpeechEasy on stuttering frequency in the laboratory and in longitudinal samples of speech produced in situations of daily living (SDL). Seven adults who stutter participated, all of whom had exhibited at least 30% reduction in stuttering frequency while using SpeechEasy during…
Monkey vocal tracts are speech-ready.

PubMed

Fitch, W Tecumseh; de Boer, Bart; Mathur, Neil; Ghazanfar, Asif A

2016-12-01

For four decades, the inability of nonhuman primates to produce human speech sounds has been claimed to stem from limitations in their vocal tract anatomy, a conclusion based on plaster casts made from the vocal tract of a monkey cadaver. We used x-ray videos to quantify vocal tract dynamics in living macaques during vocalization, facial displays, and feeding. We demonstrate that the macaque vocal tract could easily produce an adequate range of speech sounds to support spoken language, showing that previous techniques based on postmortem samples drastically underestimated primate vocal capabilities. Our findings imply that the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy. Macaques have a speech-ready vocal tract but lack a speech-ready brain to control it.
Phonology and Vocal Behavior in Toddlers with Autism Spectrum Disorders

PubMed Central

Schoen, Elizabeth; Paul, Rhea; Chawarska, Katarzyna

2011-01-01

Scientific Abstract The purpose of this study is to examine the phonological and other vocal productions of children, 18-36 months, with autism spectrum disorder (ASD) and to compare these productions to those of age-matched and language-matched controls. Speech samples were obtained from 30 toddlers with ASD, 11 age-matched toddlers and 23 language-matched toddlers during either parent-child or clinician-child play sessions. Samples were coded for a variety of speech-like and non-speech vocalization productions. Toddlers with ASD produced speech-like vocalizations similar to those of language-matched peers, but produced significantly more atypical non-speech vocalizations when compared to both control groups. Toddlers with ASD show speech-like sound production that is linked to their language level, in a manner similar to that seen in typical development. The main area of difference in vocal development in this population is in the production of atypical vocalizations. Findings suggest that toddlers with autism spectrum disorders might not tune into the language model of their environment. Failure to attend to the ambient language environment negatively impacts the ability to acquire spoken language. PMID:21308998
An Experimental Investigation of the Effect of Altered Auditory Feedback on the Conversational Speech of Adults Who Stutter

ERIC Educational Resources Information Center

Lincoln, Michelle; Packman, Ann; Onslow, Mark; Jones, Mark

2010-01-01

Purpose: To investigate the impact on percentage of syllables stuttered of various durations of delayed auditory feedback (DAF), levels of frequency-altered feedback (FAF), and masking auditory feedback (MAF) during conversational speech. Method: Eleven adults who stuttered produced 10-min conversational speech samples during a control condition…
A Wavelet Model for Vocalic Speech Coarticulation

DTIC Science & Technology

1994-10-01

control vowel’s signal as the mother wavelet. A practical experiment is conducted to evaluate the coarticulation channel using samples of real speech...transformation from a control speech state (input) to an effected speech state (output). Specifically, a vowel produced in isolation is transformed into an...the wavelet transform of the effected vowel’s signal, using the control vowel’s signal as the mother wavelet. A practical experiment is conducted to
Using on-line altered auditory feedback treating Parkinsonian speech

NASA Astrophysics Data System (ADS)

Wang, Emily; Verhagen, Leo; de Vries, Meinou H.

2005-09-01

Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured-monologue and reading) under three conditions: baseline, with SE without feedback, and with SE with feedback. The speech samples were randomly presented and rated for speech intelligibility goodness using UPDRS-III item 18 and the speaking rate. The results indicated that SpeechEasy is well tolerated and AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted. [Work partially supported by Janus Dev. Group, Inc.]
Phonological Acquisition of Korean Consonants in Conversational Speech Produced by Young Korean Children

ERIC Educational Resources Information Center

Kim, Minjung; Kim, Soo-Jin; Stoel-Gammon, Carol

2017-01-01

This study investigates the phonological acquisition of Korean consonants using conversational speech samples collected from sixty monolingual typically developing Korean children aged two, three, and four years. Phonemic acquisition was examined for syllable-initial and syllable-final consonants. Results showed that Korean children acquired stops…
The Influence of Native Language on Auditory-Perceptual Evaluation of Vocal Samples Completed by Brazilian and Canadian SLPs.

PubMed

Chaves, Cristiane Ribeiro; Campbell, Melanie; Côrtes Gama, Ana Cristina

2017-03-01

This study aimed to determine the influence of native language on the auditory-perceptual assessment of voice, as completed by Brazilian and Anglo-Canadian listeners using Brazilian vocal samples and the grade, roughness, breathiness, asthenia, strain (GRBAS) scale. This is an analytical, observational, comparative, and transversal study conducted at the Speech Language Pathology Department of the Federal University of Minas Gerais in Brazil, and at the Communication Sciences and Disorders Department of the University of Alberta in Canada. The GRBAS scale, connected speech, and a sustained vowel were used in this study. The vocal samples were drawn randomly from a database of recorded speech of Brazilian adults, some with healthy voices and some with voice disorders. The database is housed at the Federal University of Minas Gerais. Forty-six samples of connected speech (recitation of days of the week), produced by 35 women and 11 men, and 46 samples of the sustained vowel /a/, produced by 37 women and 9 men, were used in this study. The listeners were divided into two groups of three speech therapists, according to nationality: Brazilian or Anglo-Canadian. The groups were matched according to the years of professional experience of participants. The weighted kappa was used to calculate the intra- and inter-rater agreements, with 95% confidence intervals, respectively. An analysis of the intra-rater agreement showed that Brazilians and Canadians had similar results in auditory-perceptual evaluation of sustained vowel and connected speech. The results of the inter-rater agreement of connected speech and sustained vowel indicated that Brazilians and Canadians had, respectively, moderate agreement on the overall severity (0.57 and 0.50), breathiness (0.45 and 0.45), and asthenia (0.50 and 0.46); poor correlation on roughness (0.19 and 0.007); and weak correlation on strain to connected speech (0.22), and moderate correlation to sustained vowel (0.50). 
In general, auditory-perceptual evaluation is not influenced by the native language on most dimensions of the perceptual parameters of the GRBAS scale. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
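The weighted kappa used above for rater agreement can be sketched as follows. This is a minimal illustration for two raters, not the authors' analysis: the abstract does not say whether linear or quadratic weights were used, so linear weights are assumed here, and the function name `weighted_kappa` and the example ratings are hypothetical. GRBAS dimensions are conventionally scored on a 0 to 3 ordinal scale, which is why four categories appear in the usage lines.

```python
def weighted_kappa(r1, r2, n_cat):
    """Linear-weighted Cohen's kappa for two raters on categories 0..n_cat-1."""
    n = len(r1)
    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    # Disagreement weight grows linearly with category distance (assumption).
    d_obs = 0.0
    d_exp = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = abs(i - j) / (n_cat - 1)
            d_obs += w * obs[i][j]          # observed weighted disagreement
            d_exp += w * row[i] * col[j]    # disagreement expected by chance
    return 1.0 - d_obs / d_exp

# Hypothetical ratings of six samples on a 0-3 scale.
rater_a = [0, 1, 2, 3, 1, 2]
rater_b = [0, 1, 2, 3, 1, 2]
kappa_same = weighted_kappa(rater_a, rater_b, 4)
kappa_chance = weighted_kappa([0, 0, 1, 1], [0, 1, 1, 0], 2)
```

Identical ratings give a kappa of 1, and the second pair agrees exactly as often as chance predicts, giving 0. Confidence intervals, as reported in the study, would need an additional step such as bootstrapping.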
The enhancement of beneficial effects following audio feedback by cognitive preparation in the treatment of social anxiety: a single-session experiment.

PubMed

Nilsson, Jan-Erik; Lundh, Lars-Gunnar; Faghihi, Shahriar; Roth-Andersson, Gun

2011-12-01

According to cognitive models, negatively biased processing of the publicly observable self is an important aspect of social phobia; if this is true, effective methods for producing corrective feedback concerning the public self should be strived for. Video feedback is proven effective, but since one's voice represents another aspect of the self, audio feedback should produce equivalent results. This is the first study to assess the enhancement of audio feedback by cognitive preparation in a single-session randomized controlled experiment. Forty socially anxious participants were asked to give a speech, then to listen to and evaluate a taped recording of their performance. Half of the sample was given cognitive preparation prior to the audio feedback and the remainder received audio feedback only. Cognitive preparation involved asking participants to (1) predict in detail what they would hear on the audiotape, (2) form an image of themselves giving the speech and (3) listen to the audio recording as though they were listening to a stranger. To assess generalization effects all participants were asked to give a second speech. Audio feedback with cognitive preparation was shown to produce less negative ratings after the first speech, and effects generalized to the evaluation of the second speech. More positive speech evaluations were associated with corresponding reductions of state anxiety. Social anxiety as indexed by the Implicit Association Test was reduced in participants given cognitive preparation. Small sample size; analogue study. Audio feedback with cognitive preparation may be utilized as a treatment intervention for social phobia. Copyright © 2011 Elsevier Ltd. All rights reserved.
Movement of the velum during speech and singing in classically trained singers.

PubMed

Austin, S F

1997-06-01

The present study addresses two questions: (a) Is the action and/or posture of the velopharyngeal valve conducive to allow significant resonance during Western tradition classical singing? (b) How do the actions of the velopharyngeal valve observed in this style of singing compare with normal speech? A photodetector system was used to observe the area function of the velopharyngeal port during speech and classical style singing. Identical speech samples were produced by each subject in a normal speaking voice and then in the low, medium, and high singing ranges. Results indicate that in these four singers the velopharyngeal port was closed significantly longer in singing than in speaking samples. The amount of time the velopharyngeal port was opened was greatest in speech and diminished as the singer ascended in pitch. In the high voice condition, little or no opening of the velopharyngeal port was measured.
Speech acoustic markers of early stage and prodromal Huntington's disease: a marker of disease onset?

PubMed

Vogel, Adam P; Shirbin, Christopher; Churchyard, Andrew J; Stout, Julie C

2012-12-01

Speech disturbances (e.g., altered prosody) have been described in symptomatic Huntington's Disease (HD) individuals; however, the extent to which speech changes in gene positive pre-manifest (PreHD) individuals is largely unknown. The speech of individuals carrying the mutant HTT gene is a behavioural/motor/cognitive marker demonstrating some potential as an objective indicator of early HD onset and disease progression. Speech samples were acquired from 30 individuals carrying the mutant HTT gene (13 PreHD, 17 early stage HD) and 15 matched controls. Participants read a passage, produced a monologue and said the days of the week. Data were analysed acoustically for measures of timing, frequency and intensity. There was a clear effect of group across most acoustic measures, so that speech performance differed in line with disease progression. Comparisons across groups revealed significant differences between the control and the early stage HD group on measures of timing (e.g., speech rate). Participants carrying the mutant HTT gene presented with slower rates of speech, took longer to say words and produced greater silences between and within words compared to healthy controls. Importantly, speech rate showed a significant correlation to burden of disease scores. The speech of early stage HD differed significantly from controls. The speech of PreHD, although not reaching significance, tended to lie between the performance of controls and early stage HD. This suggests that changes in speech production appear to be developing prior to diagnosis. Copyright © 2012 Elsevier Ltd. All rights reserved.
Expressive Language during Conversational Speech in Boys with Fragile X Syndrome

ERIC Educational Resources Information Center

Roberts, Joanne E.; Hennon, Elizabeth A.; Price, Johanna R.; Dear, Elizabeth; Anderson, Kathleen; Vandergrift, Nathan A.

2007-01-01

We compared the expressive syntax and vocabulary skills of 35 boys with fragile X syndrome and 27 younger typically developing boys who were at similar nonverbal mental levels. During a conversational speech sample, the boys with fragile X syndrome used shorter, less complex utterances and produced fewer different words than did the typically…
Examining Acoustic and Kinematic Measures of Articulatory Working Space: Effects of Speech Intensity

ERIC Educational Resources Information Center

Whitfield, Jason A.; Dromey, Christopher; Palmer, Panika

2018-01-01

Purpose: The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces. Method: Young adult speakers produced 3…
Effects of Visual Information on Intelligibility of Open and Closed Class Words in Predictable Sentences Produced by Speakers with Dysarthria

ERIC Educational Resources Information Center

Hustad, Katherine C.; Dardis, Caitlin M.; Mccourt, Kelly A.

2007-01-01

This study examined the independent and interactive effects of visual information and linguistic class of words on intelligibility of dysarthric speech. Seven speakers with dysarthria participated in the study, along with 224 listeners who transcribed speech samples in audiovisual (AV) or audio-only (AO) listening conditions. Orthographic…
Effects of speaking task on intelligibility in Parkinson’s disease

PubMed Central

Tjaden, Kris; Wilding, Greg

2017-01-01

Intelligibility tests for dysarthria typically provide an estimate of overall severity for speech materials elicited through imitation or read from a printed script. The extent to which these types of tasks and procedures reflect intelligibility for extemporaneous speech is not well understood. The purpose of this study was to compare intelligibility estimates obtained for a reading passage and an extemporaneous monologue produced by 12 speakers with Parkinson’s disease (PD). The relationship between structural characteristics of utterances and scaled intelligibility was explored within speakers. Speakers were audio-recorded while reading a paragraph and producing a monologue. Speech samples were separated into individual utterances for presentation to 70 listeners who judged intelligibility using orthographic transcription and direct magnitude estimation (DME). Results suggest that scaled estimates of intelligibility for reading show potential for indexing intelligibility of an extemporaneous monologue. Within-speaker variation in scaled intelligibility also was related to the number of words per speech run for extemporaneous speech. PMID:20887216
Emotional and physiological responses of fluent listeners while watching the speech of adults who stutter.

PubMed

Guntupalli, Vijaya K; Everhart, D Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

2007-01-01

People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.). Previous research has examined the attitudes and perceptions of those who stutter and people who frequently interact with them (e.g. relatives, parents, employers). Results have shown an unequivocal, powerful and robust negative stereotype despite a lack of defined differences in personality structure between people who stutter and normally fluent individuals. However, physiological investigations of listener responses during moments of stuttering are limited. There is a need for data that simultaneously examine physiological responses (e.g. heart rate and galvanic skin conductance) and subjective behavioural responses to stuttering. The pairing of these objective and subjective data may provide information that casts light on the genesis of negative stereotypes associated with stuttering, the development of compensatory mechanisms in those who stutter, and the true impact of stuttering on senders and receivers alike. To compare the emotional and physiological responses of fluent speakers while listening and observing fluent and severe stuttered speech samples. Twenty adult participants (mean age = 24.15 years, standard deviation = 3.40) observed speech samples of two fluent speakers and two speakers who stutter reading aloud. Participants' skin conductance and heart rate changes were measured as physiological responses to stuttered or fluent speech samples. Participants' subjective responses on arousal (excited-calm) and valence (happy-unhappy) dimensions were assessed via the Self-Assessment Manikin (SAM) rating scale with an additional questionnaire comprised of a set of nine bipolar adjectives. 
Results showed significantly increased skin conductance and lower mean heart rate during the presentation of stuttered speech relative to the presentation of fluent speech samples (p<0.05). Listeners also self-rated themselves as being more aroused, unhappy, nervous, uncomfortable, sad, tensed, unpleasant, avoiding, embarrassed, and annoyed while viewing stuttered speech relative to the fluent speech. These data support the notion that stutter-filled speech can elicit physiological and emotional responses in listeners. Clinicians who treat stuttering should be aware that listeners show involuntary physiological responses to moderate-severe stuttering that probably remain salient over time and contribute to the evolution of negative stereotypes of people who stutter. With this in mind, it is hoped that clinicians can work with people who stutter to develop appropriate coping strategies. The role of amygdala and mirror neural mechanism in physiological and subjective responses to stuttering is discussed.
Factors affecting articulation skills in children with velocardiofacial syndrome and children with cleft palate or velopharyngeal dysfunction: A preliminary report

PubMed Central

Baylis, Adriane L.; Munson, Benjamin; Moller, Karlind T.

2010-01-01

Objective: To examine the influence of speech perception, cognition, and implicit phonological learning on articulation skills of children with Velocardiofacial syndrome (VCFS) and children with cleft palate or velopharyngeal dysfunction (VPD). Design: Cross-sectional group experimental design. Participants: 8 children with VCFS and 5 children with non-syndromic cleft palate or VPD. Methods and Measures: All children participated in a phonetic inventory task, speech perception task, implicit priming nonword repetition task, conversational sample, nonverbal intelligence test, and hearing screening. Speech tasks were scored for percentage of phonemes correctly produced. Group differences and relations among measures were examined using nonparametric statistics. Results: Children in the VCFS group demonstrated significantly poorer articulation skills and lower standard scores of nonverbal intelligence compared to the children with cleft palate or VPD. There were no significant group differences in speech perception skills. For the implicit priming task, both groups of children were more accurate in producing primed nonwords than unprimed nonwords. Nonverbal intelligence and severity of velopharyngeal inadequacy for speech were correlated with articulation skills. Conclusions: In this study, children with VCFS had poorer articulation skills compared to children with cleft palate or VPD. Articulation difficulties seen in the children with VCFS did not appear to be associated with speech perception skills or the ability to learn new phonological representations. Future research should continue to examine relationships between articulation, cognition, and velopharyngeal dysfunction in a larger sample of children with cleft palate and VCFS. PMID:18333642

Measuring Speech Comprehensibility in Students with Down Syndrome

PubMed Central

Woynaroski, Tiffany; Camarata, Stephen

2016-01-01

Purpose: There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility, which enabled examining the criterion-related validity of an orthography-based measure of the comprehensibility of conversational speech in students with Down syndrome. Method: Participants were 10 elementary school students with Down syndrome and 4 unfamiliar adult raters. Averaged across-observer Likert ratings of speech comprehensibility were called a ratings-based measure of speech comprehensibility. The proportion of utterance attempts fully glossed constituted an orthography-based measure of speech comprehensibility. Results: Averaging across 4 raters on four 5-min segments produced a reliable (G = .83) ratings-based measure of speech comprehensibility. The ratings-based measure was strongly (r > .80) correlated with the orthography-based measure for both the same and different conversational samples. Conclusion: Reliable and valid measures of speech comprehensibility are achievable with the resources available to many researchers and some clinicians. PMID:27299989
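The two measures described in this record, and the correlation between them, can be sketched in a few lines. This is only an illustration under stated assumptions: all numbers are invented, the helper names (`ratings_measure`, `proportion_glossed`, `pearson_r`) are hypothetical, and the generalizability coefficient (G) reported in the study is not computed here.

```python
from statistics import mean

def ratings_measure(ratings_by_rater):
    """Average Likert ratings across raters, giving one score per student."""
    return [mean(student) for student in zip(*ratings_by_rater)]

def proportion_glossed(fully_glossed, attempts):
    """Share of utterance attempts that a transcriber could fully gloss."""
    return fully_glossed / attempts

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented data: 4 raters (rows) rating 3 students (columns) on a 1-5 scale.
likert = [[1, 3, 5], [2, 3, 4], [1, 4, 5], [2, 2, 4]]
ratings = ratings_measure(likert)
# Invented transcription counts: (fully glossed, attempts) per student.
glossed = [proportion_glossed(g, n) for g, n in [(10, 40), (24, 40), (36, 40)]]
r = pearson_r(ratings, glossed)
```

With only three invented students the correlation is not meaningful in itself. The sketch only shows how an averaged rating per student lines up against a proportion per student.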
Sound frequency affects speech emotion perception: results from congenital amusia

PubMed Central

Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

2015-01-01

Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718
Impact of cognitive function and dysarthria on spoken language and perceived speech severity in multiple sclerosis

NASA Astrophysics Data System (ADS)

Feenaughty, Lynda

Purpose: The current study sought to investigate the separate effects of dysarthria and cognitive status on global speech timing, speech hesitation, and linguistic complexity characteristics and how these speech behaviors impose on listener impressions for three connected speech tasks presumed to differ in cognitive-linguistic demand for four carefully defined speaker groups: 1) MS with cognitive deficits (MSCI), 2) MS with clinically diagnosed dysarthria and intact cognition (MSDYS), 3) MS without dysarthria or cognitive deficits (MS), and 4) healthy talkers (CON). The relationship between neuropsychological test scores and speech-language production and perceptual variables for speakers with cognitive deficits was also explored. Methods: 48 speakers, including 36 individuals reporting a neurological diagnosis of MS and 12 healthy talkers, participated. The three MS groups and control group each contained 12 speakers (8 women and 4 men). Cognitive function was quantified using standard clinical tests of memory, information processing speed, and executive function. A standard z-score of ≤ -1.50 indicated deficits in a given cognitive domain. Three certified speech-language pathologists determined the clinical diagnosis of dysarthria for speakers with MS. Experimental speech tasks of interest included audio-recordings of an oral reading of the Grandfather passage and two spontaneous speech samples in the form of Familiar and Unfamiliar descriptive discourse. Various measures of spoken language were of interest. Suprasegmental acoustic measures included speech and articulatory rate. Linguistic speech hesitation measures included pause frequency (i.e., silent and filled pauses), mean silent pause duration, grammatical appropriateness of pauses, and interjection frequency. For the two discourse samples, three standard measures of language complexity were obtained including subordination index, inter-sentence cohesion adequacy, and lexical diversity. 
Ten listeners judged each speech sample using the perceptual construct of Speech Severity using a visual analog scale. Additional measures obtained to describe participants included the Sentence Intelligibility Test (SIT), the 10-item Communication Participation Item Bank (CPIB), and standard biopsychosocial measures of depression (Beck Depression Inventory-Fast Screen; BDI-FS), fatigue (Fatigue Severity Scale; FSS), and overall disease severity (Expanded Disability Status Scale; EDSS). Healthy controls completed all measures, with the exception of the CPIB and EDSS. All data were analyzed using standard, descriptive and parametric statistics. For the MSCI group, the relationships between neuropsychological test scores and speech-language variables were explored for each speech task using Pearson correlations. The relationship between neuropsychological test scores and Speech Severity also was explored. Results and Discussion: Topic familiarity for descriptive discourse did not strongly influence speech production or perceptual variables; however, results indicated predicted task-related differences for some spoken language measures. With the exception of the MSCI group, all speaker groups produced the same or slower global speech timing (i.e., speech and articulatory rates), more silent and filled pauses, more grammatical pauses, and longer silent pause durations in spontaneous discourse compared to reading aloud. Results revealed no appreciable task differences for linguistic complexity measures. Results indicated group differences for speech rate. The MSCI group produced significantly faster speech rates compared to the MSDYS group. Both the MSDYS and the MSCI groups were judged to have significantly poorer perceived Speech Severity compared to typically aging adults. The Task x Group interaction was only significant for the number of silent pauses. 
The MSDYS group produced fewer silent pauses in spontaneous speech and more silent pauses in the reading task compared to other groups. Finally, correlation analysis revealed moderate relationships between neuropsychological test scores and speech hesitation measures, within the MSCI group. Slower information processing and poorer memory were significantly correlated with more silent pauses and poorer executive function was associated with fewer filled pauses in the Unfamiliar discourse task. Results have both clinical and theoretical implications. Overall, clinicians should demonstrate caution when interpreting global measures of speech timing and perceptual measures in the absence of information about cognitive ability. Results also have implications for a comprehensive model of spoken language incorporating cognitive, linguistic, and motor variables.
Echolalia and comprehension in autistic children.

PubMed

Roberts, J M

1989-06-01

The research reported in this paper investigates the phenomenon of echolalia in the speech of autistic children by examining the relationship between the frequency of echolalia and receptive language ability. The receptive language skills of 10 autistic children were assessed, and spontaneous speech samples were recorded. Analysis of these data showed that those children with poor receptive language skills produced significantly more echolalic utterances than those children whose receptive skills were more age-appropriate. Children who produced fewer echolalic utterances, and had more advanced receptive language ability, evidenced a higher proportion of mitigated echolalia. The most common type of mitigation was echo plus affirmation or denial.
A Generative Model of Speech Production in Broca’s and Wernicke’s Areas

PubMed Central

Price, Cathy J.; Crinion, Jenny T.; MacSweeney, Mairéad

2011-01-01

Speech production involves the generation of an auditory signal from the articulators and vocal tract. When the intended auditory signal does not match the produced sounds, subsequent articulatory commands can be adjusted to reduce the difference between the intended and produced sounds. This requires an internal model of the intended speech output that can be compared to the produced speech. The aim of this functional imaging study was to identify brain activation related to the internal model of speech production after activation related to vocalization, auditory feedback, and movement in the articulators had been controlled. There were four conditions: silent articulation of speech, non-speech mouth movements, finger tapping, and visual fixation. In the speech conditions, participants produced the mouth movements associated with the words “one” and “three.” We eliminated auditory feedback from the spoken output by instructing participants to articulate these words without producing any sound. The non-speech mouth movement conditions involved lip pursing and tongue protrusions to control for movement in the articulators. The main difference between our speech and non-speech mouth movement conditions is that prior experience producing speech sounds leads to the automatic and covert generation of auditory and phonological associations that may play a role in predicting auditory feedback. We found that, relative to non-speech mouth movements, silent speech activated Broca’s area in the left dorsal pars opercularis and Wernicke’s area in the left posterior superior temporal sulcus. We discuss these results in the context of a generative model of speech production and propose that Broca’s and Wernicke’s areas may be involved in predicting the speech output that follows articulation. These predictions could provide a mechanism by which rapid movement of the articulators is precisely matched to the intended speech outputs during future articulations. PMID:21954392
Measuring word complexity in speech screening: single-word sampling to identify phonological delay/disorder in preschool children.

PubMed

Anderson, Carolyn; Cohen, Wendy

2012-01-01

Children's speech sound development is assessed by comparing speech production with the typical development of speech sounds based on a child's age and developmental profile. One widely used method of sampling is to elicit a single-word sample along with connected speech. Words produced spontaneously rather than imitated may give a more accurate indication of a child's speech development. A published word complexity measure can be used to score later-developing speech sounds and more complex word patterns. There is a need for a screening word list that is quick to administer and reliably differentiates children with typically developing speech from children with patterns of delayed/disordered speech. To identify a short word list based on word complexity that could be spontaneously named by most typically developing children aged 3;00-5;05 years. One hundred and five children aged between 3;00 and 5;05 years from three local authority nursery schools took part in the study. Items from a published speech assessment were modified and extended to include a range of phonemic targets in different word positions in 78 monosyllabic and polysyllabic words. The 78 words were ranked both by phonemic/phonetic complexity as measured by word complexity and by ease of spontaneous production. The ten most complex words (hereafter Triage 10) were named spontaneously by more than 90% of the children. There was no significant difference between the complexity measures for five identified age groups when the data were examined in 6-month groups. A qualitative analysis revealed eight children with profiles of phonological delay or disorder. When these children were considered separately, there was a statistically significant difference (p < 0.005) between the mean word complexity measure of the group compared with the mean for the remaining children in all other age groups. 
The Triage 10 words reliably differentiated children with typically developing speech from those with delayed or disordered speech patterns. The Triage 10 words can be used as a screening tool for triage and general assessment and have the potential to monitor progress during intervention. Further testing is being undertaken to establish reliability with children referred to speech and language therapy services. © 2012 Royal College of Speech and Language Therapists.
An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

NASA Astrophysics Data System (ADS)

Trollinger, Valerie L.

This study investigated the relationship between acoustically measured singing accuracy and speech fundamental frequency, speech fundamental frequency range, age and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis of fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed with CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch.
Pearson correlations, MANOVA and ANOVA analyses, Multiple Regressions and Discriminant Analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest predictor of subjects' ability to sing the notes E and F♯; (3) mean speech frequency correlated moderately and significantly (p < .001) with sharpness and flatness of singing response accuracy in Hz; (4) speech range was the strongest predictor of singing accuracy for the pitches G and A in the study (p < .001); (5) gender emerged as a significant, but not the strongest, predictor for ability to sing the pitches in the study above C and D; (6) gender did not correlate with mean speech frequency and speech range; (7) age in months emerged as a low but significant predictor of ability to sing the lower notes (C and D) in the study; (8) age showed a low but significant negative correlation (r = -.23, p < .05, two-tailed) with mean speech frequency; and (9) age did not emerge as a significant predictor of overall singing accuracy. Ancillary findings indicated that there were significant differences in singing accuracy based on geographic location by gender, and that siblings and fraternal twins in the study generally performed similarly. In addition, reliability testing of CSpeech for acoustical analysis yielded test/retest correlations of .99, with one exception at .94. Based on these results, suggestions were made for future research on the use of voice in speech and how it may affect singing development, overall use in singing, and pitch-matching accuracy.
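The per-subject scores described in the abstract above (mean speech frequency, speech range, and deviation from the model's pitch) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the use of a signed deviation (positive for sharp, negative for flat) is an assumption, since the abstract does not say whether signed or absolute differences were averaged. This is not the CSpeech procedure itself, only the arithmetic applied to fundamental-frequency values that have already been extracted.

```python
from statistics import mean


def speech_summary(f0_samples_hz):
    """Return (mean speech frequency, speech range) in Hz for one subject.

    Speech range is the highest F0 minus the lowest F0, as in the abstract.
    """
    if not f0_samples_hz:
        raise ValueError("at least one F0 sample is required")
    return mean(f0_samples_hz), max(f0_samples_hz) - min(f0_samples_hz)


def mean_deviation_hz(model_hz, sung_hz):
    """Mean deviation in Hz between the subject's pitches and the model's.

    Assumption: deviation is signed (sung minus model), so a positive
    value means the subject sang sharp on average.
    """
    if len(model_hz) != len(sung_hz) or not model_hz:
        raise ValueError("model and sung pitch lists must match and be non-empty")
    return mean([sung - model for model, sung in zip(model_hz, sung_hz)])
```

For example, `speech_summary([250.0, 300.0, 275.0])` gives a mean of 275.0 Hz and a range of 50.0 Hz.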
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

PubMed Central

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. 
Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889
Everyday listeners' impressions of speech produced by individuals with adductor spasmodic dysphonia.

PubMed

Nagle, Kathleen F; Eadie, Tanya L; Yorkston, Kathryn M

2015-01-01

Individuals with adductor spasmodic dysphonia (ADSD) have reported that unfamiliar communication partners appear to judge them as sneaky, nervous or not intelligent, apparently based on the quality of their speech; however, there is minimal research into the actual everyday perspective of listening to ADSD speech. The purpose of this study was to investigate the impressions of listeners hearing ADSD speech for the first time using a mixed-methods design. Everyday listeners were interviewed following sessions in which they made ratings of ADSD speech. A semi-structured interview approach was used and data were analyzed using thematic content analysis. Three major themes emerged: (1) everyday listeners make judgments about speakers with ADSD; (2) ADSD speech does not sound normal to everyday listeners; and (3) rating overall severity is difficult for everyday listeners. Participants described ADSD speech similarly to existing literature; however, some listeners inaccurately extrapolated speaker attributes based solely on speech samples. Listeners may draw erroneous conclusions about individuals with ADSD and these biases may affect the communicative success of these individuals. Results have implications for counseling individuals with ADSD, as well as the need for education and awareness about ADSD. Copyright © 2015 Elsevier Inc. All rights reserved.
Universal Production Patterns and Ambient Language Influences in Babbling: A Cross-Linguistic Study of Korean- and English-Learning Infants

ERIC Educational Resources Information Center

Lee, Sue Ann S.; Davis, Barbara; MacNeilage, Peter

2010-01-01

The phonetic characteristics of canonical babbling produced by Korean- and English-learning infants were compared with consonant and vowel frequencies observed in infant-directed speech produced by Korean- and English-speaking mothers. For infant output, babbling samples from six Korean-learning infants were compared with an existing English…
Children's Perception of Speech Produced in a Two-Talker Background

ERIC Educational Resources Information Center

Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

2014-01-01

Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…
THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.

ERIC Educational Resources Information Center

FOULKE, EMERSON

A REVIEW OF THE RESEARCH ON THE COMPREHENSION OF RAPID SPEECH BY THE BLIND IDENTIFIES FIVE METHODS OF SPEECH COMPRESSION--SPEECH CHANGING, ELECTROMECHANICAL SAMPLING, COMPUTER SAMPLING, SPEECH SYNTHESIS, AND FREQUENCY DIVIDING WITH THE HARMONIC COMPRESSOR. THE SPEECH CHANGING AND ELECTROMECHANICAL SAMPLING METHODS AND THE NECESSARY APPARATUS HAVE…
Testing the influence of external and internal cues on smoking motivation using a community sample.

PubMed

Litvin, Erika B; Brandon, Thomas H

2010-02-01

Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).
How our own speech rate influences our perception of others.

PubMed

Bosker, Hans Rutger

2017-08-01

In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Gender differences in identifying emotions from auditory and visual stimuli.

PubMed

Waaramaa, Teija

2017-12-01

The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples and prolonged vowels were investigated. The study also examined whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to gain better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. The emotional stimuli were better recognized from visual stimuli than auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.
Duration of the speech disfluencies of beginning stutterers.

PubMed

Zebrowski, P M

1991-06-01

This study compared the duration of within-word disfluencies and the number of repeated units per instance of sound/syllable and whole-word repetitions of beginning stutterers to those produced by age- and sex-matched nonstuttering children. Subjects were 10 stuttering children (9 males and 1 female; mean age 4:1 [years:months]; age range 3:2-5:1), and 10 nonstuttering children (9 males and 1 female; mean age 4:0; age range 2:10-5:1). Mothers of the stuttering children reported that their children had been stuttering for 1 year or less. One 300-word conversational speech sample from each of the stuttering and nonstuttering children was analyzed for (a) mean duration of sound/syllable repetition and sound prolongation, (b) mean number of repeated units per instance of sound/syllable and whole-word repetition, and (c) various related measures of the frequency of all between- and within-word speech disfluencies. There were no significant between-group differences for either the duration of acoustically measured sound/syllable repetitions and sound prolongations or the number of repeated units per instance of sound/syllable and whole-word repetition. Unlike frequency and type of speech disfluency produced, average duration of within-word disfluencies and number of repeated units per repetition do not differentiate the disfluent speech of beginning stutterers and their nonstuttering peers. Additional analyses support findings from previous perceptual work that type and frequency of speech disfluency, not duration, are the principal characteristics listeners use in distinguishing these two talker groups.
The Disfluent Speech of Bilingual Spanish–English Children: Considerations for Differential Diagnosis of Stuttering

PubMed Central

Bedore, Lisa M.; Ramos, Daniel

2015-01-01

Purpose: The primary purpose of this study was to describe the frequency and types of speech disfluencies that are produced by bilingual Spanish–English (SE) speaking children who do not stutter. The secondary purpose was to determine whether their disfluent speech is mediated by language dominance and/or language produced. Method: Spanish and English narratives (a retell and a tell in each language) were elicited and analyzed relative to the frequency and types of speech disfluencies produced. These data were compared with the monolingual English-speaking guidelines for differential diagnosis of stuttering. Results: The mean frequency of stuttering-like speech behaviors in the bilingual SE participants ranged from 3% to 22%, exceeding the monolingual English standard of 3 per 100 words. There was no significant frequency difference in stuttering-like or non-stuttering-like speech disfluency produced relative to the child's language dominance. There was a significant difference relative to the language the child was speaking; all children produced significantly more stuttering-like speech disfluencies in Spanish than in English. Conclusion: Results demonstrate that the disfluent speech of bilingual SE children should be carefully considered relative to the complex nature of bilingualism. PMID:25215876
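The comparison in the abstract above rests on a simple rate: stuttering-like disfluencies per 100 words, set against the monolingual English guideline of 3 per 100 words. A minimal Python sketch of that arithmetic follows. The function and constant names are hypothetical, and treating "exceeds" as strictly greater than the guideline is an assumption; the abstract does not describe the study's actual scoring procedure.

```python
# Guideline value taken from the abstract: 3 stuttering-like
# disfluencies per 100 words (monolingual English standard).
MONOLINGUAL_ENGLISH_GUIDELINE = 3.0


def disfluencies_per_100_words(disfluency_count, word_count):
    """Rate of disfluencies per 100 words in a speech sample."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * disfluency_count / word_count


def exceeds_guideline(rate, guideline=MONOLINGUAL_ENGLISH_GUIDELINE):
    """True if the rate is strictly above the guideline (an assumption)."""
    return rate > guideline
```

A sample with 12 stuttering-like disfluencies in 400 words sits exactly at the guideline (3.0 per 100 words), while 22 in 100 words is well above it, which is why a guideline derived from monolingual speakers could misclassify bilingual children who do not stutter.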
When Infants Talk, Infants Listen: Pre-Babbling Infants Prefer Listening to Speech with Infant Vocal Properties

ERIC Educational Resources Information Center

Masapollo, Matthew; Polka, Linda; Ménard, Lucie

2016-01-01

To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…
Techniques for decoding speech phonemes and sounds: A concept

NASA Technical Reports Server (NTRS)

Lokerson, D. C.; Holby, H. G.

1975-01-01

Techniques studied involve conversion of speech sounds into machine-compatible pulse trains. (1) Voltage-level quantizer produces number of output pulses proportional to amplitude characteristics of vowel-type phoneme waveforms. (2) Pulses produced by quantizer of first speech formants are compared with pulses produced by second formants.
Loss of regional accent after damage to the speech production network

PubMed Central

Berthier, Marcelo L.; Dávila, Guadalupe; Moreno-Torres, Ignacio; Beltrán-Corbellini, Álvaro; Santana-Moreno, Daniel; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María José; Massone, María Ignacia; Ruiz-Cruces, Rafael

2015-01-01

Lesion-symptom mapping studies reveal that selective damage to one or more components of the speech production network can be associated with foreign accent syndrome, changes in regional accent (e.g., from Parisian accent to Alsatian accent), stronger regional accent, or re-emergence of a previously learned and dormant regional accent. Here, we report loss of regional accent after rapidly regressive Broca’s aphasia in three Argentinean patients who had suffered unilateral or bilateral focal lesions in components of the speech production network. All patients were monolingual speakers with three different native Spanish accents (Cordobés or central, Guaranítico or northeast, and Bonaerense). Samples of speech production from the patient with native Córdoba accent were compared with previous recordings of his voice, whereas data from the patient with native Guaranítico accent were compared with speech samples from one healthy control matched for age, gender, and native accent. Speech samples from the patient with native Buenos Aires’s accent were compared with data obtained from four healthy control subjects with the same accent. Analysis of speech production revealed discrete slowing in speech rate, inappropriate long pauses, and monotonous intonation. Phonemic production remained similar to those of healthy Spanish speakers, but phonetic variants peculiar to each accent (e.g., intervocalic aspiration of /s/ in Córdoba accent) were absent. While basic normal prosodic features of Spanish prosody were preserved, features intrinsic to melody of certain geographical areas (e.g., rising end F0 excursion in declarative sentences intoned with Córdoba accent) were absent. All patients were also unable to produce sentences with different emotional prosody. 
Brain imaging disclosed focal left hemisphere lesions involving the middle part of the motor cortex, the post-central cortex, the posterior inferior and/or middle frontal cortices, insula, anterior putamen and supplementary motor area. Our findings suggest that lesions affecting the middle part of the left motor cortex and other components of the speech production network disrupt neural processes involved in the production of regional accent features. PMID:26594161
Immediate effects of AAF devices on the characteristics of stuttering: a clinical analysis.

PubMed

Unger, Julia P; Glück, Christian W; Cholewa, Jürgen

2012-06-01

The present study investigated the immediate effects of altered auditory feedback (AAF) and one Inactive Condition (AAF parameters set to 0) on clinical attributes of stuttering during scripted and spontaneous speech. Two commercially available, portable AAF devices were used to create the combined delayed auditory feedback (DAF) and frequency altered feedback (FAF) effects. Thirty adults who stutter, aged 18-68 years (M=36.5; SD=15.2), participated in this investigation. Each subject produced four sets of 5-min oral reading, three sets of 5-min monologs, as well as 10-min dialogs. These speech samples were analyzed to detect changes in descriptive features of stuttering (frequency, duration, speech/articulatory rate, core behaviors) across the various speech samples and within two SSI-4 (Riley, 2009) based severity ratings. A statistically significant difference was found in the frequency of stuttered syllables (%SS) during both Active Device conditions (p<.001) for all speech samples. The most sizable reductions in %SS occurred within scripted speech. In the analysis of stuttering type, it was found that blocks were reduced significantly (Device A: p=.017; Device B: p=.049). To evaluate the impact on severe and mild stuttering, participants were grouped into two SSI-4 based categories: mild and moderate-severe. During the Inactive Condition, those participants within the moderate-severe group (p=.024) showed a statistically significant reduction in overall disfluencies. This result indicates that active AAF parameters alone may not be the sole cause of fluency enhancement when using a technical speech aid.
The reader will learn and be able to describe: (1) currently available scientific evidence on the use of altered auditory feedback (AAF) during scripted and spontaneous speech, (2) which characteristics of stuttering are impacted by an AAF device (frequency, duration, core behaviors, speech & articulatory rate, stuttering severity), (3) the effects of an Inactive Condition on people who stutter (PWS) falling into two severity groups, and (4) how the examined participants perceived the use of AAF devices. Copyright © 2012 Elsevier Inc. All rights reserved.
Understanding the abstract role of speech in communication at 12 months.

PubMed

Martin, Alia; Onishi, Kristine H; Vouloumanos, Athena

2012-04-01

Adult humans recognize that even unfamiliar speech can communicate information between third parties, demonstrating an ability to separate communicative function from linguistic content. We examined whether 12-month-old infants understand that speech can communicate before they understand the meanings of specific words. Specifically, we test the understanding that speech permits the transfer of information about a Communicator's target object to a Recipient. Initially, the Communicator selectively grasped one of two objects. In test, the Communicator could no longer reach the objects. She then turned to the Recipient and produced speech (a nonsense word) or non-speech (coughing). Infants looked longer when the Recipient selected the non-target than the target object when the Communicator had produced speech but not coughing (Experiment 1). Looking time patterns differed from the speech condition when the Recipient rather than the Communicator produced the speech (Experiment 2), and when the Communicator produced a positive emotional vocalization (Experiment 3), but did not differ when the Recipient had previously received information about the target by watching the Communicator's selective grasping (Experiment 4). Thus infants understand the information-transferring properties of speech and recognize some of the conditions under which others' information states can be updated. These results suggest that infants possess an abstract understanding of the communicative function of speech, providing an important potential mechanism for language and knowledge acquisition. Copyright © 2011 Elsevier B.V. All rights reserved.
Influence of speech sample on perceptual rating of hypernasality.

PubMed

Medeiros, Maria Natália Leite de; Fukushiro, Ana Paula; Yamashita, Renata Paciello

2016-07-07

To investigate the influence of speech sample type (spontaneous conversation or sentence repetition) on intra- and inter-rater reliability of hypernasality ratings. One hundred and twenty audio-recorded speech samples (60 containing spontaneous conversation and 60 containing repeated sentences) of individuals of both genders with repaired cleft palate±lip, aged 6 to 52 years (mean=21±10), were selected and edited. Three experienced speech and language pathologists rated hypernasality according to their own criteria using a 4-point scale (1=absence of hypernasality, 2=mild hypernasality, 3=moderate hypernasality and 4=severe hypernasality), first in the spontaneous speech samples and, 30 days later, in the sentence repetition samples. Intra- and inter-rater agreements were calculated for both speech samples and were statistically compared by the Z test at a significance level of 5%. Intra-rater agreement coefficients were higher for sentence repetition than for spontaneous conversation. Inter-rater agreement among the three raters did not differ significantly between the two speech samples. Sentence repetition improved intra-rater reliability of the perceptual judgment of hypernasality. However, the speech sample had no influence on reliability among different raters.
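The abstract above does not say which agreement coefficient was used. As a minimal sketch, assuming Cohen's kappa on the 4-point scale, agreement between two rating passes could be computed as follows. The function name `cohen_kappa` and the rating lists are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of samples on which the two passes gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each pass's category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 = absent ... 4 = severe) from one rater,
# 30 days apart, on the same eight samples.
first_pass = [1, 2, 2, 3, 4, 1, 2, 3]
second_pass = [1, 2, 3, 3, 4, 1, 2, 2]
kappa = cohen_kappa(first_pass, second_pass)
print(f"intra-rater kappa: {kappa:.3f}")
```

The same function applies to inter-rater agreement by passing the ratings of two different raters on the same samples.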
Don’t speak too fast! Processing of fast rate speech in children with specific language impairment

PubMed Central

Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

2018-01-01

Background: Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Method: Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results: Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. Conclusion: In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610
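The sensitivity index d′ mentioned above is conventionally the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that standard definition plus a log-linear correction (adding 0.5 to each count) so that rates of 0 or 1 stay finite. The abstract does not state which correction, if any, the authors applied, and the counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Uses a log-linear correction (an assumption, not the study's stated
    method) so that perfect or empty cells do not yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one child in one speech-rate condition:
# incongruent endings detected vs. missed, congruent endings wrongly
# flagged vs. correctly accepted.
print(round(d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17), 2))
```

A value near 0 means the listener could not tell incongruent from congruent endings; larger values mean better discrimination.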
Beginning to Talk Like an Adult: Increases in Speech-like Utterances in Young Cochlear Implant Recipients and Toddlers with Normal Hearing

PubMed Central

Ertmer, David J.; Jung, Jongmin; Kloiber, Diana True

2013-01-01

Background: Speech-like utterances containing rapidly combined consonants and vowels eventually dominate the prelinguistic and early word productions of toddlers who are developing typically (TD). It seems reasonable to expect a similar phenomenon in young cochlear implant (CI) recipients. This study sought to determine the number of months of robust hearing experience needed to achieve a majority of speech-like utterances in both of these groups. Methods: Speech samples were recorded at 3-month intervals during the first 2 years of CI experience, and between 6 and 24 months of age in TD children. Speech-like utterances were operationally defined as those belonging to the Basic Canonical Syllables (BCS) or Advanced Forms (AF) levels of the Consolidated Stark Assessment of Early Vocal Development-Revised. Results: On average, the CI group achieved a majority of speech-like utterances after 12 months, and the TD group after 18 months of robust hearing experience. The CI group produced greater percentages of speech-like utterances at each interval until 24 months, when both groups approximated 80%. Conclusion: Auditory deprivation did not limit progress in vocal development, as young CI recipients showed more-rapid-than-typical speech development during the first 2 years of device use. Implications for the Infraphonological model of speech development are considered. PMID:23813203
Monitoring Progress in Vocal Development in Young Cochlear Implant Recipients: Relationships between Speech Samples and Scores from the Conditioned Assessment of Speech Production (CASP)

PubMed Central

Ertmer, David J.; Jung, Jongmin

2012-01-01

Background: Evidence of auditory-guided speech development can be heard as the prelinguistic vocalizations of young cochlear implant recipients become increasingly complex, phonetically diverse, and speech-like. In research settings, these changes are most often documented by collecting and analyzing speech samples. Sampling, however, may be too time-consuming and impractical for widespread use in clinical settings. The Conditioned Assessment of Speech Production (CASP; Ertmer & Stoel-Gammon, 2008) is an easily administered and time-efficient alternative to speech sample analysis. The current investigation examined the concurrent validity of the CASP and data obtained from speech samples recorded at the same intervals. Methods: Nineteen deaf children who received CIs before their third birthdays participated in the study. Speech samples and CASP scores were gathered at 6, 12, 18, and 24 months post-activation. Correlation analyses were conducted to assess the concurrent validity of CASP scores and data from samples. Results: CASP scores showed strong concurrent validity with scores from speech samples gathered across all recording sessions (6-24 months). Conclusions: The CASP was found to be a valid, reliable, and time-efficient tool for assessing progress in vocal development during young CI recipients’ first 2 years of device experience. PMID:22628109
Multilevel Analysis in Analyzing Speech Data

ERIC Educational Resources Information Center

Guddattu, Vasudeva; Krishna, Y.

2011-01-01

The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
How much is a word? Predicting ease of articulation planning from apraxic speech error patterns.

PubMed

Ziegler, Wolfram; Aichert, Ingrid

2015-08-01

According to intuitive concepts, 'ease of articulation' is influenced by factors like word length or the presence of consonant clusters in an utterance. Imaging studies of speech motor control use these factors to systematically tax the speech motor system. Evidence from apraxia of speech, a disorder thought to result from speech motor planning impairment after lesions to speech motor centers in the left hemisphere, supports the relevance of these and other factors in disordered speech planning and the genesis of apraxic speech errors. Yet, there is no unified account of the structural properties rendering a word easy or difficult to pronounce. The aim was to model the motor planning demands of word articulation with a nonlinear regression model trained to predict the likelihood of accurate word production in apraxia of speech. We used a tree-structure model in which vocal tract gestures are embedded in hierarchically nested prosodic domains to derive a recursive set of terms for the computation of the likelihood of accurate word production. The model was trained with accuracy data from a set of 136 words averaged over 66 samples from apraxic speakers. In a second step, the model coefficients were used to predict a test dataset of accuracy values for 96 new words, averaged over 120 samples produced by a different group of apraxic speakers. Accurate modeling of the first dataset was achieved in the training study (adjusted R² = .71). In the cross-validation, the test dataset was predicted with high accuracy as well (adjusted R² = .67). The model shape, as reflected by the coefficient estimates, was consistent with current phonetic theories and with clinical evidence. In accordance with phonetic and psycholinguistic work, a strong influence of word stress on articulation errors was found. The proposed model provides a unified and transparent account of the motor planning requirements of word articulation. Copyright © 2015 Elsevier Ltd. All rights reserved.
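The fit statistics reported above are adjusted R² values, which penalize plain R² for the number of fitted coefficients. A minimal sketch of the standard adjustment follows. The abstract does not report how many coefficients the model has, so the predictor count in the example is purely hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical example: 136 training words (as in the abstract) and an
# assumed 10 fitted coefficients, which the abstract does not state.
print(round(adjusted_r2(r2=0.75, n_obs=136, n_predictors=10), 3))
```

Because the penalty grows with the number of predictors, the adjusted value never exceeds the unadjusted R² when at least one predictor is fitted.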
Reduced efficiency of audiovisual integration for nonnative speech.

PubMed

Yi, Han-Gyol; Phelps, Jasmine E B; Smiljanic, Rajka; Chandrasekaran, Bharath

2013-11-01

The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.
Speech production in experienced cochlear implant users undergoing short-term auditory deprivation

NASA Astrophysics Data System (ADS)

Greenman, Geoffrey; Tjaden, Kris; Kozak, Alexa T.

2005-09-01

This study examined the effect of short-term auditory deprivation on the speech production of five postlingually deafened women, all of whom were experienced cochlear implant users. Each cochlear implant user, as well as age- and gender-matched control speakers, produced CVC target words embedded in a reading passage. Speech samples for the deafened adults were collected on two separate occasions. First, the speakers were recorded after wearing their speech processor consistently for at least two to three hours prior to recording (implant “ON”). The second recording occurred when the speakers had their speech processors turned off for approximately ten to twelve hours prior to recording (implant “OFF”). Acoustic measures, including fundamental frequency (F0), the first (F1) and second (F2) formants of the vowels, vowel space area, vowel duration, spectral moments of the consonants, as well as utterance duration and sound pressure level (SPL) across the entire utterance were analyzed in both speaking conditions. For each implant speaker, acoustic measures will be compared across implant “ON” and implant “OFF” speaking conditions, and will also be compared to data obtained from normal hearing speakers.
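Vowel space area, one of the acoustic measures listed above, is commonly taken as the area of the polygon whose corners are the mean (F1, F2) values of the corner vowels. The sketch below assumes that convention and uses the shoelace formula. The abstract does not describe the authors' exact procedure, and the formant values are hypothetical.

```python
def vowel_space_area(formant_points):
    """Area of the polygon through (F1, F2) points, via the shoelace formula.

    Points must be listed in order around the polygon (either direction).
    With formants in Hz, the result is in Hz squared.
    """
    n = len(formant_points)
    if n < 3:
        raise ValueError("at least three vowels are needed to span an area")
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = formant_points[i]
        f1_b, f2_b = formant_points[(i + 1) % n]  # wrap around to the start
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical mean formants (Hz) for four corner vowels, listed in order
# around the quadrilateral: /i/, /ae/, /a/, /u/.
corner_vowels = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
print(vowel_space_area(corner_vowels))
```

Comparing the value between two recording conditions for the same speaker would indicate whether the vowel space expanded or contracted.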
Aging-related gains and losses associated with word production in connected speech.

PubMed

Dennis, Paul A; Hess, Thomas M

2016-11-01

Older adults have been observed to use more nonnormative, or atypical, words than younger adults in connected speech. We examined whether aging-related losses in word-finding abilities or gains in language expertise underlie these age differences. Sixty younger and 60 older adults described two neutral photographs. These descriptions were processed into word types, and textual analysis was used to identify interrupted speech (e.g., pauses), reflecting word-finding difficulty. Word types were assessed for normativeness, with nonnormative word types defined as those used by six (5%) or fewer participants to describe a particular picture. Accuracy and precision ratings were provided by another sample of 48 high-vocabulary younger and older adults. Older adults produced more interrupted and, as predicted, nonnormative words than younger adults. Older adults were more likely than younger adults to use nonnormative language via interrupted speech, suggesting a compensatory process. However, older adults' nonnormative words were more precise and trended toward higher accuracy, reflecting expertise. In tasks offering response flexibility, like connected speech, older adults may be able to offset instances of aging-related deficits by maximizing their expertise in other instances.
Language Attitudes in Guangzhou, China.

ERIC Educational Resources Information Center

Kalmar, Ivan; And Others

1987-01-01

Presents comparison of Cantonese and non-Cantonese students' expressed judgments of two samples of speech produced by same person but presented as coming from two different speakers. Although all students thought the better speaker would have a better chance for social advancement, Cantonese subjects who spoke with heavy Cantonese accents were…
Quantification and Systematic Characterization of Stuttering-Like Disfluencies in Acquired Apraxia of Speech.

PubMed

Bailey, Dallin J; Blomgren, Michael; DeLong, Catharine; Berggren, Kiera; Wambaugh, Julie L

2017-06-22

The purpose of this article is to quantify and describe stuttering-like disfluencies in speakers with acquired apraxia of speech (AOS), utilizing the Lidcombe Behavioural Data Language (LBDL). Additional purposes include measuring test-retest reliability and examining the effect of speech sample type on disfluency rates. Two types of speech samples were elicited from 20 persons with AOS and aphasia: repetition of mono- and multisyllabic words from a protocol for assessing AOS (Duffy, 2013), and connected speech tasks (Nicholas & Brookshire, 1993). Sampling was repeated at 1 and 4 weeks following initial sampling. Stuttering-like disfluencies were coded using the LBDL, which is a taxonomy that focuses on motoric aspects of stuttering. Disfluency rates ranged from 0% to 13.1% for the connected speech task and from 0% to 17% for the word repetition task. There was no significant effect of speech sampling time on disfluency rate in the connected speech task, but there was a significant effect of time for the word repetition task. There was no significant effect of speech sample type. Speakers demonstrated both major types of stuttering-like disfluencies as categorized by the LBDL (fixed postures and repeated movements). Connected speech samples yielded more reliable tallies over repeated measurements. Suggestions are made for modifying the LBDL for use in AOS in order to further add to systematic descriptions of motoric disfluencies in this disorder.
A behavior analytic analogue of learning to use synonyms, syntax, and parts of speech.

PubMed

Chase, Philip N; Ellenwood, David W; Madden, Gregory

2008-01-01

Matching-to-sample and sequence training procedures were used to develop responding to stimulus classes that were considered analogous to 3 aspects of verbal behavior: identifying synonyms and parts of speech, and using syntax. Matching-to-sample procedures were used to train 12 paired associates from among 24 stimuli. These pairs were analogous to synonyms. Then, sequence characteristics were trained to 6 of the stimuli. The result was the formation of 3 classes of 4 stimuli, with the classes controlling a sequence response analogous to a simple ordering syntax: first, second, and third. Matching-to-sample procedures were then used to add 4 stimuli to each class. These stimuli, without explicit sequence training, also began to control the same sequence responding as the other members of their class. Thus, three 8-member functionally equivalent sequence classes were formed. These classes were considered to be analogous to parts of speech. Further testing revealed three 8-member equivalence classes and 512 different sequences of first, second, and third. The study indicated that behavior analytic procedures may be used to produce some generative aspects of verbal behavior related to simple syntax and semantics.
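The figure of 512 sequences follows from the class sizes: with three ordered classes of 8 stimuli each, every first-second-third sequence picks one member per class, giving 8 × 8 × 8 = 512. A short sketch that enumerates them is shown below; the stimulus labels are hypothetical placeholders, not the study's stimuli.

```python
from itertools import product

# Hypothetical labels for the three 8-member classes that control
# "first", "second", and "third" position responding.
first_class = [f"A{i}" for i in range(1, 9)]
second_class = [f"B{i}" for i in range(1, 9)]
third_class = [f"C{i}" for i in range(1, 9)]

# Every sequence takes one stimulus from each class, in class order.
sequences = list(product(first_class, second_class, third_class))
print(len(sequences))  # 8 * 8 * 8
```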
Relations among questionnaire and experience sampling measures of inner speech: a smartphone app study

PubMed Central

Alderson-Day, Ben; Fernyhough, Charles

2015-01-01

Inner speech is often reported to be a common and central part of inner experience, but its true prevalence is unclear. Many questionnaire-based measures appear to lack convergent validity and it has been claimed that they overestimate inner speech in comparison to experience sampling methods (which involve collecting data at random timepoints). The present study compared self-reporting of inner speech collected via a general questionnaire and experience sampling, using data from a custom-made smartphone app (Inner Life). Fifty-one university students completed a generalized self-report measure of inner speech (the Varieties of Inner Speech Questionnaire, VISQ) and responded to at least seven random alerts to report on incidences of inner speech over a 2-week period. Correlations and pairwise comparisons were used to compare generalized endorsements and randomly sampled scores for each VISQ subscale. Significant correlations were observed between general and randomly sampled measures for only two of the four VISQ subscales, and endorsements of inner speech with evaluative or motivational characteristics did not correlate at all across different measures. Endorsement of inner speech items was significantly lower for random sampling compared to generalized self-report, for all VISQ subscales. Exploratory analysis indicated that specific inner speech characteristics were also related to anxiety and future-oriented thinking. PMID:25964773
Acoustic properties of naturally produced clear speech at normal speaking rates

NASA Astrophysics Data System (ADS)

Krause, Jean C.; Braida, Louis D.

2004-01-01

Sentences spoken “clearly” are significantly more intelligible than those spoken “conversationally” for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
Children perceive speech onsets by ear and eye

PubMed Central

Jerger, Susan; Damian, Markus F.; Tye-Murray, Nancy; Abdi, Hervé

2016-01-01

Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: −b/ag) vs. auditorily (see still face; hear exactly same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children – like adults – perceive speech onsets multimodally. Findings are critical for incorporating visual speech into developmental theories of speech perception. PMID:26752548
The effects of complementary and alternative medicine on the speech of patients with depression

NASA Astrophysics Data System (ADS)

Fraas, Michael; Solloway, Michele

2004-05-01

It is well documented that patients suffering from depression exhibit articulatory timing deficits and speech that is monotonous and lacking pitch variation. Traditional remediation of depression has left many patients with adverse side effects and ineffective outcomes. Recent studies indicate that many Americans are seeking complementary and alternative forms of medicine to supplement traditional therapy approaches. The current investigation aims to determine the efficacy of complementary and alternative medicine (CAM) in the remediation of speech deficits associated with depression. Subjects with depression and normal controls will participate in an 8-week treatment session using polarity therapy, a form of CAM. Subjects will be recorded producing a series of spontaneous and narrative speech samples. Acoustic analysis of mean fundamental frequency (F0), variation in F0 (standard deviation of F0), average rate of F0 change, and pause and utterance durations will be conducted. Differences pre- and post-CAM therapy between subjects with depression and normal controls will be discussed.
Improved Speech Coding Based on Open-Loop Parameter Estimation

NASA Technical Reports Server (NTRS)

Juang, Jer-Nan; Chen, Ya-Chin; Longman, Richard W.

2000-01-01

A nonlinear optimization algorithm for linear predictive speech coding was developed earlier that not only optimizes the linear model coefficients for the open loop predictor, but also performs the optimization including the effects of quantization of the transmitted residual. It also simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initialization of this nonlinear algorithm, and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality with increasing numbers of bits used in the transmitted error residual. Examples of speech encoding and decoding are given for 8 speech segments, and signal-to-noise levels as high as 47 dB are produced. As in typical linear predictive coding, the optimization is done on the open loop speech analysis model. Here we demonstrate that minimizing the error of the closed loop speech reconstruction, instead of the simpler open loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order with the chosen number of bits for the codebook.
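The signal-to-noise figures quoted above are in decibels. As a minimal sketch, assuming the usual definition (ratio of original signal energy to reconstruction error energy over a segment), the value for a decoded segment could be computed as below. The abstract does not give the authors' exact definition, and the sample values are hypothetical.

```python
import math


def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between a segment and its reconstruction."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("need two non-empty segments of equal length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if signal_energy == 0.0:
        raise ValueError("original segment has zero energy")
    if noise_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical four-sample segment and a decoded version with 10% error.
segment = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(segment, decoded), 1))
```

On this scale, every additional 10 dB corresponds to a tenfold reduction in error energy relative to the signal.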

The effects of speech production and vocabulary training on different components of spoken language performance.

PubMed

Paatsch, Louise E; Blamey, Peter J; Sarant, Julia Z; Bow, Catherine P

2006-01-01

A group of 21 hard-of-hearing and deaf children attending primary school were trained by their teachers on the production of selected consonants and on the meanings of selected words. Speech production, vocabulary knowledge, reading aloud, and speech perception measures were obtained before and after each type of training. The speech production training produced a small but significant improvement in the percentage of consonants correctly produced in words. The vocabulary training improved knowledge of word meanings substantially. Performance on speech perception and reading aloud were significantly improved by both types of training. These results were in accord with the predictions of a mathematical model put forward to describe the relationships between speech perception, speech production, and language measures in children (Paatsch, Blamey, Sarant, Martin, & Bow, 2004). These training data demonstrate that the relationships between the measures are causal. In other words, improvements in speech production and vocabulary performance produced by training will carry over into predictable improvements in speech perception and reading scores. Furthermore, the model will help educators identify the most effective methods of improving receptive and expressive spoken language for individual children who are deaf or hard of hearing.
Noise Hampers Children’s Expressive Word Learning

PubMed Central

Riley, Kristine Grohne; McGregor, Karla K.

2013-01-01

Purpose: To determine the effects of noise and speech style on word learning in typically developing school-age children. Method: Thirty-one participants ages 9;0 (years; months) to 10;11 attempted to learn 2 sets of 8 novel words and their referents. They heard all of the words 13 times each within meaningful narrative discourse. Signal-to-noise ratio (noise vs. quiet) and speech style (plain vs. clear) were manipulated such that half of the children heard the new words in broadband white noise and half heard them in quiet; within those conditions, each child heard one set of words produced in a plain speech style and another set in a clear speech style. Results: Children who were trained in quiet learned to produce the word forms more accurately than those who were trained in noise. Clear speech resulted in more accurate word form productions than plain speech, whether the children had learned in noise or quiet. Learning from clear speech in noise and plain speech in quiet produced comparable results. Conclusion: Noise limits expressive vocabulary growth in children, reducing the quality of word form representation in the lexicon. Clear speech input can aid expressive vocabulary growth in children, even in noisy environments. PMID:22411494
Perception of the Voicing Distinction in Speech Produced during Simultaneous Communication

ERIC Educational Resources Information Center

MacKenzie, Douglas J.; Schiavetti, Nicholas; Whitehead, Robert L.; Metz, Dale Evan

2006-01-01

This study investigated the perception of voice onset time (VOT) in speech produced during simultaneous communication (SC). Four normally hearing, experienced sign language users were recorded under SC and speech alone (SA) conditions speaking stimulus words with voiced and voiceless initial consonants embedded in a sentence. Twelve…
Gesturing with an injured brain: How gesture helps children with early brain injury learn linguistic constructions

PubMed Central

Özçalışkan, Şeyda; Levine, Susan C.; Goldin-Meadow, Susan

2013-01-01

Children with pre/perinatal unilateral brain lesions (PL) show remarkable plasticity for language development. Is this plasticity characterized by the same developmental trajectory that characterizes typically developing (TD) children, with gesture leading the way into speech? We explored this question, comparing 11 children with PL—matched to 30 TD children on expressive vocabulary—in the second year of life. Children with PL showed similarities to TD children for simple but not complex sentence types. Children with PL produced simple sentences across gesture and speech several months before producing them entirely in speech, exhibiting parallel delays in both gesture+speech and speech-alone. However, unlike TD children, children with PL produced complex sentence types first in speech-alone. Overall, the gesture-speech system appears to be a robust feature of language-learning for simple—but not complex—sentence constructions, acting as a harbinger of change in language development even when that language is developing in an injured brain. PMID:23217292
Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders.

PubMed

Preston, Jonathan L; Hull, Margaret; Edwards, Mary Louise

2013-05-01

To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness (PA) outcomes almost 4 years later. Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested at an average age of 4;6 (years;months) and were followed up at age 8;3. The frequency of occurrence of preschool distortion errors, typical substitution and syllable structure errors, and atypical substitution and syllable structure errors was used to predict later speech sound production, PA, and literacy outcomes. Group averages revealed below-average school-age articulation scores and low-average PA but age-appropriate reading and spelling. Preschool speech error patterns were related to school-age outcomes. Children for whom >10% of their speech sound errors were atypical had lower PA and literacy scores at school age than children who produced <10% atypical errors. Preschoolers who produced more distortion errors were likely to have lower school-age articulation scores than preschoolers who produced fewer distortion errors. Different preschool speech error patterns predict different school-age clinical outcomes. Many atypical speech sound errors in preschoolers may be indicative of weak phonological representations, leading to long-term PA weaknesses. Preschoolers' distortions may be resistant to change over time, leading to persisting speech sound production problems.
English speech sound development in preschool-aged children from bilingual English-Spanish environments.

PubMed

Gildersleeve-Neumann, Christina E; Kester, Ellen S; Davis, Barbara L; Peña, Elizabeth D

2008-07-01

English speech acquisition by typically developing 3- to 4-year-old children with monolingual English was compared to English speech acquisition by typically developing 3- to 4-year-old children with bilingual English-Spanish backgrounds. We predicted that exposure to Spanish would not affect the English phonetic inventory but would increase error frequency and type in bilingual children. Single-word speech samples were collected from 33 children. Phonetically transcribed samples for the 3 groups (monolingual English children, English-Spanish bilingual children who were predominantly exposed to English, and English-Spanish bilingual children with relatively equal exposure to English and Spanish) were compared at 2 time points and for change over time for phonetic inventory, phoneme accuracy, and error pattern frequencies. Children demonstrated similar phonetic inventories. Some bilingual children produced Spanish phonemes in their English and produced few consonant cluster sequences. Bilingual children with relatively equal exposure to English and Spanish averaged more errors than did bilingual children who were predominantly exposed to English. Both bilingual groups showed higher error rates than English-only children overall, particularly for syllable-level error patterns. All language groups decreased in some error patterns, although the ones that decreased were not always the same across language groups. Some group differences of error patterns and accuracy were significant. Vowel error rates did not differ by language group. Exposure to English and Spanish may result in a higher English error rate in typically developing bilinguals, including the application of Spanish phonological properties to English. Slightly higher error rates are likely typical for bilingual preschool-aged children. Change over time at these time points for all 3 groups was similar, suggesting that all will reach an adult-like system in English with exposure and practice.
Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives

ERIC Educational Resources Information Center

Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

2014-01-01

Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Clear Speech Modifications in Children Aged 6-10

NASA Astrophysics Data System (ADS)

Taylor, Griffin Lijding

Modifications to speech production made by adult talkers in response to instructions to speak clearly have been well documented in the literature. Targeting adult populations has been motivated by efforts to improve speech production for the benefit of the communication partners; however, many adults also have communication partners who are children. Surprisingly, there is limited literature on whether children can change their speech production when cued to speak clearly. Pettinato, Tuomainen, Granlund, and Hazan (2016) showed that by age 12, children exhibited enlarged vowel space areas and reduced articulation rate when prompted to speak clearly, but did not produce any other adult-like clear speech modifications in connected speech. Moreover, Syrett and Kawahara (2013) suggested that preschoolers produced longer and more intense vowels when prompted to speak clearly at the word level. These findings contrasted with adult talkers who show significant temporal and spectral differences between speech produced in control and clear speech conditions. Therefore, it was the purpose of this study to analyze changes in temporal and spectral characteristics of speech production that children aged 6-10 made in these experimental conditions. It is important to elucidate the clear speech profile of this population to better understand which adult-like clear speech modifications they make spontaneously and which modifications are still developing. Understanding these baselines will advance future studies that measure the impact of more explicit instructions and children's abilities to better accommodate their interlocutors, which is a critical component of children's pragmatic and speech-motor development.
Differentiating primary progressive aphasias in a brief sample of connected speech

PubMed Central

Evans, Emily; O'Shea, Jessica; Powers, John; Boller, Ashley; Weinberg, Danielle; Haley, Jenna; McMillan, Corey; Irwin, David J.; Rascovsky, Katya; Grossman, Murray

2013-01-01

Objective: A brief speech expression protocol that can be administered and scored without special training would aid in the differential diagnosis of the 3 principal forms of primary progressive aphasia (PPA): nonfluent/agrammatic PPA, logopenic variant PPA, and semantic variant PPA. Methods: We used a picture-description task to elicit a short speech sample, and we evaluated impairments in speech-sound production, speech rate, lexical retrieval, and grammaticality. We compared the results with those obtained by a longer, previously validated protocol and further validated performance with multimodal imaging to assess the neuroanatomical basis of the deficits. Results: We found different patterns of impaired grammar in each PPA variant, and additional language production features were impaired in each: nonfluent/agrammatic PPA was characterized by speech-sound errors; logopenic variant PPA by dysfluencies (false starts and hesitations); and semantic variant PPA by poor retrieval of nouns. Strong correlations were found between this brief speech sample and a lengthier narrative speech sample. A composite measure of grammaticality and other measures of speech production were correlated with distinct regions of gray matter atrophy and reduced white matter fractional anisotropy in each PPA variant. Conclusions: These findings provide evidence that large-scale networks are required for fluent, grammatical expression; that these networks can be selectively disrupted in PPA syndromes; and that quantitative analysis of a brief speech sample can reveal the corresponding distinct speech characteristics. PMID:23794681
Intonation Contrast in Cantonese Speakers with Hypokinetic Dysarthria Associated with Parkinson's Disease

ERIC Educational Resources Information Center

Ma, Joan K.-Y.; Whitehill, Tara L.; So, Susanne Y.-S.

2010-01-01

Purpose: Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Method: Speech materials with a question-statement contrast…
Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels

ERIC Educational Resources Information Center

Ferguson, Sarah Hargus; Kewley-Port, Diane

2007-01-01

Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…
Emotional and Physiological Responses of Fluent Listeners while Watching the Speech of Adults Who Stutter

ERIC Educational Resources Information Center

Guntupalli, Vijaya K.; Everhart, D. Erik; Kalinowski, Joseph; Nanjundeswaran, Chayadevie; Saltuklaroglu, Tim

2007-01-01

Background: People who stutter produce speech that is characterized by intermittent, involuntary part-word repetitions and prolongations. In addition to these signature acoustic manifestations, those who stutter often display repetitive and fixated behaviours outside the speech producing mechanism (e.g. in the head, arm, fingers, nares, etc.).…
Phonological processes in the speech of school-age children with hearing loss: Comparisons with children with normal hearing.

PubMed

Asad, Areej Nimer; Purdy, Suzanne C; Ballard, Elaine; Fairgray, Liz; Bowen, Caroline

2018-04-27

In this descriptive study, phonological processes were examined in the speech of children aged 5;0-7;6 (years; months) with mild to profound hearing loss using hearing aids (HAs) and cochlear implants (CIs), in comparison to their peers. A second aim was to compare phonological processes of HA and CI users. Children with hearing loss (CWHL, N = 25) were compared to children with normal hearing (CWNH, N = 30) with similar age, gender, linguistic, and socioeconomic backgrounds. Speech samples obtained from a list of 88 words, derived from three standardized speech tests, were analyzed using the CASALA (Computer Aided Speech and Language Analysis) program to evaluate participants' phonological systems, based on lax (a process appeared at least twice in the speech of at least two children) and strict (a process appeared at least five times in the speech of at least two children) counting criteria. Developmental phonological processes were eliminated in the speech of younger and older CWNH while eleven developmental phonological processes persisted in the speech of both age groups of CWHL. CWHL showed a similar trend of age of elimination to CWNH, but at a slower rate. Children with HAs and CIs produced similar phonological processes. Final consonant deletion, weak syllable deletion, backing, and glottal replacement were present in the speech of HA users, affecting their overall speech intelligibility. Developmental and non-developmental phonological processes persist in the speech of children with mild to profound hearing loss compared to their peers with typical hearing. The findings indicate that it is important for clinicians to consider phonological assessment in pre-school CWHL and the use of evidence-based speech therapy in order to reduce non-developmental and non-age-appropriate developmental processes, thereby enhancing their speech intelligibility. Copyright © 2018 Elsevier Inc. All rights reserved.
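The lax and strict counting criteria described in this abstract can be illustrated with a short sketch. This is not the CASALA program's implementation; the function name, data layout, and per-child counts below are hypothetical and chosen only to show how the two thresholds differ.

```python
from collections import Counter


def persistent_processes(counts_by_child, min_occurrences, min_children=2):
    """Return processes seen at least `min_occurrences` times in at
    least `min_children` children (sorted alphabetically)."""
    children_meeting = Counter()
    for child_counts in counts_by_child.values():
        for process, n in child_counts.items():
            if n >= min_occurrences:
                children_meeting[process] += 1
    return sorted(p for p, k in children_meeting.items() if k >= min_children)


# Hypothetical tallies of phonological processes per child (illustration only).
counts = {
    "child_A": {"final consonant deletion": 6, "backing": 2},
    "child_B": {"final consonant deletion": 5, "backing": 3},
    "child_C": {"backing": 1, "glottal replacement": 2},
}

# Lax criterion: at least twice in at least two children.
lax = persistent_processes(counts, min_occurrences=2)
# Strict criterion: at least five times in at least two children.
strict = persistent_processes(counts, min_occurrences=5)

print(lax)
print(strict)
```

With these made-up tallies, backing passes only the lax criterion, while final consonant deletion passes both.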
The Acoustic Correlates of Perceived Masculinity, Perceived Femininity, and Perceived Sexual Orientation

ERIC Educational Resources Information Center

Munson, Benjamin

2007-01-01

Previous studies have shown that a subset of gay, lesbian, and bisexual (GLB) and heterosexual adults produce distinctive patterns of phonetic variation that allow listeners to detect their sexual orientation from audio-only samples of read speech. The current investigation examined the extent to which judgments of sexual orientation from speech…
Automated Intelligibility Assessment of Pathological Speech Using Phonological Features

NASA Astrophysics Data System (ADS)

Middag, Catherine; Martens, Jean-Pierre; Van Nuffelen, Gwen; De Bodt, Marc

2009-12-01

It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
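The evaluation figure quoted above is a root mean squared error between perceived and computed intelligibility scores. A minimal sketch of that metric follows; the score lists are invented for illustration and are not data from the paper.

```python
import math


def rmse(perceived, computed):
    """Root mean squared error between two equal-length score lists."""
    if len(perceived) != len(computed) or not perceived:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(p - c) ** 2 for p, c in zip(perceived, computed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical intelligibility scores on a 0-100 scale (illustration only).
perceived_scores = [80.0, 55.0, 92.0, 40.0]
computed_scores = [74.0, 63.0, 90.0, 50.0]

error = rmse(perceived_scores, computed_scores)
print(round(error, 2))
```

For these invented scores the squared differences are 36, 64, 4, and 100, so the error is the square root of 51, about 7.14 on the 0 to 100 scale.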
Spectral and temporal changes to speech produced in the presence of energetic and informational maskers.

PubMed

Cooke, Martin; Lu, Youyi

2010-10-01

Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.
Co-Thought and Co-Speech Gestures Are Generated by the Same Action Generation Process

ERIC Educational Resources Information Center

Chu, Mingyuan; Kita, Sotaro

2016-01-01

People spontaneously gesture when they speak (co-speech gestures) and when they solve problems silently (co-thought gestures). In this study, we first explored the relationship between these 2 types of gestures and found that individuals who produced co-thought gestures more frequently also produced co-speech gestures more frequently (Experiments…
The Disfluent Speech of Bilingual Spanish-English Children: Considerations for Differential Diagnosis of Stuttering

ERIC Educational Resources Information Center

Byrd, Courtney T.; Bedore, Lisa M.; Ramos, Daniel

2015-01-01

Purpose: The primary purpose of this study was to describe the frequency and types of speech disfluencies that are produced by bilingual Spanish-English (SE) speaking children who do not stutter. The secondary purpose was to determine whether their disfluent speech is mediated by language dominance and/or language produced. Method: Spanish and…
The effect of filtered speech feedback on the frequency of stuttering

NASA Astrophysics Data System (ADS)

Rami, Manish Krishnakant

2000-10-01

This study investigated the effects of filtered components of speech and whispered speech on the frequency of stuttering. It is known that choral speech, shadowing, and altered auditory feedback are the only conditions which induce fluency without any additional effort than normally required to speak on the part of people who stutter. All these conditions use speech as a second signal. This experiment examined the role of components of speech signal as delineated by the source-filter theory of speech production. Three filtered speech signals, a whispered speech signal, and a choral speech signal formed the stimuli. It was postulated that if the speech signal in whole was necessary for producing fluency in people who stutter, then all other conditions except choral speech should fail to produce fluency enhancement. If the glottal source alone was adequate in restoring fluency, then only the conditions of NAF and whispered speech should fail in promoting fluency. In the event that full filter characteristics are necessary for the fluency creating effects, then all conditions except the choral speech and whispered speech should fail to produce fluency. If any part of the filter characteristics is sufficient in yielding fluency, then only the NAF and the approximate glottal source should fail to demonstrate an increase in the amount of fluency. Twelve adults who stuttered read passages under the six conditions while receiving auditory feedback consisting of one of the six experimental conditions: (a) NAF; (b) approximate glottal source; (c) glottal source and first formant; (d) glottal source and first two formants; and (e) whispered speech. Frequencies of stuttering were obtained for each condition and submitted to descriptive and inferential statistical analysis. Statistically significant differences in means were found within the choral feedback conditions. Specifically, the choral speech, the source and first formant, source and the first two formants, and the whispered speech conditions all decreased the frequency of stuttering while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of the vocal tract origin, afford effective cues and induce fluent speech in people who stutter.
Lip movements affect infants' audiovisual speech perception.

PubMed

Yeung, H Henny; Werker, Janet F

2013-05-01

Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.

Objective measurement of motor speech characteristics in the healthy pediatric population.

PubMed

Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P

2011-12-01

To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (Fo) during speech did not change significantly with age for both males and females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Effects of utterance length and vocal loudness on speech breathing in older adults.

PubMed

Huber, Jessica E

2008-12-31

Age-related reductions in pulmonary elastic recoil and respiratory muscle strength can affect how older adults generate subglottal pressure required for speech production. The present study examined age-related changes in speech breathing by manipulating utterance length and loudness during a connected speech task (monologue). Twenty-three older adults and twenty-eight young adults produced a monologue at comfortable loudness and pitch and with multi-talker babble noise playing in the room to elicit louder speech. Dependent variables included sound pressure level, speech rate, and lung volume initiation, termination, and excursion. Older adults produced shorter utterances than young adults overall. Age-related effects were larger for longer utterances. Older adults demonstrated very different lung volume adjustments for loud speech than young adults. These results suggest that older adults have a more difficult time when the speech system is being taxed by both utterance length and loudness. The data were consistent with the hypothesis that both young and older adults use utterance length in premotor speech planning processes.
Fluid-acoustic interactions and their impact on pathological voiced speech

NASA Astrophysics Data System (ADS)

Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.; Plesniak, Michael W.

2011-11-01

Voiced speech is produced by vibration of the vocal fold structures. Vocal fold dynamics arise from aerodynamic pressure loadings, tissue properties, and acoustic modulation of the driving pressures. Recent speech science advancements have produced a physiologically-realistic fluid flow solver (BLEAP) capable of prescribing asymmetric intraglottal flow attachment that can be easily assimilated into reduced order models of speech. The BLEAP flow solver is extended to incorporate acoustic loading and sound propagation in the vocal tract by implementing a wave reflection analog approach for sound propagation based on the governing BLEAP equations. This enhanced physiological description of the physics of voiced speech is implemented into a two-mass model of speech. The impact of fluid-acoustic interactions on vocal fold dynamics is elucidated for both normal and pathological speech through linear and nonlinear analysis techniques. Supported by NSF Grant CBET-1036280.
Acquisition of speech rhythm in first language.

PubMed

Polyanskaya, Leona; Ordin, Mikhail

2015-09-01

Analysis of English rhythm in speech produced by children and adults revealed that speech rhythm becomes increasingly more stress-timed as language acquisition progresses. Children reach the adult-like target by 11 to 12 years. The employed speech elicitation paradigm ensured that the sentences produced by adults and children at different ages were comparable in terms of lexical content, segmental composition, and phonotactic complexity. Detected differences between child and adult rhythm and between rhythm in child speech at various ages cannot be attributed to acquisition of phonotactic language features or vocabulary, and indicate the development of language-specific phonetic timing in the course of acquisition.
Tongue-Palate Contact Pressure, Oral Air Pressure, and Acoustics of Clear Speech

ERIC Educational Resources Information Center

Searl, Jeff; Evitts, Paul M.

2013-01-01

Purpose: The authors compared articulatory contact pressure (ACP), oral air pressure (Po), and speech acoustics for conversational versus clear speech. They also assessed the relationship of these measures to listener perception. Method: Twelve adults with normal speech produced monosyllables in a phrase using conversational and clear speech.…
A universal multilingual weightless neural network tagger via quantitative linguistics.

PubMed

Carneiro, Hugo C C; Pedreira, Carlos E; França, Felipe M G; Lima, Priscila M V

2017-07-01

In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
Effect of concurrent walking and interlocutor distance on conversational speech intensity and rate in Parkinson's disease.

PubMed

McCaig, Cassandra M; Adams, Scott G; Dykstra, Allyson D; Jog, Mandar

2016-01-01

Previous studies have demonstrated a negative effect of concurrent walking and talking on gait in Parkinson's disease (PD) but there is limited information about the effect of concurrent walking on speech production. The present study examined the effect of sitting, standing, and three concurrent walking tasks (slow, normal, fast) on conversational speech intensity and speech rate in fifteen individuals with hypophonia related to idiopathic Parkinson's disease (PD) and fourteen age-equivalent controls. Interlocutor (talker-to-talker) distance effects and walking speed were also examined. Concurrent walking was found to produce a significant increase in speech intensity, relative to standing and sitting, in both the control and PD groups. Faster walking produced significantly greater speech intensity than slower walking. Concurrent walking had no effect on speech rate. Concurrent walking and talking produced significant reductions in walking speed in both the control and PD groups. In general, the results of the present study indicate that concurrent walking tasks and the speed of concurrent walking can have a significant positive effect on conversational speech intensity. These positive, "energizing" effects need to be given consideration in future attempts to develop a comprehensive model of speech intensity regulation and they may have important implications for the development of new evaluation and treatment procedures for individuals with hypophonia related to PD. Crown Copyright © 2015. Published by Elsevier B.V. All rights reserved.
The Prevalence of Speech Disorders among University Students in Jordan

ERIC Educational Resources Information Center

Alaraifi, Jehad Ahmad; Amayreh, Mousa Mohammad; Saleh, Mohammad Yusef

2014-01-01

Problem: There are no available studies on the prevalence, and distribution of speech disorders among Arabic speaking undergraduate students in Jordan. Method: A convenience sample of 400 undergraduate students at the University of Jordan was screened for speech disorders. Two spontaneous speech samples and an oral reading of a passage were…
Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers

PubMed Central

Cooke, Martin; Aubanel, Vincent

2017-01-01

Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background. PMID:28618803
Vector adaptive predictive coder for speech and audio

NASA Technical Reports Server (NTRS)

Chen, Juin-Hwey (Inventor); Gersho, Allen (Inventor)

1990-01-01

A real-time vector adaptive predictive coder which approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion. Predictive analysis for each frame determines parameters used for computing from vectors in the first codebook zero-state response vectors that are stored at the same address (index) in a second codebook. Encoding of input speech vectors s_n is then carried out using the second codebook. When the vector that minimizes distortion is found, its index is transmitted to a decoder which has a codebook identical to the first codebook of the encoder. There the index is used to read out a vector that is used to synthesize an output speech vector s_n. The parameters used in the encoder are quantized, for example by using a table, and the indices are transmitted to the decoder where they are decoded to specify transfer characteristics of filters used in producing the vector s_n from the receiver codebook vector selected by the vector index transmitted.
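The two-codebook search described above can be sketched in a few lines. This is a simplified illustration, not the patented coder: it assumes a plain all-pole synthesis filter with hypothetical coefficients, omits gain terms, pitch prediction, and perceptual weighting, and assumes the target vector has already had the filter's carried-over (zero-input) response removed. All names and values are invented for the example.

```python
def zero_state_response(excitation, a):
    """Filter `excitation` through y[n] = x[n] + sum_k a[k] * y[n-1-k],
    starting from an all-zero filter memory."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc += ak * y[n - 1 - k]
        y.append(acc)
    return y


def build_second_codebook(first_codebook, a):
    """Once per frame: store each excitation vector's zero-state response
    at the same index it has in the first codebook."""
    return [zero_state_response(v, a) for v in first_codebook]


def best_index(target, second_codebook):
    """Return the index whose filtered vector has the smallest squared error."""
    def distortion(v):
        return sum((t - x) ** 2 for t, x in zip(target, v))
    return min(range(len(second_codebook)),
               key=lambda i: distortion(second_codebook[i]))


# Hypothetical M = 4 excitation vectors of K = 4 samples, one filter tap.
first_codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0], [-1, 0, 0, 0]]
a = [0.5]  # assumed predictor coefficient for this frame

second_codebook = build_second_codebook(first_codebook, a)
target = [0.9, -0.4, -0.2, -0.1]  # assumed zero-input response already removed
index = best_index(target, second_codebook)

# Decoder side: the same index selects the excitation, which is then
# passed through the same synthesis filter.
decoded = zero_state_response(first_codebook[index], a)
print(index, decoded)
```

Precomputing the filtered vectors means the per-vector search is only a distance comparison; the filtering cost is paid once per frame rather than once per candidate per input vector.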
Do parents lead their children by the hand?

PubMed

Ozçalişkan, Seyda; Goldin-Meadow, Susan

2005-08-01

The types of gesture+speech combinations children produce during the early stages of language development change over time. This change, in turn, predicts the onset of two-word speech and thus might reflect a cognitive transition that the child is undergoing. An alternative, however, is that the change merely reflects changes in the types of gesture+speech combinations that their caregivers produce. To explore this possibility, we videotaped 40 American child-caregiver dyads in their homes for 90 minutes when the children were 1;2, 1;6, and 1;10. Each gesture was classified according to type (deictic, conventional, representational) and the relation it held to speech (reinforcing, disambiguating, supplementary). Children and their caregivers produced the same types of gestures and in approximately the same distribution. However, the children differed from their caregivers in the way they used gesture in relation to speech. Over time, children produced many more REINFORCING (bike+point at bike), DISAMBIGUATING (that one+point at bike), and SUPPLEMENTARY combinations (ride+point at bike). In contrast, the frequency and distribution of caregivers' gesture+speech combinations remained constant over time. Thus, the changing relation between gesture and speech observed in the children cannot be traced back to the gestural input the children receive. Rather, it appears to reflect changes in the children's own skills, illustrating once again gesture's ability to shed light on developing cognitive and linguistic processes.
A novel speech-processing strategy incorporating tonal information for cochlear implants.

PubMed

Lan, N; Nie, K B; Gao, S K; Zeng, F G

2004-05-01

Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
A Privileged Status for Male Infant-Directed Speech in Infants of Depressed Mothers? Role of Father Involvement

ERIC Educational Resources Information Center

Kaplan, Peter S.; Danko, Christina M.; Diaz, Andres

2010-01-01

Prior research showed that 5- to 13-month-old infants of chronically depressed mothers did not learn to associate a segment of infant-directed speech produced by their own mothers or an unfamiliar nondepressed mother with a smiling female face, but showed better-than-normal learning when a segment of infant-directed speech produced by an…
Acoustics of Clear Speech: Effect of Instruction

ERIC Educational Resources Information Center

Lam, Jennifer; Tjaden, Kris; Wilding, Greg

2012-01-01

Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…
Lip Movement Exaggerations During Infant-Directed Speech

PubMed Central

Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

2011-01-01

Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequency of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342
The right hemisphere is highlighted in connected natural speech production and perception.

PubMed

Alexandrou, Anna Maria; Saarinen, Timo; Mäkelä, Sasu; Kujala, Jan; Salmelin, Riitta

2017-05-15

Current understanding of the cortical mechanisms of speech perception and production stems mostly from studies that focus on single words or sentences. However, it has been suggested that processing of real-life connected speech may rely on additional cortical mechanisms. In the present study, we examined the neural substrates of natural speech production and perception with magnetoencephalography by modulating three central features related to speech: amount of linguistic content, speaking rate and social relevance. The amount of linguistic content was modulated by contrasting natural speech production and perception to speech-like non-linguistic tasks. Meaningful speech was produced and perceived at three speaking rates: normal, slow and fast. Social relevance was probed by having participants attend to speech produced by themselves and an unknown person. These speech-related features were each associated with distinct spatiospectral modulation patterns that involved cortical regions in both hemispheres. Natural speech processing markedly engaged the right hemisphere in addition to the left. In particular, the right temporo-parietal junction, previously linked to attentional processes and social cognition, was highlighted in the task modulations. The present findings suggest that its functional role extends to active generation and perception of meaningful, socially relevant speech. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Speech Characteristics of Patients with Pallido-Ponto-Nigral Degeneration and Their Application to Presymptomatic Detection in At-Risk Relatives

ERIC Educational Resources Information Center

Liss, Julie M.; Krein-Jones, Kari; Wszolek, Zbigniew K.; Caviness, John N.

2006-01-01

Purpose: This report describes the speech characteristics of individuals with a neurodegenerative syndrome called pallido-ponto-nigral degeneration (PPND) and examines the speech samples of at-risk, but asymptomatic, relatives for possible preclinical detection. Method: Speech samples of 9 members of a PPND kindred were subjected to perceptual…
Swahili speech development: preliminary normative data from typically developing pre-school children in Tanzania.

PubMed

Gangji, Nazneen; Pascoe, Michelle; Smouse, Mantoa

2015-01-01

Swahili is widely spoken in East Africa, but to date there are no culturally and linguistically appropriate materials available for speech-language therapists working in the region. The challenges are further exacerbated by the limited research available on the typical acquisition of Swahili phonology. To describe the speech development of 24 typically developing first language Swahili-speaking children between the ages of 3;0 and 5;11 years in Dar es Salaam, Tanzania. A cross-sectional design was used with six groups of four children in 6-month age bands. Single-word speech samples were obtained from each child using a set of culturally appropriate pictures designed to elicit all consonants and vowels of Swahili. Each child's speech was audio-recorded and phonetically transcribed using International Phonetic Alphabet (IPA) conventions. Children's speech development is described in terms of (1) phonetic inventory, (2) syllable structure inventory, (3) phonological processes and (4) percentage consonants correct (PCC) and percentage vowels correct (PVC). Results suggest a gradual progression in the acquisition of speech sounds and syllables between the ages of 3;0 and 5;11 years. Vowel acquisition was completed and most of the consonants acquired by age 3;0. Fricatives /z, s, h/ were later acquired at 4 years and /θ/ and /r/ were the last acquired consonants at age 5;11. Older children were able to produce speech sounds more accurately and had fewer phonological processes in their speech than younger children. Common phonological processes included lateralization and sound preference substitutions. The study contributes a preliminary set of normative data on speech development of Swahili-speaking children. Findings are discussed in relation to theories of phonological development, and may be used as a basis for further normative studies with larger numbers of children and ultimately the development of a contextually relevant assessment of the phonology of Swahili-speaking children. © 2014 Royal College of Speech and Language Therapists.
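The PCC and PVC measures named in this record are simple proportions, which the following minimal sketch illustrates. It assumes target and produced segments are already aligned one-to-one. The segment sets are illustrative rather than a full Swahili inventory, and the child production is made-up example data, not data from the study.

```python
# Illustrative segment classes (assumption: not a complete Swahili inventory).
CONSONANTS = set("pbtdkgfvszhmnlrwj") | {"θ", "ð", "ʃ", "ɲ", "ŋ"}
VOWELS = set("aeiou")


def percent_correct(pairs, segment_class):
    """Percentage of target segments in `segment_class` produced correctly.

    `pairs` is a list of (target, produced) segments aligned one-to-one;
    `produced` may be None to mark an omission.
    """
    scored = [(t, p) for t, p in pairs if t in segment_class]
    if not scored:
        raise ValueError("no target segments of the requested class")
    correct = sum(1 for t, p in scored if t == p)
    return 100.0 * correct / len(scored)


# Hypothetical example: target "samaki" (fish) produced as "tamaki".
pairs = list(zip("samaki", "tamaki"))
pcc = percent_correct(pairs, CONSONANTS)  # 2 of 3 consonants match
pvc = percent_correct(pairs, VOWELS)      # 3 of 3 vowels match
```

In practice the alignment step is the hard part, since omissions and additions shift the segments; the sketch sidesteps this by taking aligned pairs as input.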
The influence of visual speech information on the intelligibility of English consonants produced by non-native speakers.

PubMed

Kawase, Saya; Hannah, Beverly; Wang, Yue

2014-09-01

This study examines how visual speech information affects native judgments of the intelligibility of speech sounds produced by non-native (L2) speakers. Native Canadian English perceivers as judges perceived three English phonemic contrasts (/b-v, θ-s, l-ɹ/) produced by native Japanese speakers as well as native Canadian English speakers as controls. These stimuli were presented under audio-visual (AV, with speaker voice and face), audio-only (AO), and visual-only (VO) conditions. The results showed that, across conditions, the overall intelligibility of Japanese productions of the native (Japanese)-like phonemes (/b, s, l/) was significantly higher than the non-Japanese phonemes (/v, θ, ɹ/). In terms of visual effects, the more visually salient non-Japanese phonemes /v, θ/ were perceived as significantly more intelligible when presented in the AV compared to the AO condition, indicating enhanced intelligibility when visual speech information is available. However, the non-Japanese phoneme /ɹ/ was perceived as less intelligible in the AV compared to the AO condition. Further analysis revealed that, unlike the native English productions, the Japanese speakers produced /ɹ/ without visible lip-rounding, indicating that non-native speakers' incorrect articulatory configurations may decrease the degree of intelligibility. These results suggest that visual speech information may either positively or negatively affect L2 speech intelligibility.
When infants talk, infants listen: pre-babbling infants prefer listening to speech with infant vocal properties.

PubMed

Masapollo, Matthew; Polka, Linda; Ménard, Lucie

2016-03-01

To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ ('ee') vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development. © 2015 John Wiley & Sons Ltd.

Inferring Speaker Affect in Spoken Natural Language Communication

ERIC Educational Resources Information Center

Pon-Barry, Heather Roberta

2013-01-01

The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…
Do 6-Month-Olds Understand That Speech Can Communicate?

ERIC Educational Resources Information Center

Vouloumanos, Athena; Martin, Alia; Onishi, Kristine H.

2014-01-01

Adults and 12-month-old infants recognize that even unfamiliar speech can communicate information between third parties, suggesting that they can separate the communicative function of speech from its lexical content. But do infants recognize that speech can communicate due to their experience understanding and producing language, or do they…
Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

ERIC Educational Resources Information Center

Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

2014-01-01

Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…
Feasibility of automated speech sample collection with stuttering children using interactive voice response (IVR) technology.

PubMed

Vogel, Adam P; Block, Susan; Kefalianos, Elaina; Onslow, Mark; Eadie, Patricia; Barth, Ben; Conway, Laura; Mundt, James C; Reilly, Sheena

2015-04-01

To investigate the feasibility of adopting automated interactive voice response (IVR) technology for remotely capturing standardized speech samples from stuttering children. Participants were 10 6-year-old stuttering children. Their parents called a toll-free number from their homes and were prompted to elicit speech from their children using a standard protocol involving conversation, picture description and games. The automated IVR system was implemented using an off-the-shelf telephony software program and delivered by a standard desktop computer. The software infrastructure utilizes voice over internet protocol. Speech samples were automatically recorded during the calls. Video recordings were simultaneously acquired in the home at the time of the call to evaluate the fidelity of the telephone collected samples. Key outcome measures included syllables spoken, percentage of syllables stuttered and an overall rating of stuttering severity using a 10-point scale. Data revealed a high level of relative reliability in terms of intra-class correlation between the video and telephone acquired samples on all outcome measures during the conversation task. Findings were less consistent for speech samples during picture description and games. Results suggest that IVR technology can be used successfully to automate remote capture of child speech samples.
Auditory Serial Position Effects in Story Retelling for Non-Brain-Injured Participants and Persons with Aphasia

ERIC Educational Resources Information Center

Brodsky, Martin B.; McNeil, Malcolm R.; Doyle, Patrick J.; Fossett, Tepanata R. D.; Timm, Neil H.

2003-01-01

Using story retelling as an index of language ability, it is difficult to disambiguate comprehension and memory deficits. Collecting data on the serial position effect (SPE), however, illuminates the memory component. This study examined the SPE of the percentage of information units (%IU) produced in the connected speech samples of adults with…
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

PubMed

Greene, Beth G; Logan, John S; Pisoni, David B

1986-03-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
Segmental intelligibility of synthetic speech produced by rule.

PubMed

Logan, J S; Greene, B G; Pisoni, D B

1989-08-01

This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.
Segmental intelligibility of synthetic speech produced by rule

PubMed Central

Logan, John S.; Greene, Beth G.; Pisoni, David B.

2012-01-01

This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk—Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener’s processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener. PMID:2527884
Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

PubMed Central

Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

2016-01-01

Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
The impact of impaired semantic knowledge on spontaneous iconic gesture production

PubMed Central

Cocks, Naomi; Dipper, Lucy; Pritchard, Madeleine; Morgan, Gary

2013-01-01

Background Previous research has found that people with aphasia produce more spontaneous iconic gesture than control participants, especially during word-finding difficulties. There is some evidence that impaired semantic knowledge impacts on the diversity of gestural handshapes, as well as the frequency of gesture production. However, no previous research has explored how impaired semantic knowledge impacts on the frequency and type of iconic gestures produced during fluent speech compared with those produced during word-finding difficulties. Aims To explore the impact of impaired semantic knowledge on the frequency and type of iconic gestures produced during fluent speech and those produced during word-finding difficulties. Methods & Procedures A group of 29 participants with aphasia and 29 control participants were video recorded describing a cartoon they had just watched. All iconic gestures were tagged and coded as either “manner,” “path only,” “shape outline” or “other”. These gestures were then separated into either those occurring during fluent speech or those occurring during a word-finding difficulty. The relationships between semantic knowledge and gesture frequency and form were then investigated in the two different conditions. Outcomes & Results As expected, the participants with aphasia produced a higher frequency of iconic gestures than the control participants, but when the iconic gestures produced during word-finding difficulties were removed from the analysis, the frequency of iconic gesture was not significantly different between the groups. While there was not a significant relationship between the frequency of iconic gestures produced during fluent speech and semantic knowledge, there was a significant positive correlation between semantic knowledge and the proportion of word-finding difficulties that contained gesture. There was also a significant positive correlation between the speakers' semantic knowledge and the proportion of gestures that were produced during fluent speech that were classified as “manner”. Finally, while not significant, there was a positive trend between semantic knowledge of objects and the production of “shape outline” gestures during word-finding difficulties for objects. Conclusions The results indicate that impaired semantic knowledge in aphasia impacts on both the iconic gestures produced during fluent speech and those produced during word-finding difficulties but in different ways. These results shed new light on the relationship between impaired language and iconic co-speech gesture production and also suggest that analysis of iconic gesture may be a useful addition to clinical assessment. PMID:24058228
Automatic Speech Recognition from Neural Signals: A Focused Review.

PubMed

Herff, Christian; Schultz, Tanja

2016-01-01

Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.
Comparisons of stuttering frequency during and after speech initiation in unaltered feedback, altered auditory feedback and choral speech conditions.

PubMed

Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew

2009-01-01

Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in the stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate speech before it can go into effect and, therefore, it might not be as helpful as true choral speech during speech initiation. To examine how AAF and choral speech differentially enhance fluency during speech initiation and in subsequent portions of utterances. Ten participants who stuttered read passages without altered feedback (NAF), under four AAF conditions and under a true choral speech condition. Each condition was blocked into ten 10 s trials separated by 5 s intervals so each trial required 'cold' speech initiation. In the first analysis, comparisons of stuttering frequencies were made across conditions. A second, finer grain analysis involved examining stuttering frequencies on the initial syllable, the subsequent four syllables produced and the five syllables produced immediately after the midpoint of each trial. On average, AAF reduced stuttering by approximately 68% relative to the NAF condition. Stuttering frequencies on the initial syllables were considerably higher than on the other syllables analysed (0.45 and 0.34 for NAF and AAF conditions, respectively). After the first syllable was produced, stuttering frequencies dropped precipitously and remained stable. However, this drop in stuttering frequency was significantly greater (approximately 84%) in the AAF conditions than in the NAF condition (approximately 66%) with frequencies on the last nine syllables analysed averaging 0.15 and 0.05 for NAF and AAF conditions, respectively. In the true choral speech condition, stuttering was virtually (approximately 98%) eliminated across all utterances and all syllable positions. Altered auditory feedback effectively inhibits stuttering immediately after speech has been initiated. However, unlike a true choral signal, which is exogenously initiated and offers the most complete fluency enhancement, AAF requires speech to be initiated by the user and 'fed back' before it can directly inhibit stuttering. It is suggested that AAF can be a viable clinical option for those who stutter and should often be used in combination with therapeutic techniques, particularly those that aid speech initiation. The substantially higher rate of stuttering occurring on initiation supports a hypothesis that overt stuttering events help 'release' and 'inhibit' central stuttering blocks. This perspective is examined in the context of internal models and mirror neurons.
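The percentage drops quoted in this record can be checked against the reported mean frequencies with a short worked calculation. This is only a sketch using the rounded values printed in the abstract; the function name is hypothetical, and the small gap from the quoted figures is presumably due to rounding of the reported means.

```python
def percent_drop(initial, later):
    """Relative decrease from `initial` to `later`, as a percentage."""
    return 100.0 * (initial - later) / initial


# Mean stuttering frequencies as printed in the abstract:
# initial syllable vs. the last nine syllables analysed.
naf_drop = percent_drop(0.45, 0.15)  # no altered feedback (NAF)
aaf_drop = percent_drop(0.34, 0.05)  # altered auditory feedback (AAF)

print(f"NAF drop: {naf_drop:.1f}%  AAF drop: {aaf_drop:.1f}%")
```

The rounded inputs give roughly 67% for NAF and 85% for AAF, close to the approximately 66% and 84% stated above.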
Why Should Speech Rate (Tempo) Be Integrated into Pronunciation Teaching Curriculum

ERIC Educational Resources Information Center

Yurtbasi, Metin

2015-01-01

The pace of speech, i.e. tempo, can be varied to suit our mood of the moment. Fast speech can convey urgency, whereas slower speech can be used for emphasis. In public speaking, orators produce powerful effects by varying the loudness and pace of their speech. The juxtaposition of very loud and very quiet utterances is a device often used by those trying…
Pulse Vector-Excitation Speech Encoder

NASA Technical Reports Server (NTRS)

Davidson, Grant; Gersho, Allen

1989-01-01

Proposed pulse vector-excitation speech encoder (PVXC) encodes analog speech signals into digital representation for transmission or storage at rates below 5 kilobits per second. Produces high-quality reconstructed speech, but with less computation than required by comparable speech-encoding systems. Has some characteristics of multipulse linear predictive coding (MPLPC) and of code-excited linear prediction (CELP). System uses mathematical model of vocal tract in conjunction with set of excitation vectors and perceptually-based error criterion to synthesize natural-sounding speech.
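The idea of choosing an excitation vector by passing candidates through a vocal-tract model can be illustrated with a generic analysis-by-synthesis search. This is a minimal sketch under stated assumptions, not the PVXC algorithm itself: the one-tap all-pole filter, the three-entry pulse codebook, and the names `synthesize` and `search_codebook` are all hypothetical, and plain squared error stands in for the perceptually based error criterion mentioned in the abstract.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, squared_error) of the best-matching excitation vector."""
    best = None
    for i, vec in enumerate(codebook):
        y = synthesize(vec, lpc)
        energy = sum(v * v for v in y)
        if energy == 0:
            continue  # an all-zero candidate cannot match any target
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best


# Hypothetical toy setup: one predictor tap and sparse pulse excitations.
lpc = [0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, -1, 0]]
target = synthesize([0, 2, 0, 0], lpc)  # built from entry 1 scaled by 2

index, gain, err = search_codebook(target, codebook, lpc)
print(index, gain, err)
```

Because the target was constructed from the second codebook entry, the search recovers that entry with a gain of 2 and zero residual error; a real coder would weight the error perceptually and use a far larger codebook.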
Evaluation of NASA speech encoder

NASA Technical Reports Server (NTRS)

1976-01-01

Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech-decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; speech recording, sampling, and storage; and processing results.
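How the number of quantizer levels and the sampling rate together determine the bit rate can be shown with a generic uniform quantizer. This is a minimal sketch, not the NASA design evaluated in the report: the function names, the signal range of -1 to 1, and the example figures (8000 Hz, 256 levels) are assumptions chosen for illustration.

```python
import math


def bits_per_sample(levels: int) -> int:
    """Bits needed to index `levels` distinct quantizer outputs."""
    if levels < 2:
        raise ValueError("a quantizer needs at least two levels")
    return math.ceil(math.log2(levels))


def bit_rate(sample_rate_hz: int, levels: int) -> int:
    """Raw bit rate in bits per second for fixed-length codes."""
    return sample_rate_hz * bits_per_sample(levels)


def quantize(x: float, levels: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Map x to the midpoint of its uniform quantization cell on [lo, hi]."""
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((x - lo) / step)))  # clamp out-of-range input
    return lo + (idx + 0.5) * step


print(bit_rate(8000, 256))   # 8 bits per sample at 8000 samples per second
print(quantize(0.3, 4))      # reconstructed value with only four levels
```

Halving the number of levels saves one bit per sample, while halving the sampling rate halves the bit rate outright, which is why the two parameters are traded against each other.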
Reaction Times of Normal Listeners to Laryngeal, Alaryngeal, and Synthetic Speech

ERIC Educational Resources Information Center

Evitts, Paul M.; Searl, Jeff

2006-01-01

The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheoesophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS).…
Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data

PubMed Central

Cao, Beiming; Kim, Myungjong; Mau, Ted; Wang, Jun

2017-01-01

Individuals with an impaired larynx (vocal folds) have problems controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as tongue and lip motion may help in recognizing whispered speech since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with a reconstructed larynx using articulatory movement data. A data set with both acoustic and articulatory motion data was collected from a patient with a surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed that adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficients (MFCCs) significantly reduced the phone error rates on both speech recognition systems. Adding both tongue and lip data achieved the best performance. PMID:29423453
Imitation of contrastive lexical stress in children with speech delay

NASA Astrophysics Data System (ADS)

Vick, Jennell C.; Moore, Christopher A.

2005-09-01

This study examined the relationship between acoustic correlates of stress in trochaic (strong-weak), spondaic (strong-strong), and iambic (weak-strong) nonword bisyllables produced by children (30-50) with normal speech acquisition and children with speech delay. Ratios comparing the acoustic measures (vowel duration, rms, and f0) of the first syllable to the second syllable were calculated to evaluate the extent to which each phonetic parameter was used to mark stress. In addition, a calculation of the variability of jaw movement in each bisyllable was made. Finally, perceptual judgments of accuracy of stress production were made. Analysis of perceptual judgments indicated a robust difference between groups: While both groups of children produced errors in imitating the contrastive lexical stress models (~40%), the children with normal speech acquisition tended to produce trochaic forms in substitution for other stress types, whereas children with speech delay showed no preference for trochees. The relationship between segmental acoustic parameters, kinematic variability, and the ratings of stress by trained listeners will be presented.
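The first-to-second syllable ratios described above can be sketched as follows. This is an illustrative example only: the `Syllable` container, the function name `stress_ratios`, and the measurement values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    vowel_duration_ms: float
    rms: float
    f0_hz: float


def stress_ratios(first: Syllable, second: Syllable) -> dict:
    """Ratio of each acoustic measure in syllable 1 to the same measure in syllable 2.

    Values above 1 indicate that the first syllable is more prominent on that measure.
    """
    return {
        "duration": first.vowel_duration_ms / second.vowel_duration_ms,
        "rms": first.rms / second.rms,
        "f0": first.f0_hz / second.f0_hz,
    }


# Hypothetical trochaic (strong-weak) token: every ratio should exceed 1.
trochee = stress_ratios(Syllable(180.0, 0.30, 260.0), Syllable(120.0, 0.20, 208.0))
for measure, ratio in trochee.items():
    print(f"{measure}: {ratio:.2f}")
```

Under this convention a spondee would give ratios near 1 and an iamb ratios below 1, so the three stress patterns separate along the same scale.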
Phonological neighborhood and word frequency effects in the stuttered disfluencies of children who stutter.

PubMed

Anderson, Julie D

2007-02-01

The purpose of this study was to examine (a) the role of neighborhood density (number of words that are phonologically similar to a target word) and frequency variables on the stuttering-like disfluencies of preschool children who stutter, and (b) whether these variables have an effect on the type of stuttering-like disfluency produced. A 500+ word speech sample was obtained from each participant (N = 15). Each stuttered word was randomly paired with the first produced word that closely matched it in grammatical class, familiarity, and number of syllables/phonemes. Frequency, neighborhood density, and neighborhood frequency values were obtained for the stuttered and fluent words from an online database. Findings revealed that stuttered words were lower in frequency and neighborhood frequency than fluent words. Words containing part-word repetitions and sound prolongations were also lower in frequency and/or neighborhood frequency than fluent words, but these frequency variables did not have an effect on single-syllable word repetitions. Neighborhood density failed to influence the susceptibility of words to stuttering, as well as the type of stuttering-like disfluency produced. In general, findings suggest that neighborhood and frequency variables not only influence the fluency with which words are produced in speech, but also have an impact on the type of stuttering-like disfluency produced.
Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

PubMed Central

GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

2012-01-01

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

Female voice communications in high levels of aircraft cockpit noises--Part I: spectra, levels, and microphones.

PubMed

Nixon, C W; Morris, L J; McCavitt, A R; McKinley, R L; Anderson, T R; McDaniel, M P; Yeager, D G

1998-07-01

Female produced speech, although more intelligible than male speech in some noise spectra, may be more vulnerable to degradation by high levels of some military aircraft cockpit noises. The acoustic features of female speech are higher in frequency, lower in power, and appear more susceptible than male speech to masking by some of these military noises. Current military aircraft voice communication systems were optimized for the male voice and may not adequately accommodate the female voice in these high level noises. This applied study investigated the intelligibility of female and male speech produced in the noise spectra of four military aircraft cockpits at levels ranging from 95 dB to 115 dB. The experimental subjects used standard flight helmets and headsets, noise-canceling microphones, and military aircraft voice communications systems during the measurements. The intelligibility of female speech was lower than that of male speech for all experimental conditions; however, differences were small and insignificant except at the highest levels of the cockpit noises. Intelligibility for both genders varied with aircraft noise spectrum and level. Speech intelligibility of both genders was acceptable during normal cruise noises of all four aircraft, but improvements are required in the higher levels of noise created during aircraft maximum operating conditions. The intelligibility of female speech was unacceptable at the highest measured noise level of 115 dB and may constitute a problem for other military aviators. The intelligibility degradation due to the noise can be neutralized by use of an available, improved noise-canceling microphone, by the application of current active noise reduction technology to the personal communication equipment, and by the development of a voice communications system to accommodate the speech produced by both female and male aviators.
[Combining speech sample and feature bilateral selection algorithm for classification of Parkinson's disease].

PubMed

Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei

2018-02-01

Diagnosis of Parkinson's disease (PD) based on speech data has proved effective in recent years. However, current research focuses on feature extraction and classifier design and does not consider instance selection. Earlier research by the authors showed that instance selection can improve classification accuracy. However, no attention has yet been paid to the relationship between speech samples and features. Therefore, this paper proposes a new PD diagnosis algorithm that simultaneously selects speech samples and features, based on a relevant feature weighting algorithm and a multiple kernel method, so as to find their synergistic effects and thereby improve classification accuracy. Experimental results showed that the proposed algorithm clearly improved classification accuracy. It achieved a mean classification accuracy of 82.5%, which was 30.5% higher than that of the relevant algorithm. In addition, the proposed algorithm detected the synergistic effects of speech samples and features, which is valuable for speech marker extraction.
Speech disruptions in relation to language growth in children who stutter: an exploratory study.

PubMed

Wagovich, Stacy A; Hall, Nancy E; Clifford, Betsy A

2009-12-01

Young children with typical fluency demonstrate a range of disfluencies, or speech disruptions. One type of disruption, revision, appears to increase in frequency as syntactic skills develop. To date, this phenomenon has not been studied in children who stutter (CWS). Rispoli, Hadley, and Holt (2008) suggest a schema for categorizing speech disruptions in terms of revisions and stalls. The purpose of this exploratory study was to use this schema to evaluate whether CWS show a pattern over time in their production of stuttering, revisions, and stalls. Nine CWS, ages 2;1 to 4;11, participated in the study, producing language samples each month for 10 months. MLU and vocd analyses were performed for samples across three time periods. Active declarative sentences within these samples were examined for the presence of disruptions. Results indicated that the proportion of sentences containing revisions increased over time, but proportions for stalls and stuttering did not. Visual inspection revealed that more stuttering and stalls occurred on longer utterances than on shorter utterances. Upon examination of individual children's language, it appears two-thirds of the children showed a pattern in which, as MLU increased, revisions increased as well. Findings are similar to studies of children with typical fluency, suggesting that, despite the fact that CWS display more (and different) disfluencies relative to typically fluent peers, revisions appear to increase over time and correspond to increases in MLU, just as is the case with peers. The reader will be able to: (1) describe the three types of speech disruptions assessed in this article; (2) compare present findings of disruptions in children who stutter to findings of previous research with children who are typically fluent; and (3) discuss future directions in this area of research, given the findings and implications of this study.
Segregation of Whispered Speech Interleaved with Noise or Speech Maskers

DTIC Science & Technology

2011-08-01

range over which the talker can be heard. Whispered speech is produced by modulating the flow of air through partially open vocal folds. Because the...source of excitation is turbulent air flow, the acoustic characteristics of whispered speech differ from voiced speech [1, 2]. Despite the acoustic...signals provided by cochlear implants. Two studies investigated the segregation of simultaneously presented whispered vowels [7, 8] in a standard
Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders

PubMed Central

Preston, Jonathan L.; Hull, Margaret; Edwards, Mary Louise

2012-01-01

Purpose: To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness (PA) outcomes almost four years later. Method: Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested at an average age of 4;6 and followed up at 8;3. The frequency of occurrence of preschool distortion errors, typical substitution and syllable structure errors, and atypical substitution and syllable structure errors were used to predict later speech sound production, PA, and literacy outcomes. Results: Group averages revealed below-average school-age articulation scores and low-average PA, but age-appropriate reading and spelling. Preschool speech error patterns were related to school-age outcomes. Children for whom more than 10% of their speech sound errors were atypical had lower PA and literacy scores at school-age than children who produced fewer than 10% atypical errors. Preschoolers who produced more distortion errors were likely to have lower school-age articulation scores. Conclusions: Different preschool speech error patterns predict different school-age clinical outcomes. Many atypical speech sound errors in preschool may be indicative of weak phonological representations, leading to long-term PA weaknesses. Preschool distortions may be resistant to change over time, leading to persisting speech sound production problems. PMID:23184137
Comparison of singer's formant, speaker's ring, and LTA spectrum among classical singers and untrained normal speakers.

PubMed

Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T

2001-09-01

Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter.

PubMed

Davidow, Jason H

2014-01-01

Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. © 2013 Royal College of Speech and Language Therapists.
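The measure highlighted in this abstract, the percentage of short (30-100 ms) phonated intervals, can be sketched as a simple proportion. This is a minimal illustration, not the study's measurement procedure: the function name, the choice of inclusive bounds, and the interval durations below are hypothetical.

```python
def percent_short_intervals(intervals_ms, low=30.0, high=100.0):
    """Percentage of phonated intervals whose duration lies in [low, high] ms.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    if not intervals_ms:
        raise ValueError("at least one phonated interval is required")
    short = sum(1 for d in intervals_ms if low <= d <= high)
    return 100.0 * short / len(intervals_ms)


# Hypothetical phonated-interval durations (ms) for two rate-matched conditions.
control = [45, 80, 150, 220, 95, 310, 60, 400]
metronome = [120, 180, 90, 240, 300, 150, 210, 260]

control_pct = percent_short_intervals(control)
metronome_pct = percent_short_intervals(metronome)
print(f"control: {control_pct:.1f}%  metronome-paced: {metronome_pct:.1f}%")
```

In this made-up data the metronome-paced condition has a smaller share of short intervals, which is the direction of change the abstract reports.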
Speech and language development in 2-year-old children with cerebral palsy.

PubMed

Hustad, Katherine C; Allison, Kristen; McFadd, Emily; Riehle, Katherine

2014-06-01

We examined early speech and language development in children who had cerebral palsy. Questions addressed whether children could be classified into early profile groups on the basis of speech and language skills and whether there were differences on selected speech and language measures among groups. Speech and language assessments were completed on 27 children with CP who were between the ages of 24 and 30 months (mean age 27.1 months; SD 1.8). We examined several measures of expressive and receptive language, along with speech intelligibility. Two-step cluster analysis was used to identify homogeneous groups of children based on their performance on the seven dependent variables characterizing speech and language performance. Three groups of children identified were those not yet talking (44% of the sample); those whose talking abilities appeared to be emerging (41% of the sample); and those who were established talkers (15% of the sample). Group differences were evident on all variables except receptive language skills. 85% of 2-year-old children with CP in this study had clinical speech and/or language delays relative to age expectations. Findings suggest that children with CP should receive speech and language assessment and treatment at or before 2 years of age.
Investigation of habitual pitch during free play activities for preschool-aged children.

PubMed

Chen, Yang; Kimelman, Mikael D Z; Micco, Katie

2009-01-01

This study is designed to compare the habitual pitch measured in two different speech activities (free play activity and traditionally used structured speech activity) for normally developing preschool-aged children to explore to what extent preschoolers vary their vocal pitch among different speech environments. Habitual pitch measurements were conducted for 10 normally developing children (2 boys, 8 girls) between the ages of 31 months and 71 months during two different activities: (1) free play; and (2) structured speech. Speech samples were recorded using a throat microphone connected to a wireless transmitter in both activities. The habitual pitch (in Hz) was measured for all collected speech samples by using voice analysis software (Real-Time Pitch). Significantly higher habitual pitch was found during free play than during structured speech activities. In addition, no significant difference in habitual pitch was found across a variety of structured speech activities. Findings suggest that the vocal usage of preschoolers appears to be more effortful during free play than during structured activities. It is recommended that a comprehensive evaluation of young children's voices be based on speech/voice samples collected from both free play and structured activities.
Language-Specific Developmental Differences in Speech Production: A Cross-Language Acoustic Study

ERIC Educational Resources Information Center

Li, Fangfang

2012-01-01

Speech productions of 40 English- and 40 Japanese-speaking children (aged 2-5) were examined and compared with the speech produced by 20 adult speakers (10 speakers per language). Participants were recorded while repeating words that began with "s" and "sh" sounds. Clear language-specific patterns in adults' speech were found,…
Of Mouths and Men: Non-Native Listeners' Identification and Evaluation of Varieties of English.

ERIC Educational Resources Information Center

Jarvella, Robert J.; Bang, Eva; Jakobsen, Arnt Lykke; Mees, Inger M.

2001-01-01

Advanced Danish students of English tried to identify the national origin of young men from Ireland, Scotland, England, and the United States from their speech and then rated the speech for attractiveness. Listeners rated speech produced by Englishmen as most attractive, and speech by Americans as least attractive. (Author/VWL)
Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

ERIC Educational Resources Information Center

Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

2007-01-01

Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…
Auditory Long Latency Responses to Tonal and Speech Stimuli

ERIC Educational Resources Information Center

Swink, Shannon; Stuart, Andrew

2012-01-01

Purpose: The effects of type of stimuli (i.e., nonspeech vs. speech), speech (i.e., natural vs. synthetic), gender of speaker and listener, speaker (i.e., self vs. other), and frequency alteration in self-produced speech on the late auditory cortical evoked potential were examined. Method: Young adult men (n = 15) and women (n = 15), all with…
Iconic Gestures for Robot Avatars, Recognition and Integration with Speech.

PubMed

Bremner, Paul; Leonards, Ute

2016-01-01

Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion-tracking-based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated whether robot-produced iconic gestures are comprehensible and are integrated with speech. Outcomes for robot-performed gestures were compared directly to those for gestures produced by a human actor, using a within-participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.
The effect of short-term auditory deprivation on the control of intraoral pressure in pediatric cochlear implant users.

PubMed

Jones, David L; Gao, Sujuan; Svirsky, Mario A

2003-06-01

The purpose of this study was to determine whether 2 speech measures (peak intraoral air pressure [IOP] and IOP duration) obtained during the production of intervocalic stops would be altered as a function of the presence or absence of auditory stimulation provided by a cochlear implant (CI). Five pediatric CI users were required to produce repetitions of the words puppy and baby with their CIs turned on. The CIs were then turned off for 1 hr, at which time the speech sample was repeated with the CI still turned off. Seven children with normal hearing formed a comparison group. They were also tested twice, with a 1-hr intermediate interval. IOP and IOP duration were measured for the medial consonant in both auditory conditions. The results show that auditory condition affected peak IOP more so than IOP duration. Peak IOP was greater for /p/ than /b/ with the CI on, but some participants reduced or reversed this contrast when the CI was off. The findings suggest that different speakers with CIs may use different speech production strategies as they learn to use the auditory signal for speech.
Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception.

PubMed

Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

2018-06-19

During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). Inherently to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio-MEG coherence. Modulations in audio-MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures During Metronome-Paced Speech in Persons who Stutter

PubMed Central

Davidow, Jason H.

2013-01-01

Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. Aims: This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech, in order to determine changes that may be important for fluency during this fluency-inducing condition. Methods and Procedures: Thirteen persons who stutter (PWS), aged 18–62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Outcomes & Results: Vowel duration, voice onset time, pressure rise time, and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30–100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. Conclusions & Implications: A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. PMID:24372888
[Speech fluency developmental profile in Brazilian Portuguese speakers].

PubMed

Martins, Vanessa de Oliveira; Andrade, Claudia Regina Furquim de

2008-01-01

Background: Speech fluency varies from one individual to the next, fluent or stutterer, depending on several factors. Studies that investigate the influence of age on fluency patterns have been identified; however, these differences were investigated in isolated age groups. Studies about life span fluency variations were not found. Aim: To verify the speech fluency developmental profile. Method: Speech samples of 594 fluent participants of both genders, with ages between 2:0 and 99:11 years, speakers of Brazilian Portuguese, were analyzed. Participants were grouped as follows: preschoolers, school-age children, early adolescents, late adolescents, adults and elderly adults. Speech samples were analyzed according to the Speech Fluency Profile variables and were compared regarding: typology of speech disruptions (typical and less typical), speech rate (words and syllables per minute) and frequency of speech disruptions (percentage of speech discontinuity). Results: Although isolated variations were identified, overall there was no significant difference between the age groups for the speech disruption indexes (typical and less typical speech disruptions and percentage of speech discontinuity). Significant differences were observed between the groups when considering speech rate. Conclusion: The development of the neurolinguistic system for speech fluency, in terms of speech disruptions, seems to stabilize during the first years of life, presenting no alterations during the life span. Indexes of speech rate vary across the age groups, indicating patterns of acquisition, development, stabilization and degeneration.
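The rate and discontinuity measures named in this abstract can be sketched as simple proportions. This is an illustration under stated assumptions, not the Speech Fluency Profile protocol: the function names and sample counts are hypothetical, and expressing discontinuity as disruptions per 100 syllables is an assumed definition.

```python
def per_minute(count: int, duration_s: float) -> float:
    """Convert a count observed over `duration_s` seconds to a per-minute rate."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * count / duration_s


def percent_discontinuity(disruptions: int, syllables: int) -> float:
    """Disruptions per 100 syllables (assumed definition for this sketch)."""
    if syllables <= 0:
        raise ValueError("syllable count must be positive")
    return 100.0 * disruptions / syllables


# Hypothetical 90-second sample: 150 words, 240 syllables, 12 disruptions.
words_per_min = per_minute(150, 90.0)
syllables_per_min = per_minute(240, 90.0)
discontinuity = percent_discontinuity(12, 240)
print(words_per_min, syllables_per_min, discontinuity)
```

Keeping word and syllable rates separate matters because they capture different things: the former is often read as a rate of information production and the latter as a rate of articulation.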
The McGurk effect in children with autism and Asperger syndrome.

PubMed

Bebko, James M; Schroeder, Jessica H; Weiss, Jonathan A

2014-02-01

Children with autism may have difficulties in audiovisual speech perception, which has been linked to speech perception and language development. However, little has been done to examine children with Asperger syndrome as a group on tasks assessing audiovisual speech perception, despite this group's often greater language skills. Samples of children with autism, Asperger syndrome, and Down syndrome, as well as a typically developing sample, were presented with an auditory-only condition, a speech-reading condition, and an audiovisual condition designed to elicit the McGurk effect. Children with autism demonstrated unimodal performance at the same level as the other groups, yet showed a lower rate of the McGurk effect compared with the Asperger, Down and typical samples. These results suggest that children with autism may have unique intermodal speech perception difficulties linked to their representations of speech sounds. © 2013 International Society for Autism Research, Wiley Periodicals, Inc.
Movement goals and feedback and feedforward control mechanisms in speech production

PubMed Central

Perkell, Joseph S.

2010-01-01

Studies of speech motor control are described that support a theoretical framework in which fundamental control variables for phonemic movements are multi-dimensional regions in auditory and somatosensory spaces. Auditory feedback is used to acquire and maintain auditory goals and in the development and function of feedback and feedforward control mechanisms. Several lines of evidence support the idea that speakers with more acute sensory discrimination acquire more distinct goal regions and therefore produce speech sounds with greater contrast. Feedback modification findings indicate that fluently produced sound sequences are encoded as feedforward commands, and feedback control serves to correct mismatches between expected and produced sensory consequences. PMID:22661828

Movement goals and feedback and feedforward control mechanisms in speech production.

PubMed

Perkell, Joseph S

2012-09-01

Studies of speech motor control are described that support a theoretical framework in which fundamental control variables for phonemic movements are multi-dimensional regions in auditory and somatosensory spaces. Auditory feedback is used to acquire and maintain auditory goals and in the development and function of feedback and feedforward control mechanisms. Several lines of evidence support the idea that speakers with more acute sensory discrimination acquire more distinct goal regions and therefore produce speech sounds with greater contrast. Feedback modification findings indicate that fluently produced sound sequences are encoded as feedforward commands, and feedback control serves to correct mismatches between expected and produced sensory consequences.
The evolution of viscous flow structures in the esophagus during tracheoesophageal speech

NASA Astrophysics Data System (ADS)

Erath, Byron; Hemsing, Frank

2015-11-01

A laryngectomy is an invasive surgical procedure whereby the entire larynx is removed, usually as a result of cancer. Removal of the larynx renders conventional voiced speech impossible, with the most common remediation following surgery being tracheoesophageal (TE) speech. TE speech is produced by inserting a one-way valve to connect the posterior wall of the trachea with the anterior wall of the esophagus. As air is forced up from the lungs it passes through the prosthesis and into the esophagus. The resulting esophageal pressure field incites self-sustained oscillations of the pharyngoesophageal segment (PES), which ultimately produces sound. Unfortunately, the physics of TE speech are not well understood, with up to 50% of individuals unable to produce intelligible sound. This failure can be related to a lack of understanding regarding the esophageal flow field, where all previous scientific investigations have assumed the flow is one-dimensional and steady. An experimental TE speech flow facility was constructed and particle image velocimetry measurements were acquired at the exit of the model prosthesis (entrance of the esophagus). The flow is observed to be highly unsteady, and the formation and propagation of vortical flow structures through the esophageal tract are identified. Observations regarding the influence of the flow dynamics on the esophageal pressure field and its relation to the successful production of TE speech are discussed.
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

PubMed Central

Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

2016-01-01

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer. PMID:27880768
Speech motor planning and execution deficits in early childhood stuttering.

PubMed

Walsh, Bridget; Mettel, Kathleen Marie; Smith, Anne

2015-01-01

Five to eight percent of preschool children develop stuttering, a speech disorder with clearly observable, hallmark symptoms: sound repetitions, prolongations, and blocks. While the speech motor processes underlying stuttering have been widely documented in adults, few studies to date have assessed the speech motor dynamics of stuttering near its onset. We assessed fundamental characteristics of speech movements in preschool children who stutter and their fluent peers to determine if atypical speech motor characteristics described for adults are early features of the disorder or arise later in the development of chronic stuttering. Orofacial movement data were recorded from 58 children who stutter and 43 children who do not stutter aged 4;0 to 5;11 (years; months) in a sentence production task. For single speech movements and multiple speech movement sequences, we computed displacement amplitude, velocity, and duration. For the phrase level movement sequence, we computed an index of articulation coordination consistency for repeated productions of the sentence. Boys who stutter, but not girls, produced speech with reduced amplitudes and velocities of articulatory movement. All children produced speech with similar durations. Boys, particularly the boys who stuttered, had more variable patterns of articulatory coordination compared to girls. This study is the first to demonstrate sex-specific differences in speech motor control processes between preschool boys and girls who are stuttering. The sex-specific lag in speech motor development in many boys who stutter likely has significant implications for the dramatically different recovery rates between male and female preschoolers who stutter. Further, our findings document that atypical speech motor development is an early feature of stuttering.
Using others' words: conversational use of reported speech by individuals with aphasia and their communication partners.

PubMed

Hengst, Julie A; Frame, Simone R; Neuman-Stritzel, Tiffany; Gannaway, Rachel

2005-02-01

Reported speech, wherein one quotes or paraphrases the speech of another, has been studied extensively as a set of linguistic and discourse practices. Researchers agree that reported speech is pervasive, found across languages, and used in diverse contexts. However, to date, there have been no studies of the use of reported speech among individuals with aphasia. Grounded in an interactional sociolinguistic perspective, the study presented here documents and analyzes the use of reported speech by 7 adults with mild to moderately severe aphasia and their routine communication partners. Each of the 7 pairs was videotaped in 4 everyday activities at home or around the community, yielding over 27 hr of conversational interaction for analysis. A coding scheme was developed that identified 5 types of explicitly marked reported speech: direct, indirect, projected, indexed, and undecided. Analysis of the data documented reported speech as a common discourse practice used successfully by the individuals with aphasia and their communication partners. All participants produced reported speech at least once, and across all observations the target pairs produced 400 reported speech episodes (RSEs), 149 by individuals with aphasia and 251 by their communication partners. For all participants, direct and indirect forms were the most prevalent (70% of RSEs). Situated discourse analysis of specific episodes of reported speech used by 3 of the pairs provides detailed portraits of the diverse interactional, referential, social, and discourse functions of reported speech and explores ways that the pairs used reported speech to successfully frame talk despite their ongoing management of aphasia.
Language Sampling for Preschoolers With Severe Speech Impairments

PubMed Central

Ragsdale, Jamie; Bustos, Aimee

2016-01-01

Purpose The purposes of this investigation were to determine if measures such as mean length of utterance (MLU) and percentage of comprehensible words can be derived reliably from language samples of children with severe speech impairments and if such measures correlate with tools that measure constructs assumed to be related. Method Language samples of 15 preschoolers with severe speech impairments (but receptive language within normal limits) were transcribed independently by 2 transcribers. Nonparametric statistics were used to determine which measures, if any, could be transcribed reliably and to determine if correlations existed between language sample measures and standardized measures of speech, language, and cognition. Results Reliable measures were extracted from the majority of the language samples, including MLU in words, mean number of syllables per utterance, and percentage of comprehensible words. Language sample comprehensibility measures were correlated with a single word comprehensibility task. Also, language sample MLUs and mean length of the participants' 3 longest sentences from the MacArthur–Bates Communicative Development Inventory (Fenson et al., 2006) were correlated. Conclusion Language sampling, given certain modifications, may be used for some 3- to 5-year-old children with normal receptive language who have severe speech impairments to provide reliable expressive language and comprehensibility information. PMID:27552110
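Two of the measures named in this abstract, MLU in words and percentage of comprehensible words, can be sketched as follows. This is only an illustration under stated assumptions: utterances are whitespace-separated strings, unintelligible words are marked with the hypothetical placeholder `xxx`, and every token (including placeholders) counts toward utterance length. The study's actual transcription conventions are not described in the abstract.

```python
def language_sample_measures(utterances, unintelligible="xxx"):
    """Return (MLU in words, percentage of comprehensible words).

    Assumes each utterance is a whitespace-separated string and that
    unintelligible words are transcribed with the `unintelligible` marker.
    """
    if not utterances:
        raise ValueError("at least one utterance is required")
    word_lists = [u.split() for u in utterances]
    total_words = sum(len(words) for words in word_lists)
    if total_words == 0:
        raise ValueError("the sample contains no words")
    mlu_words = total_words / len(word_lists)
    comprehensible = sum(
        1 for words in word_lists for w in words if w != unintelligible
    )
    percent_comprehensible = 100.0 * comprehensible / total_words
    return mlu_words, percent_comprehensible


# Hypothetical three-utterance sample (not data from the study).
sample = ["I want xxx", "doggy go", "xxx xxx ball please"]
mlu, pct = language_sample_measures(sample)
print(mlu, round(pct, 1))
```

Running the same function on two transcribers' independent transcripts of one sample would give the paired values needed for a reliability comparison of the kind the abstract reports.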
Language Sampling for Preschoolers With Severe Speech Impairments.

PubMed

Binger, Cathy; Ragsdale, Jamie; Bustos, Aimee

2016-11-01

The purposes of this investigation were to determine if measures such as mean length of utterance (MLU) and percentage of comprehensible words can be derived reliably from language samples of children with severe speech impairments and if such measures correlate with tools that measure constructs assumed to be related. Language samples of 15 preschoolers with severe speech impairments (but receptive language within normal limits) were transcribed independently by 2 transcribers. Nonparametric statistics were used to determine which measures, if any, could be transcribed reliably and to determine if correlations existed between language sample measures and standardized measures of speech, language, and cognition. Reliable measures were extracted from the majority of the language samples, including MLU in words, mean number of syllables per utterance, and percentage of comprehensible words. Language sample comprehensibility measures were correlated with a single word comprehensibility task. Also, language sample MLUs and mean length of the participants' 3 longest sentences from the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2006) were correlated. Language sampling, given certain modifications, may be used for some 3- to 5-year-old children with normal receptive language who have severe speech impairments to provide reliable expressive language and comprehensibility information.
Sounds Exaggerate Visual Shape

ERIC Educational Resources Information Center

Sweeny, Timothy D.; Guzman-Martinez, Emmanuel; Ortega, Laura; Grabowecky, Marcia; Suzuki, Satoru

2012-01-01

While perceiving speech, people see mouth shapes that are systematically associated with sounds. In particular, a vertically stretched mouth produces a /woo/ sound, whereas a horizontally stretched mouth produces a /wee/ sound. We demonstrate that hearing these speech sounds alters how we see aspect ratio, a basic visual feature that contributes…
Speech Characteristics Associated with Three Genotypes of Ataxia

ERIC Educational Resources Information Center

Sidtis, John J.; Ahn, Ji Sook; Gomez, Christopher; Sidtis, Diana

2011-01-01

Purpose: Advances in neurobiology are providing new opportunities to investigate the neurological systems underlying motor speech control. This study explores the perceptual characteristics of the speech of three genotypes of spino-cerebellar ataxia (SCA) as manifest in four different speech tasks. Methods: Speech samples from 26 speakers with SCA…
Automated Speech Rate Measurement in Dysarthria

ERIC Educational Resources Information Center

Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

2015-01-01

Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
Effect of Parkinson's disease on the production of structured and unstructured speaking tasks: Respiratory physiologic and linguistic considerations

PubMed Central

Huber, Jessica E.; Darling, Meghan

2012-01-01

Purpose The purpose of the present study was to examine the effects of cognitive-linguistic deficits and respiratory physiologic changes on respiratory support for speech in PD, using two speech tasks, reading and extemporaneous speech. Methods Five women with PD, 9 men with PD, and 14 age- and sex-matched control participants read a passage and spoke extemporaneously on a topic of their choice at comfortable loudness. Sound pressure level, syllables per breath group, speech rate, and lung volume parameters were measured. Number of formulation errors, disfluencies, and filled pauses were counted. Results Individuals with PD produced shorter utterances as compared to control participants. The relationships between utterance length and lung volume initiation and inspiratory duration were weaker in individuals with PD than for control participants, particularly for the extemporaneous speech task. These results suggest less consistent planning for utterance length by individuals with PD in extemporaneous speech. Individuals with PD produced more formulation errors in both tasks and significantly fewer filled pauses in extemporaneous speech. Conclusions Both respiratory physiologic and cognitive-linguistic issues affected speech production by individuals with PD. Overall, individuals with PD had difficulty planning or coordinating language formulation and respiratory support, in particular during extemporaneous speech. PMID:20844256
Three speech sounds, one motor action: evidence for speech-motor disparity from English flap production.

PubMed

Derrick, Donald; Stavness, Ian; Gick, Bryan

2015-03-01

The assumption that units of speech production bear a one-to-one relationship to speech motor actions pervades otherwise widely varying theories of speech motor behavior. This speech production and simulation study demonstrates that commonly occurring flap sequences may violate this assumption. In the word "Saturday," a sequence of three sounds may be produced using a single, cyclic motor action. Under this view, the initial upward tongue tip motion, starting with the first vowel and moving to contact the hard palate on the way to a retroflex position, is under active muscular control, while the downward movement of the tongue tip, including the second contact with the hard palate, results from gravity and elasticity during tongue muscle relaxation. This sequence is reproduced using a three-dimensional computer simulation of human vocal tract biomechanics and differs greatly from other observed sequences for the same word, which employ multiple targeted speech motor actions. This outcome suggests that a goal of a speaker is to produce an entire sequence in a biomechanically efficient way at the expense of maintaining parity within the individual parts of the sequence.
Nouns slow down speech across structurally and culturally diverse languages

PubMed Central

Danielsen, Swintha; Hartmann, Iren; Pakendorf, Brigitte; Witzlack-Makarevich, Alena; de Jong, Nivja H.

2018-01-01

By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant—speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another. PMID:29760059
Neural evidence for predictive coding in auditory cortex during speech production.

PubMed

Okada, Kayoko; Matchin, William; Hickok, Gregory

2018-02-01

Recent models of speech production suggest that motor commands generate forward predictions of the auditory consequences of those commands, that these forward predictions can be used to monitor and correct speech output, and that this system is hierarchically organized (Hickok, Houde, & Rong, Neuron, 69(3), 407-422, 2011; Pickering & Garrod, Behavioral and Brain Sciences, 36(4), 329-347, 2013). Recent psycholinguistic research has shown that internally generated speech (i.e., imagined speech) produces different types of errors than does overt speech (Oppenheim & Dell, Cognition, 106(1), 528-537, 2008; Oppenheim & Dell, Memory & Cognition, 38(8), 1147-1160, 2010). These studies suggest that articulated speech might involve predictive coding at more levels than imagined speech. The current fMRI experiment investigates neural evidence of predictive coding in speech production. Twenty-four participants from UC Irvine were recruited for the study. Participants were scanned while they were visually presented with a sequence of words that they reproduced in sync with a visual metronome. On each trial, they were cued to either silently articulate the sequence or to imagine the sequence without overt articulation. As expected, silent articulation and imagined speech both engaged a left hemisphere network previously implicated in speech production. A contrast of silent articulation with imagined speech revealed greater activation for articulated speech in inferior frontal cortex, premotor cortex and the insula in the left hemisphere, consistent with greater articulatory load. Although both conditions were silent, this contrast also produced significantly greater activation in auditory cortex in dorsal superior temporal gyrus in both hemispheres. We suggest that these activations reflect forward predictions arising from additional levels of the perceptual/motor hierarchy that are involved in monitoring the intended speech output.
Auditory cortex activation to natural speech and simulated cochlear implant speech measured with functional near-infrared spectroscopy.

PubMed

Pollonini, Luca; Olds, Cristen; Abaya, Homer; Bortfeld, Heather; Beauchamp, Michael S; Oghalai, John S

2014-03-01

The primary goal of most cochlear implant procedures is to improve a patient's ability to discriminate speech. To accomplish this, cochlear implants are programmed so as to maximize speech understanding. However, programming a cochlear implant can be an iterative, labor-intensive process that takes place over months. In this study, we sought to determine whether functional near-infrared spectroscopy (fNIRS), a non-invasive neuroimaging method which is safe to use repeatedly and for extended periods of time, can provide an objective measure of whether a subject is hearing normal speech or distorted speech. We used a 140-channel fNIRS system to measure activation within the auditory cortex in 19 normal hearing subjects while they listened to speech with different levels of intelligibility. Custom software was developed to analyze the data and compute topographic maps from the measured changes in oxyhemoglobin and deoxyhemoglobin concentration. Normal speech reliably evoked the strongest responses within the auditory cortex. Distorted speech produced less region-specific cortical activation. Environmental sounds were used as a control, and they produced the least cortical activation. These data collected using fNIRS are consistent with the fMRI literature and thus demonstrate the feasibility of using this technique to objectively detect differences in cortical responses to speech of different intelligibility. Copyright © 2013 Elsevier B.V. All rights reserved.
Speech and Language Development in 2 Year Old Children with Cerebral Palsy

PubMed Central

Hustad, Katherine C.; Allison, Kristen; McFadd, Emily; Riehle, Katherine

2013-01-01

Objective We examined early speech and language development in children who had cerebral palsy. Questions addressed whether children could be classified into early profile groups on the basis of speech and language skills and whether there were differences on selected speech and language measures among groups. Methods Speech and language assessments were completed on 27 children with CP who were between the ages of 24 and 30 months (mean age 27.1 months; SD 1.8). We examined several measures of expressive and receptive language, along with speech intelligibility. Results Two-step cluster analysis was used to identify homogeneous groups of children based on their performance on the 7 dependent variables characterizing speech and language performance. Three groups of children identified were those not yet talking (44% of the sample); those whose talking abilities appeared to be emerging (41% of the sample); and those who were established talkers (15% of the sample). Group differences were evident on all variables except receptive language skills. Conclusion 85% of 2-year-old children with CP in this study had clinical speech and/or language delays relative to age expectations. Findings suggest that children with CP should receive speech and language assessment and treatment to identify and treat those with delays at or before 2 years of age. PMID:23627373
Speech Analysis of Bengali Speaking Children with Repaired Cleft Lip & Palate

ERIC Educational Resources Information Center

Chakrabarty, Madhushree; Kumar, Suman; Chatterjee, Indranil; Maheshwari, Neha

2012-01-01

The present study aims at analyzing speech samples of four Bengali speaking children with repaired cleft palates with a view to differentiate between the misarticulations arising out of a deficit in linguistic skills and structural or motoric limitations. Spontaneous speech samples were collected and subjected to a number of linguistic analyses…
Applications of Text Analysis Tools for Spoken Response Grading

ERIC Educational Resources Information Center

Crossley, Scott; McNamara, Danielle

2013-01-01

This study explores the potential for automated indices related to speech delivery, language use, and topic development to model human judgments of TOEFL speaking proficiency in second language (L2) speech samples. For this study, 244 transcribed TOEFL speech samples taken from 244 L2 learners were analyzed using automated indices taken from…
The importance of laughing in your face: influences of visual laughter on auditory laughter perception.

PubMed

Jordan, Timothy R; Abedipour, Lily

2010-01-01

Hearing the sound of laughter is important for social communication, but processes contributing to the audibility of laughter remain to be determined. Production of laughter resembles production of speech in that both involve visible facial movements accompanying socially significant auditory signals. However, while it is known that speech is more audible when the facial movements producing the speech sound can be seen, similar visual enhancement of the audibility of laughter remains unknown. To address this issue, spontaneously occurring laughter was edited to produce stimuli comprising visual laughter, auditory laughter, visual and auditory laughter combined, and no laughter at all (either visual or auditory), all presented in four levels of background noise. Visual laughter and no-laughter stimuli produced very few reports of auditory laughter. However, visual laughter consistently made auditory laughter more audible, compared to the same auditory signal presented without visual laughter, resembling findings reported previously for speech.
Discrimination of speech and non-speech sounds following theta-burst stimulation of the motor cortex

PubMed Central

Rogers, Jack C.; Möttönen, Riikka; Boyles, Rowan; Watkins, Kate E.

2014-01-01

Perceiving speech engages parts of the motor system involved in speech production. The role of the motor cortex in speech perception has been demonstrated using low-frequency repetitive transcranial magnetic stimulation (rTMS) to suppress motor excitability in the lip representation and disrupt discrimination of lip-articulated speech sounds (Möttönen and Watkins, 2009). Another form of rTMS, continuous theta-burst stimulation (cTBS), can produce longer-lasting disruptive effects following a brief train of stimulation. We investigated the effects of cTBS on motor excitability and discrimination of speech and non-speech sounds. cTBS was applied for 40 s over either the hand or the lip representation of motor cortex. Motor-evoked potentials recorded from the lip and hand muscles in response to single pulses of TMS revealed no measurable change in motor excitability due to cTBS. This failure to replicate previous findings may reflect the unreliability of measurements of motor excitability related to inter-individual variability. We also measured the effects of cTBS on a listener’s ability to discriminate: (1) lip-articulated speech sounds from sounds not articulated by the lips (“ba” vs. “da”); (2) two speech sounds not articulated by the lips (“ga” vs. “da”); and (3) non-speech sounds produced by the hands (“claps” vs. “clicks”). Discrimination of lip-articulated speech sounds was impaired between 20 and 35 min after cTBS over the lip motor representation. Specifically, discrimination of across-category ba–da sounds presented with an 800-ms inter-stimulus interval was reduced to chance level performance. This effect was absent for speech sounds that do not require the lips for articulation and non-speech sounds. Stimulation over the hand motor representation did not affect discrimination of speech or non-speech sounds. These findings show that stimulation of the lip motor representation disrupts discrimination of speech sounds in an articulatory feature-specific way. PMID:25076928

Discrimination of speech and non-speech sounds following theta-burst stimulation of the motor cortex.

PubMed

Rogers, Jack C; Möttönen, Riikka; Boyles, Rowan; Watkins, Kate E

2014-01-01

Perceiving speech engages parts of the motor system involved in speech production. The role of the motor cortex in speech perception has been demonstrated using low-frequency repetitive transcranial magnetic stimulation (rTMS) to suppress motor excitability in the lip representation and disrupt discrimination of lip-articulated speech sounds (Möttönen and Watkins, 2009). Another form of rTMS, continuous theta-burst stimulation (cTBS), can produce longer-lasting disruptive effects following a brief train of stimulation. We investigated the effects of cTBS on motor excitability and discrimination of speech and non-speech sounds. cTBS was applied for 40 s over either the hand or the lip representation of motor cortex. Motor-evoked potentials recorded from the lip and hand muscles in response to single pulses of TMS revealed no measurable change in motor excitability due to cTBS. This failure to replicate previous findings may reflect the unreliability of measurements of motor excitability related to inter-individual variability. We also measured the effects of cTBS on a listener's ability to discriminate: (1) lip-articulated speech sounds from sounds not articulated by the lips ("ba" vs. "da"); (2) two speech sounds not articulated by the lips ("ga" vs. "da"); and (3) non-speech sounds produced by the hands ("claps" vs. "clicks"). Discrimination of lip-articulated speech sounds was impaired between 20 and 35 min after cTBS over the lip motor representation. Specifically, discrimination of across-category ba-da sounds presented with an 800-ms inter-stimulus interval was reduced to chance level performance. This effect was absent for speech sounds that do not require the lips for articulation and non-speech sounds. Stimulation over the hand motor representation did not affect discrimination of speech or non-speech sounds. These findings show that stimulation of the lip motor representation disrupts discrimination of speech sounds in an articulatory feature-specific way.
Sensorimotor influences on speech perception in infancy.

PubMed

Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

2015-11-03

The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Facial expressions and the evolution of the speech rhythm.

PubMed

Ghazanfar, Asif A; Takahashi, Daniel Y

2014-06-01

In primates, different vocalizations are produced, at least in part, by making different facial expressions. Not surprisingly, humans, apes, and monkeys all recognize the correspondence between vocalizations and the facial postures associated with them. However, one major dissimilarity between monkey vocalizations and human speech is that, in the latter, the acoustic output and associated movements of the mouth are both rhythmic (in the 3- to 8-Hz range) and tightly correlated, whereas monkey vocalizations have a similar acoustic rhythmicity but lack the concomitant rhythmic facial motion. This raises the question of how we evolved from a presumptive ancestral acoustic-only vocal rhythm to the one that is audiovisual with improved perceptual sensitivity. According to one hypothesis, this bisensory speech rhythm evolved through the rhythmic facial expressions of ancestral primates. If this hypothesis has any validity, we expect that the extant nonhuman primates produce at least some facial expressions with a speech-like rhythm in the 3- to 8-Hz frequency range. Lip smacking, an affiliative signal observed in many genera of primates, satisfies this criterion. We review a series of studies using developmental, x-ray cineradiographic, EMG, and perceptual approaches with macaque monkeys producing lip smacks to further investigate this hypothesis. We then explore its putative neural basis and remark on important differences between lip smacking and speech production. Overall, the data support the hypothesis that lip smacking may have been an ancestral expression that was linked to vocal output to produce the original rhythmic audiovisual speech-like utterances in the human lineage.
Speech versus non-speech as irrelevant sound: controlling acoustic variation.

PubMed

Little, Jason S; Martin, Frances Heritage; Thomson, Richard H S

2010-09-01

Functional differences between speech and non-speech within the irrelevant sound effect were investigated using repeated and changing formats of irrelevant sounds in the form of intelligible words and unintelligible signal correlated noise (SCN) versions of the words. Event-related potentials were recorded from 25 females aged between 18 and 25 while they completed a serial order recall task in the presence of irrelevant sound or silence. As expected and in line with the changing-state hypothesis both words and SCN produced robust changing-state effects. However, words produced a greater changing-state effect than SCN indicating that the spectral detail inherent within speech accounts for the greater irrelevant sound effect and changing-state effect typically observed with speech. ERP data in the form of N1 amplitude was modulated within some irrelevant sound conditions suggesting that attentional aspects are involved in the elicitation of the irrelevant sound effect. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Autonomic Nervous System Responses During Perception of Masked Speech may Reflect Constructs other than Subjective Listening Effort

PubMed Central

Francis, Alexander L.; MacPherson, Megan K.; Chandrasekaran, Bharath; Alvar, Ann M.

2016-01-01

Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in terms of the source of the difficulty (distortion, energetic masking, and informational masking), and therefore were expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct), and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Masked conditions were both presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured in terms of proportion of key words identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions even when behavioral measures of listening performance and listeners’ subjective perception of task demand were comparable across the three degraded conditions. PMID:26973564
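As an illustration of what presenting a masker at a -8 dB SNR involves, the sketch below scales a masker so that the RMS ratio of speech to masker matches a target SNR. It is a minimal example under stated assumptions: the abstract does not say how levels were set, the RMS-based definition is a common convention rather than the authors' documented procedure, and the sine-wave "speech" and "masker" signals and all function names are hypothetical.

```python
import math

def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def measured_snr_db(speech, masker):
    """SNR in dB, defined here (assumption) as 20*log10 of the RMS ratio."""
    return 20 * math.log10(rms(speech) / rms(masker))

def scale_masker_to_snr(speech, masker, target_snr_db):
    """Return a scaled copy of the masker that yields the target SNR."""
    target_masker_rms = rms(speech) / (10 ** (target_snr_db / 20))
    gain = target_masker_rms / rms(masker)
    return [gain * x for x in masker]

# Toy signals standing in for a sentence and a noise masker (illustrative only).
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
masker = [0.3 * math.sin(2 * math.pi * 17 * n / 100 + 1.0) for n in range(100)]

scaled_masker = scale_masker_to_snr(speech, masker, -8.0)
mixture = [s + m for s, m in zip(speech, scaled_masker)]
```

At a negative SNR the masker is more intense than the speech, which is why the gain applied to the masker here is greater than it would be at 0 dB.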
Exploring the role of hand gestures in learning novel phoneme contrasts and vocabulary in a second language

PubMed Central

Kelly, Spencer D.; Hirata, Yukari; Manansala, Michael; Huang, Jessica

2014-01-01

Co-speech hand gestures are a type of multimodal input that has received relatively little attention in the context of second language learning. The present study explored the role that observing and producing different types of gestures plays in learning novel speech sounds and word meanings in an L2. Naïve English-speakers were taught two components of Japanese—novel phonemic vowel length contrasts and vocabulary items comprised of those contrasts—in one of four different gesture conditions: Syllable Observe, Syllable Produce, Mora Observe, and Mora Produce. Half of the gestures conveyed intuitive information about syllable structure, and the other half, unintuitive information about Japanese mora structure. Within each Syllable and Mora condition, half of the participants only observed the gestures that accompanied speech during training, and the other half also produced the gestures that they observed along with the speech. The main finding was that participants across all four conditions had similar outcomes in two different types of auditory identification tasks and a vocabulary test. The results suggest that hand gestures may not be well suited for learning novel phonetic distinctions at the syllable level within a word, and thus, gesture-speech integration may break down at the lowest levels of language processing and learning. PMID:25071646
Automated Speech Rate Measurement in Dysarthria.

PubMed

Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

2015-06-01

In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
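The comparison between automated and human SR determination described above can be illustrated with a plain Pearson correlation. This is only a sketch: the abstract does not specify which correlation coefficient was used, speech rate is assumed here to be syllables per second, and the syllable counts and durations are invented values for illustration, not data from the study.

```python
import math

def speech_rate(n_syllables, duration_s):
    """Speech rate as syllables per second (assumed unit)."""
    return n_syllables / duration_s

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented example: syllable counts from a human rater and from an
# automated detector for four samples of known duration (seconds).
durations = [10.0, 12.0, 8.0, 15.0]
human_counts = [38, 40, 33, 41]
automated_counts = [36, 41, 31, 43]

human_sr = [speech_rate(c, d) for c, d in zip(human_counts, durations)]
automated_sr = [speech_rate(c, d) for c, d in zip(automated_counts, durations)]
r = pearson_r(human_sr, automated_sr)
```

A high correlation shows that the two measures rank samples similarly; it does not by itself show that the automated values are unbiased, so a study of this kind might also report mean differences.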
Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

ERIC Educational Resources Information Center

Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

2016-01-01

Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…
Comparisons of Stuttering Frequency during and after Speech Initiation in Unaltered Feedback, Altered Auditory Feedback and Choral Speech Conditions

ERIC Educational Resources Information Center

Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew

2009-01-01

Background: Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in the stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate…
Is Presurgery and Early Postsurgery Performance Related to Speech and Language Outcomes at 3 Years of Age for Children with Cleft Palate?

ERIC Educational Resources Information Center

Chapman, Kathy L.

2004-01-01

This study examined the relationship between presurgery speech measures and speech and language performance at 39 months as well as the relationship between early postsurgery speech measures and speech and language performance at 39 months of age. Fifteen children with cleft lip and palate participated in the study. Spontaneous speech samples were…
Speech Appliances in the Treatment of Phonological Disorders.

ERIC Educational Resources Information Center

Ruscello, Dennis M.

1995-01-01

This article addresses the rationale for and issues related to the use of speech appliances, especially a removable speech appliance that positions the tongue to produce the correct /r/ phoneme. Research results suggest that this appliance was successful with a large group of clients. (Author/DB)
Cognitive-Perceptual Examination of Remediation Approaches to Hypokinetic Dysarthria

ERIC Educational Resources Information Center

McAuliffe, Megan J.; Kerr, Sarah E.; Gibson, Elizabeth M. R.; Anderson, Tim; LaShell, Patrick J.

2014-01-01

Purpose: To determine how increased vocal loudness and reduced speech rate affect listeners' cognitive-perceptual processing of hypokinetic dysarthric speech associated with Parkinson's disease. Method: Fifty-one healthy listener participants completed a speech perception experiment. Listeners repeated phrases produced by 5 individuals…
Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

PubMed Central

Bremner, Paul; Leonards, Ute

2016-01-01

Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion-tracking-based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated whether robot-produced iconic gestures are comprehensible and are integrated with speech. Outcomes for robot-performed gestures were compared directly to those for gestures produced by a human actor, using a within-participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances. PMID:26925010
Measurement of trained speech patterns in stuttering: interjudge and intrajudge agreement of experts by means of modified time-interval analysis.

PubMed

Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus

2010-09-01

Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
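To make the agreement figures concrete, the sketch below computes simple percent agreement between two sets of interval judgments. It assumes each interval receives exactly one label (trained pattern, fluent, or stuttered) and that agreement is the share of intervals with identical labels; the abstract does not state the exact formula used, and the judgments shown are invented.

```python
def percent_agreement(judgments_a, judgments_b):
    """Percentage of intervals that received the same label in both lists."""
    if len(judgments_a) != len(judgments_b):
        raise ValueError("Both judgment lists must cover the same intervals")
    matches = sum(1 for a, b in zip(judgments_a, judgments_b) if a == b)
    return 100.0 * matches / len(judgments_a)

# Hypothetical labels: T = trained speech pattern, F = fluent, S = stuttered.
judge_1 = list("FFSTTFSFTF")
judge_2 = list("FFSTFFSFTF")            # a second judge (inter-judge)
judge_1_retest = list("FFSTTFSFTF")     # same judge, second occasion (intra-judge)

inter_judge = percent_agreement(judge_1, judge_2)
intra_judge = percent_agreement(judge_1, judge_1_retest)
```

Percent agreement is easy to interpret but does not correct for chance agreement, which is one reason reliability studies sometimes add a chance-corrected statistic alongside it.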
Phonological Neighborhood and Word Frequency Effects in the Stuttered Disfluencies of Children Who Stutter

PubMed Central

Anderson, Julie D.

2008-01-01

Purpose: The purpose of this study was to examine (a) the role of neighborhood density (number of words that are phonologically similar to a target word) and frequency variables on the stuttering-like disfluencies of preschool children who stutter, and (b) whether these variables have an effect on the type of stuttering-like disfluency produced. Method: A 500+ word speech sample was obtained from each participant (N = 15). Each stuttered word was randomly paired with the firstly produced word that closely matched it in grammatical class, familiarity, and number of syllables/phonemes. Frequency, neighborhood density, and neighborhood frequency values were obtained for the stuttered and fluent words from an online database. Results: Findings revealed that stuttered words were lower in frequency and neighborhood frequency than fluent words. Words containing part-word repetitions and sound prolongations were also lower in frequency and/or neighborhood frequency than fluent words, but these frequency variables did not have an effect on single-syllable word repetitions. Neighborhood density failed to influence the susceptibility of words to stuttering, as well as the type of stuttering-like disfluency produced. Conclusions: In general, findings suggest that neighborhood and frequency variables not only influence the fluency with which words are produced in speech, but also have an impact on the type of stuttering-like disfluency produced. PMID:17344561
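Neighborhood density, as defined in the abstract, counts the words that are phonologically similar to a target. The sketch below uses a common operationalization as an assumption: two words are neighbors if they differ by exactly one phoneme substitution, addition, or deletion. The study itself took its values from an online database, so this is an illustration of the concept, not the authors' computation; the toy lexicon and phoneme symbols are hypothetical.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one
    substitution, insertion, or deletion (assumed similarity rule)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Different length: deleting one phoneme from the longer word
    # must give the shorter word.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))

def neighborhood_density(target, lexicon):
    """Number of lexicon entries that are neighbors of the target."""
    return sum(1 for word in lexicon if is_neighbor(target, word))

# Toy lexicon of phoneme tuples (illustrative transcription only).
lexicon = [
    ("b", "ae", "t"),        # bat
    ("k", "ah", "t"),        # cut
    ("ae", "t"),             # at
    ("k", "ae", "t", "s"),   # cats
    ("d", "ao", "g"),        # dog
]
density_cat = neighborhood_density(("k", "ae", "t"), lexicon)
```

Neighborhood frequency, the other variable in the abstract, would additionally need a frequency value for each neighbor, which this sketch does not model.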
Neurophysiological evidence of efference copies to inner speech

PubMed Central

Jack, Bradley N; Pearson, Daniel; Griffiths, Oren; Luque, David; Harris, Anthony WF; Spencer, Kevin M; Le Pelley, Mike E

2017-01-01

Efference copies refer to internal duplicates of movement-producing neural signals. Their primary function is to predict, and often suppress, the sensory consequences of willed movements. Efference copies have been almost exclusively investigated in the context of overt movements. The current electrophysiological study employed a novel design to show that inner speech – the silent production of words in one’s mind – is also associated with an efference copy. Participants produced an inner phoneme at a precisely specified time, at which an audible phoneme was concurrently presented. The production of the inner phoneme resulted in electrophysiological suppression, but only if the content of the inner phoneme matched the content of the audible phoneme. These results demonstrate that inner speech – a purely mental action – is associated with an efference copy with detailed auditory properties. These findings suggest that inner speech may ultimately reflect a special type of overt speech. PMID:29199947
Transitioning from analog to digital audio recording in childhood speech sound disorders.

PubMed

Shriberg, Lawrence D; McSweeny, Jane L; Anderson, Bruce E; Campbell, Thomas F; Chial, Michael R; Green, Jordan R; Hauner, Katherina K; Moore, Christopher A; Rusiewicz, Heather L; Wilson, David L

2005-06-01

Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants' speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice.
Transitioning from analog to digital audio recording in childhood speech sound disorders

PubMed Central

Shriberg, Lawrence D.; McSweeny, Jane L.; Anderson, Bruce E.; Campbell, Thomas F.; Chial, Michael R.; Green, Jordan R.; Hauner, Katherina K.; Moore, Christopher A.; Rusiewicz, Heather L.; Wilson, David L.

2014-01-01

Few empirical findings or technical guidelines are available on the current transition from analog to digital audio recording in childhood speech sound disorders. Of particular concern in the present context was whether a transition from analog- to digital-based transcription and coding of prosody and voice features might require re-standardizing a reference database for research in childhood speech sound disorders. Two research transcribers with different levels of experience glossed, transcribed, and prosody-voice coded conversational speech samples from eight children with mild to severe speech disorders of unknown origin. The samples were recorded, stored, and played back using representative analog and digital audio systems. Effect sizes calculated for an array of analog versus digital comparisons ranged from negligible to medium, with a trend for participants’ speech competency scores to be slightly lower for samples obtained and transcribed using the digital system. We discuss the implications of these and other findings for research and clinical practice. PMID:16019779
The Basic Course in Communication: A Performance Triad.

ERIC Educational Resources Information Center

Smith, V. A.

The key element to the survival of speech communication and its status in academe is the basic course, which tells the academic community what speech communication is and what it can produce in terms of observable student behavior. This basic course, upon which many communication departments depend, must produce students who are obviously trained…
Speaker Identity Supports Phonetic Category Learning

ERIC Educational Resources Information Center

Mani, Nivedita; Schneider, Signe

2013-01-01

Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…

Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

ERIC Educational Resources Information Center

Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

2009-01-01

Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Speech-Language Pathologists' Assessment Practices for Children with Suspected Speech Sound Disorders: Results of a National Survey

ERIC Educational Resources Information Center

Skahan, Sarah M.; Watson, Maggie; Lof, Gregory L.

2007-01-01

Purpose: This study examined assessment procedures used by speech-language pathologists (SLPs) when assessing children suspected of having speech sound disorders (SSD). This national survey also determined the information participants obtained from clients' speech samples, evaluation of non-native English speakers, and time spent on assessment.…
Attitudes toward Speech Disorders: Sampling the Views of Cantonese-Speaking Americans.

ERIC Educational Resources Information Center

Bebout, Linda; Arthur, Bradford

1997-01-01

A study of 60 Chinese Americans and 46 controls found the Chinese Americans were more likely to believe persons with speech disorders could improve speech by "trying hard," to view people using deaf speech and people with cleft palates as perhaps being emotionally disturbed, and to regard deaf speech as a limitation. (Author/CR)
Clear Speech Variants: An Acoustic Study in Parkinson's Disease

ERIC Educational Resources Information Center

Lam, Jennifer; Tjaden, Kris

2016-01-01

Purpose: The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. Method: A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different…
Predicting Intelligibility Gains in Individuals with Dysarthria from Baseline Speech Features

ERIC Educational Resources Information Center

Fletcher, Annalise R.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Sinex, Donal G.; Liss, Julie M.

2017-01-01

Purpose: Across the treatment literature, behavioral speech modifications have produced variable intelligibility changes in speakers with dysarthria. This study is the first of two articles exploring whether measurements of baseline speech features can predict speakers' responses to these modifications. Method: Fifty speakers (7 older individuals…
Inducing Speech Errors in Dysarthria Using Tongue Twisters

ERIC Educational Resources Information Center

Kember, Heather; Connaghan, Kathryn; Patel, Rupal

2017-01-01

Although tongue twisters have been widely used to study speech production in healthy speakers, few studies have employed this methodology for individuals with speech impairment. The present study compared tongue twister errors produced by adults with dysarthria and age-matched healthy controls. Eight speakers (four female, four male; mean age =…
English Intonation and Computerized Speech Synthesis. Technical Report No. 287.

ERIC Educational Resources Information Center

Levine, Arvin

This work treats some of the important issues encountered in an attempt to synthesize natural sounding English speech from arbitrary written text. Details of the systems that interact in producing speech are described. The principal systems dealt with are phonology (intonation), phonetics, syntax, semantics, and text-view (discourse). Technical…
Specific acoustic models for spontaneous and dictated style in Indonesian speech recognition

NASA Astrophysics Data System (ADS)

Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

2018-03-01

The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
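The control flow of the two systems described above can be sketched as follows. Everything concrete here is a hypothetical stand-in: the decoder functions return fixed transcripts and confidence scores instead of running GMM-HMM decoding, and the style classifier is a toy threshold rule rather than a trained model. Only the routing logic (classify then decode, versus decode with both and keep the more confident result) follows the abstract.

```python
# Hypothetical stand-ins for style-specific acoustic models.
# Each returns (transcript, confidence score of decoding).
def spontaneous_decoder(features):
    return ("transcript from spontaneous model", 0.62)

def dictated_decoder(features):
    return ("transcript from dictated model", 0.81)

DECODERS = {"spontaneous": spontaneous_decoder, "dictated": dictated_decoder}

def classify_style(features):
    # Toy rule, purely illustrative: many filled pauses -> spontaneous.
    return "spontaneous" if features["filled_pause_rate"] > 0.05 else "dictated"

def system_one(features):
    """Predict the speech style first, then decode with the matching model."""
    style = classify_style(features)
    transcript, _ = DECODERS[style](features)
    return transcript

def system_two(features):
    """Decode with both models and keep the result with the higher confidence."""
    results = [decode(features) for decode in DECODERS.values()]
    best_transcript, _ = max(results, key=lambda result: result[1])
    return best_transcript

utterance = {"filled_pause_rate": 0.08}  # invented feature value
```

The first design decodes once per utterance but depends on the classifier being right; the second avoids the classifier at the cost of running both decoders.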
Child implant users' imitation of happy- and sad-sounding speech

PubMed Central

Wang, David J.; Trehub, Sandra E.; Volkova, Anna; van Lieshout, Pascal

2013-01-01

Cochlear implants have enabled many congenitally or prelingually deaf children to acquire their native language and communicate successfully on the basis of electrical rather than acoustic input. Nevertheless, degraded spectral input provided by the device reduces the ability to perceive emotion in speech. We compared the vocal imitations of 5- to 7-year-old deaf children who were highly successful bilateral implant users with those of a control sample of children who had normal hearing. First, the children imitated several happy and sad sentences produced by a child model. When adults in Experiment 1 rated the similarity of imitated to model utterances, ratings were significantly higher for the hearing children. Both hearing and deaf children produced poorer imitations of happy than sad utterances because of difficulty matching the greater pitch modulation of the happy versions. When adults in Experiment 2 rated electronically filtered versions of the utterances, which obscured the verbal content, ratings of happy and sad utterances were significantly differentiated for deaf as well as hearing children. The ratings of deaf children, however, were significantly less differentiated. Although deaf children's utterances exhibited culturally typical pitch modulation, their pitch modulation was reduced relative to that of hearing children. One practical implication is that therapeutic interventions for deaf children could expand their focus on suprasegmental aspects of speech perception and production, especially intonation patterns. PMID:23801976
Language and motor abilities of preschool children who stutter: Evidence from behavioral and kinematic indices of nonword repetition performance

PubMed Central

Smith, Anne; Goffman, Lisa; Sasisekaran, Jayanthi; Weber-Fox, Christine

2012-01-01

Stuttering is a disorder of speech production that typically arises in the preschool years, and many accounts of its onset and development implicate language and motor processes as critical underlying factors. There have, however, been very few studies of speech motor control processes in preschool children who stutter. Hearing novel nonwords and reproducing them engages multiple neural networks, including those involved in phonological analysis and storage and speech motor programming and execution. We used this task to explore speech motor and language abilities of 31 children aged 4–5 years who were diagnosed as stuttering. We also used sensitive and specific standardized tests of speech and language abilities to determine which of the children who stutter had concomitant language and/or phonological disorders. Approximately half of our sample of stuttering children had language and/or phonological disorders. As previous investigations would suggest, the stuttering children with concomitant language or speech sound disorders produced significantly more errors on the nonword repetition task compared to typically developing children. In contrast, the children who were diagnosed as stuttering, but who had normal speech sound and language abilities, performed the nonword repetition task with equal accuracy compared to their normally fluent peers. Analyses of interarticulator motions during accurate and fluent productions of the nonwords revealed that the children who stutter (without concomitant disorders) showed higher variability in oral motor coordination indices. These results provide new evidence that preschool children diagnosed as stuttering lag their typically developing peers in maturation of speech motor control processes. 
Educational objectives: The reader will be able to: (a) discuss why performance on nonword repetition tasks has been investigated in children who stutter; (b) discuss why children who stutter in the current study had a higher incidence of concomitant language deficits compared to several other studies; (c) describe how performance differed on a nonword repetition test between children who stutter who do and do not have concomitant speech or language deficits; (d) make a general statement about speech motor control for nonword production in children who stutter compared to controls. PMID:23218217
Analysis of glottal source parameters in Parkinsonian speech.

PubMed

Hanratty, Jane; Deegan, Catherine; Walsh, Mary; Kirkpatrick, Barry

2016-08-01

Diagnosis and monitoring of Parkinson's disease present a number of challenges, as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech-based measures has demonstrated promising results. This study aims to investigate the characteristics of the glottal source signal in Parkinsonian speech. An experiment is conducted in which a selection of glottal parameters is tested for its ability to discriminate between healthy and Parkinsonian speech. Results for each glottal parameter are presented for a database of 50 healthy speakers and a database of 16 speakers with Parkinsonian speech symptoms. Receiver operating characteristic (ROC) curves were employed to analyse the results and the area under the ROC curve (AUC) values were used to quantify the performance of each glottal parameter. The results indicate that glottal parameters can be used to discriminate between healthy and Parkinsonian speech, although results varied for each parameter tested. For the task of separating healthy and Parkinsonian speech, 2 out of the 7 glottal parameters tested produced AUC values of over 0.9.
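As an illustration of how an AUC value can summarize one parameter's ability to separate two groups, the following is a minimal sketch, not the authors' implementation. It computes the AUC as the probability that a randomly chosen value from one group exceeds a randomly chosen value from the other (the rank-based definition). The function name `roc_auc` and all numeric values are hypothetical and are not data from the study.

```python
def roc_auc(healthy, parkinsonian):
    """Rank-based AUC: fraction of (parkinsonian, healthy) pairs in which the
    parkinsonian value is larger, counting ties as half a win."""
    wins = 0.0
    for p in parkinsonian:
        for h in healthy:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(healthy) * len(parkinsonian))


# Hypothetical values of a single glottal parameter (illustration only).
healthy = [0.42, 0.47, 0.51, 0.55, 0.60]
parkinsonian = [0.58, 0.63, 0.66, 0.71]

auc = roc_auc(healthy, parkinsonian)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is why values above 0.9 are read as strong discrimination.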
Glove-talk II - a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

PubMed

Fels, S S; Hinton, G E

1997-01-01

Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
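The gating arrangement described above can be sketched as a weighted blend of two parameter vectors. This is only an illustrative sketch under stated assumptions: the abstract says a gating network weights the outputs of a vowel and a consonant network, but the sigmoid squashing, the convex-combination form, and the names `sigmoid` and `blend_controls` are assumptions made here, not details taken from the source.

```python
import math


def sigmoid(x):
    """Squash a real-valued activation into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_controls(gate_activation, vowel_params, consonant_params):
    """Mix two sets of synthesizer control parameters.

    The gate weight g (assumed form) is the share given to the consonant
    output; the vowel output receives the remaining 1 - g.
    """
    g = sigmoid(gate_activation)
    return [(1.0 - g) * v + g * c for v, c in zip(vowel_params, consonant_params)]


# Ten control parameters, as in the abstract; the values are placeholders.
vowel_out = [0.0] * 10
consonant_out = [1.0] * 10
controls = blend_controls(0.0, vowel_out, consonant_out)
print(controls)
```

A strongly negative gate activation leaves the vowel output nearly unchanged, while a strongly positive one hands control to the consonant output.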
Clear Speech Variants: An Acoustic Study in Parkinson's Disease.

PubMed

Lam, Jennifer; Tjaden, Kris

2016-08-01

The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems.
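Vowel space area is commonly computed as the area of the polygon formed by corner vowels in the F1-F2 plane. The sketch below uses the shoelace formula for that polygon. It is an illustration under assumptions: the abstract does not state which vowels or which area formula the authors used, and the formant values and the name `polygon_area` are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` is a list of (F1, F2) pairs in Hz, listed in order around
    the polygon; the result is in Hz squared.
    """
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical corner-vowel formants (F1, F2) in Hz, in polygon order.
corner_vowels = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
vowel_space_area = polygon_area(corner_vowels)
print(f"Vowel space area: {vowel_space_area:.0f} Hz^2")
```

A larger area under a clear-speech instruction than under the habitual condition would indicate more peripheral, more distinct vowel targets.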
Accent, intelligibility, and comprehensibility in the perception of foreign-accented Lombard speech

NASA Astrophysics Data System (ADS)

Li, Chi-Nin

2003-10-01

Speech produced in noise (Lombard speech) has been reported to be more intelligible than speech produced in quiet (normal speech). This study examined the perception of non-native Lombard speech in terms of intelligibility, comprehensibility, and degree of foreign accent. Twelve Cantonese speakers and a comparison group of English speakers read simple true and false English statements in quiet and in 70 dB of masking noise. Lombard and normal utterances were mixed with noise at a constant signal-to-noise ratio, and presented along with noise-free stimuli to eight new English listeners who provided transcription scores, comprehensibility ratings, and accent ratings. Analyses showed that, as expected, utterances presented in noise were less well perceived than were noise-free sentences, and that the Cantonese speakers' productions were more accented, but less intelligible and less comprehensible than those of the English speakers. For both groups of speakers, the Lombard sentences were correctly transcribed more often than their normal utterances in noisy conditions. However, the Cantonese-accented Lombard sentences were not rated as easier to understand than was the normal speech in all conditions. The assigned accent ratings were similar throughout all listening conditions. Implications of these findings will be discussed.
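Mixing utterances with noise at a constant signal-to-noise ratio amounts to scaling the noise so that the ratio of root-mean-square (RMS) levels matches the target. The following is a minimal sketch of that step, assuming an RMS-based SNR definition. The abstract does not report the SNR value or the procedure used, so the 0 dB target, the synthetic signals, and the names `rms` and `mix_at_snr` are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech RMS / noise RMS equals `snr_db`, then add.

    Returns the mixture and the scaled noise (both lists of floats).
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Synthetic stand-ins for a recorded utterance and a masking noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [random.uniform(-1.0, 1.0) for _ in range(200)]

mixed, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
achieved_snr_db = 20 * math.log10(rms(speech) / rms(scaled_noise))
print(f"Achieved SNR: {achieved_snr_db:.2f} dB")
```

Because the gain is recomputed for each utterance, louder Lombard utterances and quieter normal utterances end up at the same SNR, so any intelligibility difference reflects properties other than overall level.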
A Comparison of LBG and ADPCM Speech Compression Techniques

NASA Astrophysics Data System (ADS)

Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.

Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability, and speech coding techniques exploit this to reduce bit rates while still maintaining a suitable level of quality. This paper is a study and implementation of the Linde-Buzo-Gray (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms to compress speech signals. We implemented the methods using MATLAB 7.0. The methods used in this study gave good results and performance in compressing the speech, and listening tests showed that efficient and high-quality coding is achieved.
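To make the LBG idea concrete, the sketch below builds a vector-quantization codebook by repeatedly splitting codewords and refining them with nearest-neighbour reassignment. It is a simplified Python illustration, not the paper's MATLAB implementation: the additive split offset `eps`, the fixed iteration count, the toy two-dimensional data, and the function names are all assumptions, and the target codebook size is assumed to be a power of two.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vector)),
    )


def lbg(vectors, size, eps=0.01, iterations=20):
    """Grow a codebook to `size` codewords (a power of two) by splitting."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly raised and lowered copy.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            # Move each codeword to its cell centroid; keep it if the cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


# Toy training vectors forming two clusters (illustration only).
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = sorted(lbg(training, size=2))
print(codebook)
```

Compression comes from transmitting only the index of the nearest codeword for each input vector, so a codebook of size 2^k costs k bits per vector.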
Ultrasound Images of the Tongue: A Tutorial for Assessment and Remediation of Speech Sound Errors.

PubMed

Preston, Jonathan L; McAllister Byun, Tara; Boyce, Suzanne E; Hamilton, Sarah; Tiede, Mark; Phillips, Emily; Rivera-Campos, Ahmed; Whalen, Douglas H

2017-01-03

Diagnostic ultrasound imaging has been a common tool in medical practice for several decades. It provides a safe and effective method for imaging structures internal to the body. There has been a recent increase in the use of ultrasound technology to visualize the shape and movements of the tongue during speech, both in typical speakers and in clinical populations. Ultrasound imaging of speech has greatly expanded our understanding of how sounds articulated with the tongue (lingual sounds) are produced. Such information can be particularly valuable for speech-language pathologists. Among other advantages, ultrasound images can be used during speech therapy to provide (1) illustrative models of typical (i.e. "correct") tongue configurations for speech sounds, and (2) a source of insight into the articulatory nature of deviant productions. The images can also be used as an additional source of feedback for clinical populations learning to distinguish their better productions from their incorrect productions, en route to establishing more effective articulatory habits. Ultrasound feedback is increasingly used by scientists and clinicians as the expertise of the users increases and the expense of the equipment declines. In this tutorial, procedures are presented for collecting ultrasound images of the tongue in a clinical context. We illustrate these procedures in an extended example featuring one common error sound, American English /r/. Images of correct and distorted /r/ are used to demonstrate (1) how to interpret ultrasound images, (2) how to assess tongue shape during production of speech sounds, (3) how to categorize tongue shape errors, and (4) how to provide visual feedback to elicit a more appropriate and functional tongue shape. We present a sample protocol for using real-time ultrasound images of the tongue for visual feedback to remediate speech sound errors. Additionally, example data are shown to illustrate outcomes with the procedure.
Speech recognition training for enhancing written language generation by a traumatic brain injury survivor.

PubMed

Manasse, N J; Hux, K; Rankin-Erickson, J L

2000-11-01

Impairments in motor functioning, language processing, and cognitive status may impact the written language performance of traumatic brain injury (TBI) survivors. One strategy to minimize the impact of these impairments is to use a speech recognition system. The purpose of this study was to explore the effect of mild dysarthria and mild cognitive-communication deficits secondary to TBI on a 19-year-old survivor's mastery and use of such a system, specifically Dragon NaturallySpeaking. Data included the percentage of the participant's words accurately perceived by the system over time, the participant's accuracy over time in using commands for navigation and error correction, and quantitative and qualitative changes in the participant's written texts generated with and without the use of the speech recognition system. Results showed that Dragon NaturallySpeaking was approximately 80% accurate in perceiving words spoken by the participant, and the participant quickly and easily mastered all navigation and error correction commands presented. Quantitatively, the participant produced a greater amount of text using traditional word processing and a standard keyboard than using the speech recognition system. Minimal qualitative differences appeared between writing samples. Discussion of factors that may have contributed to the obtained results and that may affect the generalization of the findings to other TBI survivors is provided.
Changes in Speech Production Associated with Alphabet Supplementation

ERIC Educational Resources Information Center

Hustad, Katherine C.; Lee, Jimin

2008-01-01

Purpose: This study examined the effect of alphabet supplementation (AS) on temporal and spectral features of speech production in individuals with cerebral palsy and dysarthria. Method: Twelve speakers with dysarthria contributed speech samples using habitual speech and while using AS. One hundred twenty listeners orthographically transcribed…
Sources of Variability in Children’s Language Growth

PubMed Central

Huttenlocher, Janellen; Waterfall, Heidi; Vasilyeva, Marina; Vevea, Jack; Hedges, Larry V.

2010-01-01

The present longitudinal study examines the role of caregiver speech in language development, especially syntactic development, using 47 parent-child pairs of diverse SES background from 14 to 46 months. We assess the diversity (variety) of words and syntactic structures produced by caregivers and children. We use lagged correlations to examine language growth and its relation to caregiver speech. Results show substantial individual differences among children, and indicate that diversity of earlier caregiver speech significantly predicts corresponding diversity in later child speech. For vocabulary, earlier child speech also predicts later caregiver speech, suggesting mutual influence. However, for syntax, earlier child speech does not significantly predict later caregiver speech, suggesting a causal flow from caregiver to child. Finally, demographic factors, notably SES, are related to language growth, and are, at least partially, mediated by differences in caregiver speech, showing the pervasive influence of caregiver speech on language growth. PMID:20832781
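A lagged correlation relates one partner's measure at an earlier session to the other partner's measure at a later session. The sketch below computes a Pearson correlation in both lag directions. It illustrates the general idea only and is not the authors' statistical model; the diversity scores and variable names are hypothetical, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical diversity scores, one entry per parent-child pair.
caregiver_early = [12, 15, 9, 20, 17]   # caregiver, earlier session
child_later = [30, 36, 25, 48, 40]      # child, later session
child_early = [5, 9, 4, 8, 10]          # child, earlier session
caregiver_later = [14, 15, 11, 21, 16]  # caregiver, later session

r_caregiver_to_child = pearson(caregiver_early, child_later)
r_child_to_caregiver = pearson(child_early, caregiver_later)
print(f"caregiver(early) -> child(later): r = {r_caregiver_to_child:.3f}")
print(f"child(early) -> caregiver(later): r = {r_child_to_caregiver:.3f}")
```

Comparing the two directions is what supports statements about mutual influence versus a one-way flow, although a correlation on its own does not establish causation.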
The Influence of Target and Masker Characteristics on Infants' and Adults' Detection of Speech

ERIC Educational Resources Information Center

Oster, Monika-Maria; Werner, Lynne A.

2017-01-01

Purpose: Several investigators have compared infants' detection of speech in speech and nonspeech maskers to evaluate developmental differences in masking. Such comparisons have produced contradictory results, possibly because each investigation used different stimuli. The current study examined target and masker effects on infants' and adults'…

Striking a Balance: The Speechwriting Educator's Perspective.

ERIC Educational Resources Information Center

Tarver, Jerry

The content of a good speech writing course includes an explanation of the function and impact of speech writers, an examination of speeches produced by professional writers, and a focus on the sharpening of students' writing skills. The content must also be balanced between the practical/professional and the abstract/academic aspects of the…
Anticipatory Posturing of the Vocal Tract Reveals Dissociation of Speech Movement Plans from Linguistic Units

PubMed Central

Tilsen, Sam; Spincemaille, Pascal; Xu, Bo; Doerschuk, Peter; Luh, Wen-Ming; Feldman, Elana; Wang, Yi

2016-01-01

Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers were observed to produce pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers with regard to the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. The findings of this study have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors. PMID:26760511
Differences between the production of [s] and [ʃ] in the speech of adults, typically developing children, and children with speech sound disorders: An ultrasound study.

PubMed

Francisco, Danira Tavares; Wertzner, Haydée Fiszbein

2017-01-01

This study describes the criteria that are used in ultrasound to measure the differences between the tongue contours that produce [s] and [ʃ] sounds in the speech of adults, typically developing children (TDC), and children with speech sound disorder (SSD) with the phonological process of palatal fronting. Overlapping images of the tongue contours that resulted from 35 subjects producing the [s] and [ʃ] sounds were analysed to select 11 spokes on the radial grid that were spread over the tongue contour. The difference was calculated between the mean contour of the [s] and [ʃ] sounds for each spoke. A cluster analysis produced groups with some consistency in the pattern of articulation across subjects and differentiated adults and TDC to some extent and children with SSD with a high level of success. Children with SSD were less likely to show differentiation of the tongue contours between the articulation of [s] and [ʃ].
Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call

PubMed Central

Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.

2015-01-01

The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce 5 Hz rhythm signals, (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements, and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined “clicks” and “faux-speech.” Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster than, and contextually distinct from, any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm.
Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents within the primate lineage, and highlight potential articulatory homologies between great ape calls and human consonants and vowels. PMID:25569211
Minimal Pair Distinctions and Intelligibility in Preschool Children with and without Speech Sound Disorders

ERIC Educational Resources Information Center

Hodge, Megan M.; Gotzke, Carrie L.

2011-01-01

Listeners' identification of young children's productions of minimally contrastive words and predictive relationships between accurately identified words and intelligibility scores obtained from a 100-word spontaneous speech sample were determined for 36 children with typically developing speech (TDS) and 36 children with speech sound disorders…
Comparison of Magnetic Resonance Imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002

PubMed Central

Story, Brad H.

2008-01-01

A new set of area functions for vowels has been obtained with Magnetic Resonance Imaging (MRI) from the same speaker as that previously reported in 1996 [Story, Titze, & Hoffman, JASA, 100, 537–554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on MR images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intra-speaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information. PMID:18177162
The effect of speech rate on stuttering frequency, phonated intervals, speech effort, and speech naturalness during chorus reading.

PubMed

Davidow, Jason H; Ingham, Roger J

2013-01-01

This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also examined. An added purpose was to determine if chorus reading could be further refined so as to provide a perceptual guide for gauging the level of physical effort exerted during speech production. A repeated-measures design was used to compare data obtained during control reading conditions and during several chorus reading conditions produced at different speech rates. Participants included 8 persons who stutter (PWS) between the ages of 16 and 32 years. There were significant reductions in the frequency of short PIs from the habitual reading condition during slower chorus conditions, no change when speech rates were matched between habitual reading and chorus conditions, and an increase in the frequency of short PIs during chorus reading produced at a faster rate than the habitual condition. Speech rate did not have an effect on stuttering frequency during chorus reading. In general, speech effort ratings improved and naturalness ratings worsened as speech rate decreased. These results provide evidence that (a) a reduction in the frequency of short PIs is not necessary for fluency improvement during chorus reading, and (b) speech rate may be altered to provide PWS with a more appropriate reference for how physically effortful normally fluent speech production should be. Future investigations should examine the necessity of changes in the activation of neural regions during chorus reading, the possibility of defining individualized units on a 9-point effort scale, and if there are upper and lower speech rate boundaries for receiving ratings of "highly natural sounding" speech during chorus reading. 
The reader will be able to: (1) describe the effect of changes in speech rate on the frequency of short phonated intervals during chorus reading, (2) describe changes to speaker-judged speech effort as speech rate changes during chorus reading, and (3) describe the effect of changes in speech rate on listener-judged naturalness ratings during chorus reading. Copyright © 2012 Elsevier Inc. All rights reserved.
The influence of sexual orientation on vowel production (L)

NASA Astrophysics Data System (ADS)

Pierrehumbert, Janet B.; Bent, Tessa; Munson, Benjamin; Bradlow, Ann R.; Bailey, J. Michael

2004-10-01

Vowel production in gay, lesbian, bisexual (GLB), and heterosexual speakers was examined. Differences in the acoustic characteristics of vowels were found as a function of sexual orientation. Lesbian and bisexual women produced less fronted /u/ and /ɑ/ than heterosexual women. Gay men produced a more expanded vowel space than heterosexual men. However, the vowels of GLB speakers were not generally shifted toward vowel patterns typical of the opposite sex. These results are inconsistent with the conjecture that innate biological factors have a broadly feminizing influence on the speech of gay men and a broadly masculinizing influence on the speech of lesbian/bisexual women. They are consistent with the idea that innate biological factors influence GLB speech patterns indirectly by causing selective adoption of certain speech patterns characteristic of the opposite sex.
Effect of the speed of a single-channel dynamic range compressor on intelligibility in a competing speech task

NASA Astrophysics Data System (ADS)

Stone, Michael A.; Moore, Brian C. J.

2003-08-01

Using a "noise-vocoder" cochlear implant simulator [Shannon et al., Science 270, 303-304 (1995)], the effect of the speed of dynamic range compression on speech intelligibility was assessed, using normal-hearing subjects. The target speech had a level 5 dB above that of the competing speech. Initially, baseline performance was measured with no compression active, using between 4 and 16 processing channels. Then, performance was measured using a fast-acting compressor and a slow-acting compressor, each operating prior to the vocoder simulation. The fast system produced significant gain variation over syllabic timescales. The slow system produced significant gain variation only over the timescale of sentences. With no compression active, about six channels were necessary to achieve 50% correct identification of words in sentences. Sixteen channels produced near-maximum performance. Slow-acting compression produced no significant degradation relative to the baseline. However, fast-acting compression consistently reduced performance relative to that for the baseline, over a wide range of performance levels. It is suggested that fast-acting compression degrades performance for two reasons: (1) because it introduces correlated fluctuations in amplitude in different frequency bands, which tends to produce perceptual fusion of the target and background sounds and (2) because it reduces amplitude modulation depth and intensity contrasts.
Articulatory-acoustic vowel space: application to clear speech in individuals with Parkinson's disease.

PubMed

Whitfield, Jason A; Goberman, Alexander M

2014-01-01

Individuals with Parkinson disease (PD) often exhibit decreased range of movement secondary to the disease process, which has been shown to affect articulatory movements. A number of investigations have failed to find statistically significant differences between control and disordered groups, and between speaking conditions, using traditional vowel space area measures. The purpose of the current investigation was to evaluate both between-group (PD versus control) and within-group (habitual versus clear) differences in articulatory function using a novel vowel space measure, the articulatory-acoustic vowel space (AAVS). The novel AAVS is calculated from continuously sampled formant trajectories of connected speech. In the current study, habitual and clear speech samples from twelve individuals with PD along with habitual control speech samples from ten neurologically healthy adults were collected and acoustically analyzed. In addition, a group of listeners completed perceptual rating of speech clarity for all samples. Individuals with PD were perceived to exhibit decreased speech clarity compared to controls. Similarly, the novel AAVS measure was significantly lower in individuals with PD. In addition, the AAVS measure significantly tracked changes between the habitual and clear conditions that were confirmed by perceptual ratings. In the current study, the novel AAVS measure is shown to be sensitive to disease-related group differences and within-person changes in articulatory function of individuals with PD. Additionally, these data confirm that individuals with PD can modulate the speech motor system to increase articulatory range of motion and speech clarity when given a simple prompt. 
The reader will be able to (i) describe articulatory behavior observed in the speech of individuals with Parkinson disease; (ii) describe traditional measures of vowel space area and how they relate to articulation; (iii) describe a novel measure of vowel space, the articulatory-acoustic vowel space and its relationship to articulation and the perception of speech clarity. Copyright © 2014 Elsevier Inc. All rights reserved.
Contrast-marking prosodic emphasis in Williams syndrome: results of detailed phonetic analysis.

PubMed

Ito, Kiwako; Martens, Marilee A

2017-01-01

Past reports on the speech production of individuals with Williams syndrome (WS) suggest that their prosody is anomalous and may lead to challenges in spoken communication. While existing prosodic assessments confirm that individuals with WS fail to use prosodic emphasis to express contrast, those reports typically lack detailed phonetic analysis of speech data. The present study examines the acoustic properties of speech prosody, aiming for the future development of targeted speech interventions. The study examines the three primary acoustic correlates of prosodic emphasis (duration, intensity, F0) and determines whether individuals with WS have difficulty in producing all or a particular set of the three prosodic cues. Speech produced by 12 individuals with WS and 12 chronological age (CA)-matched typically developing individuals was recorded. A sequential picture-naming task elicited production of target phrases in three contexts: (1) no contrast: gorilla with a racket → rabbit with a balloon; (2) contrast on the animal: fox with a balloon → rabbit with a balloon; and (3) contrast on the object: rabbit with a ball → rabbit with a balloon. The three acoustic correlates of prosodic prominence (duration, intensity and F0) were compared across the three referential contexts. The two groups exhibited striking similarities in their use of word duration and intensity for expressing contrast. Both groups showed the reduction and enhancement of final lengthening, and the enhancement and reduction of intensity difference for the animal contrast and for the object contrast conditions, respectively. The two groups differed in their use of F0: the CA group produced higher F0 for the animal than for the object regardless of the context, and this difference was enhanced when the animal noun was contrastive. In contrast, the WS group produced higher F0 for the object than for the animal when the object was contrastive.
The present data contradict previous assessment results that report a lack of prosodic skills to mark contrast in individuals with WS. The methodological differences that may account for this variability are discussed. The present data suggest that individuals with WS produce appropriate prosodic cues to express contrast, although their use of pitch may be somewhat atypical. Additional data and future speech comprehension studies will determine whether pitch modulation can be targeted for speech intervention in individuals with WS. © 2016 Royal College of Speech and Language Therapists.
System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

DOEpatents

Burnett, Greg C [Livermore, CA]; Holzrichter, John F [Berkeley, CA]; Ng, Lawrence C [Danville, CA]

2006-08-08

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

DOEpatents

Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

2004-03-23

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

DOEpatents

Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

2006-02-14

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
Speech and motor disturbances in Rett syndrome.

PubMed

Bashina, V M; Simashkova, N V; Grachev, V V; Gorbachevskaya, N L

2002-01-01

Rett syndrome is a severe, genetically determined disease of early childhood which produces a defined clinical phenotype in girls. The main clinical manifestations include lesions affecting speech functions, involving both expressive and receptive speech, as well as motor functions, producing apraxia of the arms and profound abnormalities of gait in the form of ataxia-apraxia. Most investigators note that patients have variability in the severity of derangement to large motor acts and in the damage to fine hand movements and speech functions. The aims of the present work were to study disturbances of speech and motor functions over 2-5 years in 50 girls aged 12 months to 14 years with Rett syndrome and to analyze the correlations between these disturbances. The results of comparing clinical data and EEG traces supported the stepwise involvement of frontal and parietal-temporal cortical structures in the pathological process. The ability to organize speech and motor activity is affected first, with subsequent development of lesions to gnostic functions, which are in turn followed by derangement of subcortical structures and the cerebellum and later by damage to structures in the spinal cord. A clear correlation was found between the severity of lesions to motor and speech functions and neurophysiological data: the higher the level of preservation of elements of speech and motor functions, the smaller were the contributions of theta activity and the greater the contributions of alpha and beta activities to the EEG. The possible pathogenetic mechanisms underlying the motor and speech disturbances in Rett syndrome are discussed.
Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception

PubMed Central

Skipper, Jeremy I.; van Wassenhove, Virginie; Nusbaum, Howard C.; Small, Steven L.

2009-01-01

Observing a speaker’s mouth profoundly influences speech perception. For example, listeners perceive an “illusory” “ta” when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory “ta” percept are more similar to the activity patterns evoked by AV/ta/ than they are to patterns evoked by AV/pa/ or AV/ka/. In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory “ta” in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV/pa/ and AV/ka/, respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV/ta/. Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation. PMID:17218482
Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA

PubMed Central

Waaramaa, Teija; Leisiö, Timo

2013-01-01

The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced non-sense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of the emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other for the identified emotions and the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listeners' age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion in the listening test between emotions seemed to be dependent on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions.
The results also suggest that identifying emotions from speech samples may be easier for those listeners who share a similar language or cultural background with the speaker. PMID:23801972
Speech therapy for children with dysarthria acquired before three years of age.

PubMed

Pennington, Lindsay; Parker, Naomi K; Kelly, Helen; Miller, Nick

2016-07-18

Children with motor impairments often have the motor speech disorder dysarthria, a condition which affects the tone, strength and co-ordination of any or all of the muscles used for speech. Resulting speech difficulties can range from mild, with slightly slurred articulation and breathy voice, to profound, with an inability to produce any recognisable words. Children with dysarthria are often prescribed communication aids to supplement their natural forms of communication. However, there is variation in practice regarding the provision of therapy focusing on voice and speech production. Descriptive studies have suggested that therapy may improve speech, but its effectiveness has not been evaluated. To assess whether any speech and language therapy intervention aimed at improving the speech of children with dysarthria is more effective in increasing children's speech intelligibility or communicative participation than no intervention at all, and to compare the efficacy of individual types of speech language therapy in improving the speech intelligibility or communicative participation of children with dysarthria. We searched the Cochrane Central Register of Controlled Trials (CENTRAL; 2015, Issue 7), MEDLINE, EMBASE, CINAHL, LLBA, ERIC, PsychInfo, Web of Science, Scopus, UK National Research Register and Dissertation Abstracts up to July 2015, handsearched relevant journals published between 1980 and July 2015, and searched proceedings of relevant conferences between 1996 and 2015. We placed no restrictions on the language or setting of the studies. A previous version of this review considered studies published up to April 2009. In this update we searched for studies published from April 2009 to July 2015. We considered randomised controlled trials and studies using quasi-experimental designs in which children were allocated to groups using non-random methods. One author (LP) conducted searches of all databases, journals and conference reports.
All searches included a reliability check in which a second review author independently checked a random sample comprising 15% of all identified reports. We planned that two review authors would independently assess the quality and extract data from eligible studies. No randomised controlled trials or group studies were identified. This review found no evidence from randomised trials of the effectiveness of speech and language therapy interventions to improve the speech of children with early acquired dysarthria. Rigorous, fully powered randomised controlled trials are needed to investigate if the positive changes in children's speech observed in phase I and phase II studies are generalisable to the population of children with early acquired dysarthria served by speech and language therapy services. Research should examine change in children's speech production and intelligibility. It must also investigate children's participation in social and educational activities, and their quality of life, as well as the cost and acceptability of interventions.
The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception.

PubMed

Skipper, Jeremy I; Devlin, Joseph T; Lametti, Daniel R

2017-01-01

Does "the motor system" play "a role" in speech perception? If so, where, how, and when? We conducted a systematic review that addresses these questions using both qualitative and quantitative methods. The qualitative review of behavioural, computational modelling, non-human animal, brain damage/disorder, electrical stimulation/recording, and neuroimaging research suggests that distributed brain regions involved in producing speech play specific, dynamic, and contextually determined roles in speech perception. The quantitative review employed region and network based neuroimaging meta-analyses and a novel text mining method to describe relative contributions of nodes in distributed brain networks. Supporting the qualitative review, results show a specific functional correspondence between regions involved in non-linguistic movement of the articulators, covertly and overtly producing speech, and the perception of both nonword and word sounds. This distributed set of cortical and subcortical speech production regions are ubiquitously active and form multiple networks whose topologies dynamically change with listening context. Results are inconsistent with motor and acoustic only models of speech perception and classical and contemporary dual-stream models of the organization of language and the brain. Instead, results are more consistent with complex network models in which multiple speech production related networks and subnetworks dynamically self-organize to constrain interpretation of indeterminant acoustic patterns as listening context requires. Copyright © 2016. Published by Elsevier Inc.

Reaction times of normal listeners to laryngeal, alaryngeal, and synthetic speech.

PubMed

Evitts, Paul M; Searl, Jeff

2006-12-01

The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheoesophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS). Cognitive processing load was indexed by listener reaction time (RT). To account for significant durational differences among the modes of speech, an RT ratio was calculated (stimulus duration divided by RT). Results indicated that the cognitive processing load was greater for ES and EL relative to normal speech. TE and normal speech did not differ in terms of RT ratio, suggesting fairly comparable cognitive demands placed on the listener. SS required greater cognitive processing load than normal and alaryngeal speech. The results are discussed relative to alaryngeal speech intelligibility and the role of the listener. Potential clinical applications and directions for future research are also presented.
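The RT ratio described in the abstract above (stimulus duration divided by RT) can be illustrated with a minimal sketch. The function name, the millisecond units, and the sample values are assumptions made for illustration only; none of them come from the study.

```python
def rt_ratio(stimulus_duration_ms: float, reaction_time_ms: float) -> float:
    """Return stimulus duration divided by reaction time, as the abstract defines the ratio.

    Units (milliseconds) are an assumption; any consistent unit gives the same ratio.
    """
    if reaction_time_ms <= 0:
        raise ValueError("reaction time must be positive")
    return stimulus_duration_ms / reaction_time_ms


# Hypothetical values, not data from the study: a 600 ms word answered in 900 ms
# and a 450 ms word answered in 900 ms.
longer_word = rt_ratio(600.0, 900.0)
shorter_word = rt_ratio(450.0, 900.0)
print(longer_word > shorter_word)
```

Because the ratio divides by RT, a slower response to a stimulus of the same duration gives a smaller value, which is how the normalization lets speech modes with different word durations be compared.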
Recognizing intentions in infant-directed speech: evidence for universals.

PubMed

Bryant, Gregory A; Barrett, H Clark

2007-08-01

In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.
Relative Salience of Speech Rhythm and Speech Rate on Perceived Foreign Accent in a Second Language.

PubMed

Polyanskaya, Leona; Ordin, Mikhail; Busa, Maria Grazia

2017-09-01

We investigated the independent contribution of speech rate and speech rhythm to perceived foreign accent. To address this issue we used a resynthesis technique that allows neutralizing segmental and tonal idiosyncrasies between identical sentences produced by French learners of English at different proficiency levels and maintaining the idiosyncrasies pertaining to prosodic timing patterns. We created stimuli that (1) preserved the idiosyncrasies in speech rhythm while controlling for the differences in speech rate between the utterances; (2) preserved the idiosyncrasies in speech rate while controlling for the differences in speech rhythm between the utterances; and (3) preserved the idiosyncrasies both in speech rate and speech rhythm. All the stimuli were created in intoned (with imposed intonational contour) and flat (with monotonized, constant F0) conditions. The original and the resynthesized sentences were rated by native speakers of English for degree of foreign accent. We found that both speech rate and speech rhythm influence the degree of perceived foreign accent, but the effect of speech rhythm is larger than that of speech rate. We also found that intonation enhances the perception of fine differences in rhythmic patterns but reduces the perceptual salience of fine differences in speech rate.
Influences of Electromagnetic Articulography Sensors on Speech Produced by Healthy Adults and Individuals with Aphasia and Apraxia

ERIC Educational Resources Information Center

Katz, William F.; Bharadwaj, Sneha V.; Stettler, Monica P.

2006-01-01

Purpose: This study examined whether the intraoral transducers used in electromagnetic articulography (EMA) interfere with speech and whether there is an added risk of interference when EMA systems are used to study individuals with aphasia and apraxia. Method: Ten adult talkers (5 individuals with aphasia/apraxia, 5 controls) produced 12 American…
Relative Difficulty of Understanding Foreign Accents as a Marker of Proficiency

ERIC Educational Resources Information Center

Lev-Ari, Shiri; van Heugten, Marieke; Peperkamp, Sharon

2017-01-01

Foreign-accented speech is generally harder to understand than native-accented speech. This difficulty is reduced for non-native listeners who share their first language with the non-native speaker. It is currently unclear, however, how non-native listeners deal with foreign-accented speech produced by speakers of a different language. We show…
Effects of Loud and Amplified Speech on Sentence and Word Intelligibility in Parkinson Disease

ERIC Educational Resources Information Center

Neel, Amy T.

2009-01-01

Purpose: In the two experiments in this study, the author examined the effects of increased vocal effort (loud speech) and amplification on sentence and word intelligibility in speakers with Parkinson disease (PD). Methods: Five talkers with PD produced sentences and words at habitual levels of effort and using loud speech techniques. Amplified…
Intelligibility of Noise-Adapted and Clear Speech in Child, Young Adult, and Older Adult Talkers

ERIC Educational Resources Information Center

Smiljanic, Rajka; Gilbert, Rachael C.

2017-01-01

Purpose: This study examined intelligibility of conversational and clear speech sentences produced in quiet and in noise by children, young adults, and older adults. Relative talker intelligibility was assessed across speaking styles. Method: Sixty-one young adult participants listened to sentences mixed with speech-shaped noise at -5 dB…
APPLICATION OF MOWRER'S AUTISTIC THEORY TO THE SPEECH HABILITATION OF MENTALLY RETARDED PUPILS.

ERIC Educational Resources Information Center

RIGRODSKY, S.; AND OTHERS

A SPEECH THERAPY METHOD FOR MENTAL RETARDATES WAS DEVELOPED AND EVALUATED. THE METHOD WAS BASED UPON THE ESTABLISHMENT OF FAVORABLE ASSOCIATIONS IN THE CHILD BETWEEN THE WORDS AND SOUNDS OF LANGUAGE AND THE PRODUCER OF THE LANGUAGE, USING STIMULUS-REWARD AND SITUATION-REWARD PRINCIPLES. TRADITIONAL METHODS OF SPEECH THERAPY WERE ADMINISTERED,…
Speech Understanding in Noise by Patients with Cochlear Implants Using a Monaural Adaptive Beamformer

ERIC Educational Resources Information Center

Dorman, Michael F.; Natale, Sarah; Spahr, Anthony; Castioni, Erin

2017-01-01

Purpose: The aim of this experiment was to compare, for patients with cochlear implants (CIs), the improvement for speech understanding in noise provided by a monaural adaptive beamformer and for two interventions that produced bilateral input (i.e., bilateral CIs and hearing preservation [HP] surgery). Method: Speech understanding scores for…
Variability and Diagnostic Accuracy of Speech Intelligibility Scores in Children

ERIC Educational Resources Information Center

Hustad, Katherine C.; Oakes, Ashley; Allison, Kristen

2015-01-01

Purpose: We examined variability of speech intelligibility scores and how well intelligibility scores predicted group membership among 5-year-old children with speech motor impairment (SMI) secondary to cerebral palsy and an age-matched group of typically developing (TD) children. Method: Speech samples varying in length from 1-4 words were…
The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

ERIC Educational Resources Information Center

Daniels, Paul; Iwago, Koji

2017-01-01

As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with CALL software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…
Speech recognition systems on the Cell Broadband Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Y; Jones, H; Vaidya, S

In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.
The influence of phonological context on the sound errors of a speaker with Wernicke's aphasia.

PubMed

Goldmann, R E; Schwartz, M F; Wilshire, C E

2001-09-01

A corpus of phonological errors produced in narrative speech by a Wernicke's aphasic speaker (R.W.B.) was tested for context effects using two new methods for establishing chance baselines. A reliable anticipatory effect was found using the second method, which estimated chance from the distance between phoneme repeats in the speech sample containing the errors. Relative to this baseline, error-source distances were shorter than expected for anticipations, but not perseverations. R.W.B.'s anticipation/perseveration ratio measured intermediate between a nonaphasic error corpus and that of a more severe aphasic speaker (both reported in Schwartz et al., 1994), supporting the view that the anticipatory bias correlates with severity. Finally, R.W.B.'s anticipations favored word-initial segments, although errors and sources did not consistently share word or syllable position. Copyright 2001 Academic Press.
Describing Speech Usage in Daily Activities in Typical Adults.

PubMed

Anderson, Laine; Baylor, Carolyn R; Eadie, Tanya L; Yorkston, Kathryn M

2016-01-01

"Speech usage" refers to what people want or need to do with their speech to meet communication demands in life roles. The purpose of this study was to contribute to validation of the Levels of Speech Usage scale by providing descriptive data from a sample of adults without communication disorders, comparing this scale to a published Occupational Voice Demands scale and examining predictors of speech usage levels. This is a survey design. Adults aged ≥25 years without reported communication disorders were recruited nationally to complete an online questionnaire. The questionnaire included the Levels of Speech Usage scale, questions about relevant occupational and nonoccupational activities (eg, socializing, hobbies, childcare, and so forth), and demographic information. Participants were also categorized according to Koufman and Isaacson occupational voice demands scale. A total of 276 participants completed the questionnaires. People who worked for pay tended to report higher levels of speech usage than those who do not work for pay. Regression analyses showed employment to be the major contributor to speech usage; however, considerable variance left unaccounted for suggests that determinants of speech usage and the relationship between speech usage, employment, and other life activities are not yet fully defined. The Levels of Speech Usage may be a viable instrument to systematically rate speech usage because it captures both occupational and nonoccupational speech demands. These data from a sample of typical adults may provide a reference to help in interpreting the impact of communication disorders on speech usage patterns. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Talker identification across source mechanisms: experiments with laryngeal and electrolarynx speech.

PubMed

Perrachione, Tyler K; Stepp, Cara E; Hillman, Robert E; Wong, Patrick C M

2014-10-01

The purpose of this study was to determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). Listeners successfully learned talker identity from electrolarynx speech but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared with laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. Electrolarynx speech, although lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a "gestalt" of the vocal source and filter.
Glove-TalkII--a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

PubMed

Fels, S S; Hinton, G E

1998-01-01

Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
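The gating arrangement described in the abstract above, in which a gating network weights the outputs of a vowel network and a consonant network, can be sketched as follows. The convex-combination form, the function name, and the input values are assumptions for illustration; the abstract does not give the exact formula, and this is not the authors' implementation.

```python
from typing import List, Sequence

N_PARAMS = 10  # the abstract mentions ten synthesizer control parameters


def blend_outputs(gate: float,
                  vowel_out: Sequence[float],
                  consonant_out: Sequence[float]) -> List[float]:
    """Weight two networks' outputs with a single gate value in [0, 1].

    Assumed form: gate * vowel + (1 - gate) * consonant, applied per parameter.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(vowel_out) != N_PARAMS or len(consonant_out) != N_PARAMS:
        raise ValueError("each network must supply ten control parameters")
    return [gate * v + (1.0 - gate) * c for v, c in zip(vowel_out, consonant_out)]


# Hypothetical outputs: a gate near 1 lets the vowel network dominate.
vowel = [1.0] * N_PARAMS
consonant = [0.0] * N_PARAMS
print(blend_outputs(0.75, vowel, consonant)[0])
```

A soft weighting of this kind lets the control parameters move smoothly between vowel-like and consonant-like settings instead of switching abruptly between the two networks.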
Talker identification across source mechanisms: Experiments with laryngeal and electrolarynx speech

PubMed Central

Perrachione, Tyler K.; Stepp, Cara E.; Hillman, Robert E.; Wong, Patrick C.M.

2015-01-01

Purpose: To determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Method: Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). Results: Listeners successfully learned talker identity from electrolarynx speech, but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared to laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. Conclusions: Electrolarynx speech, though lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a “gestalt” of the vocal source and filter. PMID:24801962
VERBAL AND SPATIAL WORKING MEMORY LOAD HAVE SIMILARLY MINIMAL EFFECTS ON SPEECH PRODUCTION.

PubMed

Lee, Ogyoung; Redford, Melissa A

2015-08-10

The goal of the present study was to test the effects of working memory on speech production. Twenty American-English speaking adults produced syntactically complex sentences in tasks that taxed either verbal or spatial working memory. Sentences spoken under load were produced with more errors, fewer prosodic breaks, and at faster rates than sentences produced in the control conditions, but other acoustic correlates of rhythm and intonation did not change. Verbal and spatial working memory had very similar effects on production, suggesting that the different span tasks used to tax working memory merely shifted speakers' attention away from the act of speaking. This finding runs contra the hypothesis of incremental phonological/phonetic encoding, which predicts the manipulation of information in verbal working memory during speech production.
Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.

PubMed

Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J

2009-01-01

The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool followed by a third day, making independent ratings and transcriptions on ten new cases which had been previously recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2 and the intra- and inter-rater reliability calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. 
Ratings are enhanced by ensuring a high degree of attention to the nature of the data, standardizing the speech sample, data acquisition, the listening process together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.
The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

PubMed Central

Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

2010-01-01

In a sample of 46 children aged 4 to 7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants’ speech, prosody, and voice were compared with data from 40 typically-developing children, 13 preschool children with Speech Delay, and 15 participants aged 5 to 49 years with CAS in neurogenetic disorders. Speech Delay and Speech Errors, respectively, were modestly and substantially more prevalent in participants with ASD than reported population estimates. Double dissociations in speech, prosody, and voice impairments in ASD were interpreted as consistent with a speech attunement framework, rather than with the motor speech impairments that define CAS. Key Words: apraxia, dyspraxia, motor speech disorder, speech sound disorder PMID:20972615

Speech sound classification and detection of articulation disorders with support vector machines and wavelets.

PubMed

Georgoulas, George; Georgopoulos, Voula C; Stylios, Chrysostomos D

2006-01-01

This paper proposes a novel integrated methodology to extract features and classify speech sounds with the intent of detecting the possible existence of a speech articulation disorder in a speaker. Articulation, in effect, is the specific and characteristic way that an individual produces speech sounds. A methodology to process the speech signal, extract features, and finally classify the signal and detect articulation problems in a speaker is presented. The use of support vector machines (SVMs) for the classification of speech sounds and detection of articulation disorders is introduced. The proposed method is implemented on a data set where different sets of features and different schemes of SVMs are tested, leading to satisfactory performance.
Characteristics of speaking style and implications for speech recognition.

PubMed

Shinozaki, Takahiro; Ostendorf, Mari; Atlas, Les

2009-09-01

Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.
A characterization of verb use in Turkish agrammatic narrative speech.

PubMed

Arslan, Seçkin; Bamyacı, Elif; Bastiaanse, Roelien

2016-01-01

This study investigates the characteristics of narrative-speech production and the use of verbs in Turkish agrammatic speakers (n = 10) compared to non-brain-damaged controls (n = 10). To elicit narrative-speech samples, personal interviews and storytelling tasks were conducted. Turkish has a large and regular verb inflection paradigm where verbs are inflected for evidentiality (i.e. direct versus indirect evidence available to the speaker). Particularly, we explored the general characteristics of the speech samples (e.g. utterance length) and the uses of lexical, finite and non-finite verbs and direct and indirect evidentials. The results show that speech rate is slow, verbs per utterance are lower than normal and the verb diversity is reduced in the agrammatic speakers. Verb inflection is relatively intact; however, a trade-off pattern between inflection for direct evidentials and verb diversity is found. The implications of the data are discussed in connection with narrative-speech production studies on other languages.
System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizi

DOEpatents

Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

2006-04-25

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.
Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations. (Synthese et Analyse de la Parole et Liaisons Vocales Homme-Machine dans les Operations Aeriennes)

DTIC Science & Technology

1990-05-01

speech produced by these systems. Finally, perhaps the greatest recent impetus in advancing digital ... Finally, in the area of speech and speaker recognitio ... Hz and logarithmic beyond 1000 Hz ... [Figure 2: block diagram with bandpass filter and lowpass filter stages]
Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

PubMed

Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

2016-11-16

The use of speech-based data in the classification of Parkinson's disease (PD) has been shown in recent years to provide an effective, non-invasive mode of classification. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
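The instance-selection step can be sketched as follows. This is a minimal illustration of the textbook multi-edit procedure (random partition, 1-nearest-neighbor classification of each subset against the next one, discarding misclassified samples, repeating until nothing is discarded for several rounds), not the authors' implementation. The function names, the squared Euclidean distance, and the default parameters are assumptions.

```python
import random


def nearest_label(x, reference):
    """Label of the reference sample closest to x (squared Euclidean distance)."""
    def sq_dist(sample):
        return sum((a - b) ** 2 for a, b in zip(x, sample[0]))
    return min(reference, key=sq_dist)[1]


def multi_edit(samples, n_subsets=3, patience=2, seed=0):
    """Iteratively discard samples misclassified by a 1-NN rule.

    samples: list of (feature_tuple, label) pairs.
    n_subsets, patience, seed: illustrative defaults, not from the paper.
    """
    if n_subsets < 2:
        raise ValueError("n_subsets must be at least 2")
    rng = random.Random(seed)
    data = list(samples)
    quiet_rounds = 0  # consecutive rounds in which nothing was discarded
    while quiet_rounds < patience and len(data) >= n_subsets:
        rng.shuffle(data)
        subsets = [data[i::n_subsets] for i in range(n_subsets)]
        kept = []
        for i, subset in enumerate(subsets):
            # Classify each subset using the next subset as the reference set.
            reference = subsets[(i + 1) % n_subsets]
            kept.extend(s for s in subset
                        if nearest_label(s[0], reference) == s[1])
        quiet_rounds = quiet_rounds + 1 if len(kept) == len(data) else 0
        data = kept
    return data
```

The retained samples would then be handed to whichever ensemble learner is used downstream; that stage is not sketched here.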
Speech comprehension and emotional/behavioral problems in children with specific language impairment (SLI).

PubMed

Gregl, Ana; Kirigin, Marin; Bilać, Snjeiana; Sućeska Ligutić, Radojka; Jaksić, Nenad; Jakovljević, Miro

2014-09-01

This research aims to investigate differences in speech comprehension between children with specific language impairment (SLI) and their developmentally normal peers, and the relationship between speech comprehension and emotional/behavioral problems on Achenbach's Child Behavior Checklist (CBCL) and Caregiver Teacher's Report Form (C-TRF) according to the DSM-IV. The clinical sample comprised 97 preschool children with SLI, while the peer sample comprised 60 developmentally normal preschool children. Children with SLI had significant delays in speech comprehension and more emotional/behavioral problems than peers. In children with SLI, speech comprehension significantly correlated with scores on the Attention Deficit/Hyperactivity Problems (CBCL and C-TRF) and Pervasive Developmental Problems (CBCL) scales (p < 0.05). In the peer sample, speech comprehension significantly correlated with scores on the Affective Problems and Attention Deficit/Hyperactivity Problems (C-TRF) scales. Regression analysis showed that 12.8% of the variance in speech comprehension is accounted for by 5 CBCL variables, of which Attention Deficit/Hyperactivity (beta = -0.281) and Pervasive Developmental Problems (beta = -0.280) are statistically significant (p < 0.05). In the reduced regression model, Attention Deficit/Hyperactivity explains 7.3% of the variance in speech comprehension (beta = -0.270, p < 0.01). It is possible that, to a certain degree, the same neurodevelopmental process lies in the background of problems with speech comprehension, problems with attention and hyperactivity, and pervasive developmental problems. This study confirms the importance of triage for behavioral problems and attention training in the rehabilitation of children with SLI and children with normal language development who exhibit ADHD symptoms.
The Hypothesis of Apraxia of Speech in Children with Autism Spectrum Disorder

ERIC Educational Resources Information Center

Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.

2011-01-01

In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…
The Effect of Background Noise on Intelligibility of Dysphonic Speech

ERIC Educational Resources Information Center

Ishikawa, Keiko; Boyce, Suzanne; Kelchner, Lisa; Powell, Maria Golla; Schieve, Heidi; de Alarcon, Alessandro; Khosla, Sid

2017-01-01

Purpose: The aim of this study is to determine the effect of background noise on the intelligibility of dysphonic speech and to examine the relationship between intelligibility in noise and an acoustic measure of dysphonia--cepstral peak prominence (CPP). Method: A study of speech perception was conducted using speech samples from 6 adult speakers…
Automatic Method of Pause Measurement for Normal and Dysarthric Speech

ERIC Educational Resources Information Center

Rosen, Kristin; Murdoch, Bruce; Folker, Joanne; Vogel, Adam; Cahill, Louise; Delatycki, Martin; Corben, Louise

2010-01-01

This study proposes an automatic method for the detection of pauses and identification of pause types in conversational speech for the purpose of measuring the effects of Friedreich's Ataxia (FRDA) on speech. Speech samples of [approximately] 3 minutes were recorded from 13 speakers with FRDA and 18 healthy controls. Pauses were measured from the…
Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech

ERIC Educational Resources Information Center

Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

2012-01-01

Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…
The minor third communicates sadness in speech, mirroring its use in music.

PubMed

Curtis, Meagan E; Bharucha, Jamshed J

2010-06-01

There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
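To make the interval measurement concrete, here is a small sketch of how the distance between two pitches can be expressed in semitones. A minor third spans three equal-tempered semitones, a frequency ratio of 2^(3/12), about 1.19. The function names and the half-semitone tolerance are assumptions chosen for illustration and are not taken from the study.

```python
import math


def interval_semitones(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in equal-tempered semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def is_near_minor_third(f1_hz, f2_hz, tolerance=0.5):
    """True if the two pitches lie within `tolerance` semitones of a minor third.

    The tolerance is a hypothetical threshold; direction (rising or
    falling) is ignored by taking the absolute interval.
    """
    return abs(abs(interval_semitones(f1_hz, f2_hz)) - 3.0) <= tolerance
```

For example, two syllables spoken at 220 Hz and about 185 Hz are almost exactly three semitones apart, so they would count as a descending minor third under this rule.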
Acoustics of Clear and Noise-Adapted Speech in Children, Young, and Older Adults

ERIC Educational Resources Information Center

Smiljanic, Rajka; Gilbert, Rachael C.

2017-01-01

Purpose: This study investigated acoustic-phonetic modifications produced in noise-adapted speech (NAS) and clear speech (CS) by children, young adults, and older adults. Method: Ten children (11-13 years of age), 10 young adults (18-29 years of age), and 10 older adults (60-84 years of age) read sentences in conversational and clear speaking…
Pragmatic Difficulties in the Production of the Speech Act of Apology by Iraqi EFL Learners

ERIC Educational Resources Information Center

Al-Ghazalli, Mehdi Falih; Al-Shammary, Mohanad A. Amert

2014-01-01

The purpose of this paper is to investigate the pragmatic difficulties encountered by Iraqi EFL university students in producing the speech act of apology. Although the act of apology is easy to recognize or use by native speakers of English, non-native speakers generally encounter difficulties in discriminating one speech act from another. The…
Pausing Preceding and Following "Que" in the Production of Native Speakers of French

ERIC Educational Resources Information Center

Genc, Bilal; Mavasoglu, Mustafa; Bada, Erdogan

2011-01-01

Pausing strategies in read and spontaneous speech have been of significant interest for researchers since in literature it was observed that read speech and spontaneous speech pausing patterns do display some considerable differences. This, at least, is the case in the English language as it was produced by native speakers. As to what may be the…
Team Training through Communications Control

DTIC Science & Technology

1982-02-01

training * operational environment * team training research issues * training approach * team communications * models of operator behavior e...on the market soon, it certainly would be investigated carefully for its applicability to the team training problem. c. A text-to-speech voice...generation system. Votrax has recently marketed such a device, and others may soon follow suit. d. A speech replay system designed to produce speech from
Talker Differences in Clear and Conversational Speech: Vowel Intelligibility for Older Adults with Hearing Loss

ERIC Educational Resources Information Center

Ferguson, Sarah Hargus

2012-01-01

Purpose: To establish the range of talker variability for vowel intelligibility in clear versus conversational speech for older adults with hearing loss and to determine whether talkers who produced a clear speech benefit for young listeners with normal hearing also did so for older adults with hearing loss. Method: Clear and conversational vowels…
A Pilot Study on the Ability of Young Children and Adults to Identify and Reproduce Novel Speech Sounds.

ERIC Educational Resources Information Center

Yeni-Komshian, Grace; And Others

This study was designed to compare children and adults on their initial ability to identify and reproduce novel speech sounds and to evaluate their performance after receiving several training sessions in producing these sounds. The novel speech sounds used were two voiceless fricatives which are consonant phonemes in Arabic but which are…
Time-Compressed Speech as an Educational Medium: Studies of Stimulus Characteristics and Individual Differences. Final Report.

ERIC Educational Resources Information Center

Friedman, Herbert L.; Johnson, Raymond L.

Research in training subjects to comprehend compressed speech has led to deeper studies of basic listening skills. The connected discourse is produced by a technique which deletes segments of the speech record and joins the remainder together without pitch distortion. The two problems dealt with were the sources of individual differences in the…
HUMAN SPEECH: A RESTRICTED USE OF THE MAMMALIAN LARYNX

PubMed Central

Titze, Ingo R.

2016-01-01

Purpose Speech has been hailed as unique to human evolution. While the inventory of distinct sounds producible with vocal tract articulators is a great advantage in human oral communication, it is argued here that the larynx as a sound source in speech is limited in its range and capability because a low fundamental frequency is ideal for phonemic intelligibility and source-filter independence. Method Four existing data sets were combined to make an argument regarding exclusive use of the larynx for speech: (1) range of fundamental frequency, (2) laryngeal muscle activation, (3) vocal fold length in relation to sarcomere length of the major laryngeal muscles, and (4) vocal fold morphological development. Results Limited data support the notion that speech tends to produce a contracture of the larynx. The morphological design of the human vocal folds, like that of primates and other mammals, is optimized for vocal communication over distances for which higher fundamental frequency, higher intensity, and fewer unvoiced segments are utilized than in conversational speech. Conclusion The positive message is that raising one’s voice to call, shout, or sing, or executing pitch glides to stretch the vocal folds, can counteract this trend toward a contracted state. PMID:27397113

When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

PubMed

Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

2017-11-01

Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
A variable rate speech compressor for mobile applications

NASA Technical Reports Server (NTRS)

Yeldener, S.; Kondoz, A. M.; Evans, B. G.

1990-01-01

One of the most promising speech coders at bit rates of 9.6 to 4.8 kbits/s is CELP. Code Excited Linear Prediction (CELP) has dominated the 9.6 to 4.8 kbits/s region during the past 3 to 4 years. Its drawback, however, is its expensive implementation. As an alternative to CELP, the Base-Band CELP (CELP-BB) was developed, which produced good quality speech comparable to CELP at a complexity implementable on a single chip, as reported previously. Its robustness was also improved to tolerate errors up to 1.0 pct. and maintain intelligibility up to 5.0 pct. and more. Although CELP-BB produces good quality speech at around 4.8 kbits/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution is proposed for this problem. Below 4.8 kbits/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbits/s is reported by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation, which is called Sine Wave Excited LPC (SWELP). In this case, natural sounding good quality synthetic speech is obtained at around 2.4 kbits/s.
Use of Language Sample Analysis by School-Based SLPs: Results of a Nationwide Survey

ERIC Educational Resources Information Center

Pavelko, Stacey L.; Owens, Robert E., Jr.; Ireland, Marie; Hahs-Vaughn, Debbie L.

2016-01-01

Purpose: This article examines use of language sample analysis (LSA) by school-based speech-language pathologists (SLPs), including characteristics of language samples, methods of transcription and analysis, barriers to LSA use, and factors affecting LSA use, such as American Speech-Language-Hearing Association certification, number of years'…
Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

NASA Astrophysics Data System (ADS)

Kayasith, Prakasith; Theeramunkong, Thanaruk

It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
Perceptual sensitivity to spectral properties of earlier sounds during speech categorization.

PubMed

Stilp, Christian E; Assgari, Ashley A

2018-02-28

Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound.
Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
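For readers less used to decibel values, the sketch below shows the standard conversion from a gain in dB to linear power and amplitude ratios, which puts the +1 to +4 dB amplifications above in perspective. The function names are illustrative and not from the study.

```python
import math


def db_to_power_ratio(gain_db):
    """Linear power ratio corresponding to a gain in decibels."""
    return 10.0 ** (gain_db / 10.0)


def db_to_amplitude_ratio(gain_db):
    """Linear amplitude (pressure) ratio corresponding to a gain in decibels."""
    return 10.0 ** (gain_db / 20.0)
```

By this conversion, +3 dB corresponds to roughly a doubling of power in the amplified frequency region, or an amplitude increase of about 41 percent.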
English speech acquisition in 3- to 5-year-old children learning Russian and English.

PubMed

Gildersleeve-Neumann, Christina E; Wright, Kira L

2010-10-01

English speech acquisition in Russian-English (RE) bilingual children was investigated, exploring the effects of Russian phonetic and phonological properties on English single-word productions. Russian has more complex consonants and clusters and a smaller vowel inventory than English. One hundred thirty-seven single-word samples were phonetically transcribed from 14 RE and 28 English-only (E) children, ages 3;3 (years;months) to 5;7. Language and age differences were compared descriptively for phonetic inventories. Multivariate analyses compared phoneme accuracy and error rates between the two language groups. RE children produced Russian-influenced phones in English, including palatalized consonants and trills, and demonstrated significantly higher rates of trill substitution, final devoicing, and vowel errors than E children, suggesting Russian language effects on English. RE and E children did not differ in their overall production complexity, with similar final consonant deletion and cluster reduction error rates, similar phonetic inventories by age, and similar levels of phonetic complexity. Both older language groups were more accurate than the younger language groups. We observed effects of Russian on English speech acquisition; however, there were similarities between the RE and E children that have not been reported in previous studies of speech acquisition in bilingual children. These findings underscore the importance of knowing the phonological properties of both languages of a bilingual child in assessment.
Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.

PubMed

Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara

2008-01-01

Background: The voice of choir conductors. Aim: To evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and when speaking, in order to observe auditory and acoustic differences. Method: Participants of this study were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using a singing and a speaking voice. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field of knowledge. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. Results: The auditory-perceptive analysis of the vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the different production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. Conclusion: The voice of choir conductors is adapted, presenting fewer deviations in the singing voice when compared to the speaking voice. Productions differ based on the voice modality, singing or speaking.
A fast and flexible MRI system for the study of dynamic vocal tract shaping.

PubMed

Lingala, Sajan Goud; Zhu, Yinghua; Kim, Yoon-Chul; Toutios, Asterios; Narayanan, Shrikanth; Nayak, Krishna S

2017-01-01

The aim of this work was to develop and evaluate an MRI-based system for study of dynamic vocal tract shaping during speech production, which provides high spatial and temporal resolution. The proposed system utilizes (a) custom eight-channel upper airway coils that have high sensitivity to upper airway regions of interest, (b) two-dimensional golden angle spiral gradient echo acquisition, (c) on-the-fly view-sharing reconstruction, and (d) off-line temporal finite difference constrained reconstruction. The system also provides simultaneous noise-cancelled and temporally aligned audio. The system is evaluated in 3 healthy volunteers, and 1 tongue cancer patient, with a broad range of speech tasks. We report spatiotemporal resolutions of 2.4 × 2.4 mm² every 12 ms for single-slice imaging, and 2.4 × 2.4 mm² every 36 ms for three-slice imaging, which reflects roughly 7-fold acceleration over Nyquist sampling. This system demonstrates improved temporal fidelity in capturing rapid vocal tract shaping for tasks, such as producing consonant clusters in speech, and beat-boxing sounds. Novel acoustic-articulatory analysis was also demonstrated. A synergistic combination of custom coils, spiral acquisitions, and constrained reconstruction enables visualization of rapid speech with high spatiotemporal resolution in multiple planes. Magn Reson Med 77:112-125, 2017. © 2016 Wiley Periodicals, Inc.
Long-Term Trajectories of the Development of Speech Sound Production in Pediatric Cochlear Implant Recipients

PubMed Central

Tomblin, J. Bruce; Peng, Shu-Chen; Spencer, Linda J.; Lu, Nelson

2011-01-01

Purpose: This study characterized the development of speech sound production in prelingually deaf children with a minimum of 8 years of cochlear implant (CI) experience. Method: Twenty-seven pediatric CI recipients' spontaneous speech samples from annual evaluation sessions were phonemically transcribed. Accuracy for these speech samples was evaluated in piecewise regression models. Results: As a group, pediatric CI recipients showed steady improvement in speech sound production following implantation, but the improvement rate declined after 6 years of device experience. Piecewise regression models indicated that the slope estimating the participants' improvement rate was statistically greater than 0 during the first 6 years postimplantation, but not after 6 years. The group of pediatric CI recipients' accuracy of speech sound production after 4 years of device experience reasonably predicts their speech sound production after 5–10 years of device experience. Conclusions: The development of speech sound production in prelingually deaf children stabilizes after 6 years of device experience, and typically approaches a plateau by 8 years of device use. Early growth in speech before 4 years of device experience did not predict later rates of growth or levels of achievement. However, good predictions could be made after 4 years of device use. PMID:18695018
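The piecewise regression idea in this abstract (one slope before 6 years of device use, another after) can be sketched as an ordinary least-squares fit with a hinge term at the breakpoint. This is a minimal illustration, not the authors' model: the data are synthetic and noise-free, the function names are hypothetical, and no significance test on the slopes is performed.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_piecewise(years, scores, knot):
    """Fit score = b0 + b1*t + b2*max(0, t - knot) by least squares.

    The slope is b1 before the knot and b1 + b2 after it.
    """
    rows = [[1.0, float(t), max(0.0, t - knot)] for t in years]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(3)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "slope_before": b1, "slope_after": b1 + b2}


# Synthetic accuracy scores (hypothetical): steady growth for 6 years, then a plateau.
years = list(range(11))
scores = [20 + 10 * min(t, 6) for t in years]
fit = fit_piecewise(years, scores, knot=6)
print(fit)
```

In a real analysis the second slope would be tested against zero, which is how the abstract's "statistically greater than 0 ... but not after 6 years" statement would be assessed.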
The influence of speaking rate on nasality in the speech of hearing-impaired individuals.

PubMed

Dwyer, Claire H; Robb, Michael P; O'Beirne, Greg A; Gilbert, Harvey R

2009-10-01

The purpose of this study was to determine whether deliberate increases in speaking rate would serve to decrease the amount of nasality in the speech of severely hearing-impaired individuals. The participants were 11 severely to profoundly hearing-impaired students, ranging in age from 12 to 19 years (M = 16 years). Each participant provided a baseline speech sample (R1) followed by 3 training sessions during which participants were trained to increase their speaking rate. Following the training sessions, a second speech sample was obtained (R2). Acoustic and perceptual analyses of the speech samples obtained at R1 and R2 were undertaken. The acoustic analysis focused on changes in first (F1) and second (F2) formant frequency and formant bandwidths. The perceptual analysis involved listener ratings of the speech samples (at R1 and R2) for perceived nasality. Findings indicated a significant increase in speaking rate at R2. In addition, significantly narrower F2 bandwidth and lower perceptual rating scores of nasality were obtained at R2 across all participants, suggesting a decrease in nasality as speaking rate increases. The nasality demonstrated by hearing-impaired individuals is amenable to change when speaking rate is increased. The influences of speaking rate changes on the perception and production of nasality in hearing-impaired individuals are discussed.
Investigation of Preservice Teachers' Speech Anxiety with Different Points of View

ERIC Educational Resources Information Center

Kana, Fatih

2015-01-01

The purpose of this study is to determine the level of speech anxiety among final-year students at Education Faculties and the effects of that anxiety. For this purpose, a speech anxiety inventory was administered to 540 pre-service teachers in the 2013-2014 academic year using a stratified sampling method. A relational screening model was used in the study. To…
Speech Abilities in Preschool Children with Speech Sound Disorder with and without Co-Occurring Language Impairment

ERIC Educational Resources Information Center

Macrae, Toby; Tyler, Ann A.

2014-01-01

Purpose: The authors compared preschool children with co-occurring speech sound disorder (SSD) and language impairment (LI) to children with SSD only in their numbers and types of speech sound errors. Method: In this post hoc quasi-experimental study, independent samples t tests were used to compare the groups in the standard score from different…
The Prevalence of Speech and Language Disorders in French-Speaking Preschool Children From Yaoundé (Cameroon).

PubMed

Tchoungui Oyono, Lilly; Pascoe, Michelle; Singh, Shajila

2018-05-17

The purpose of this study was to determine the prevalence of speech and language disorders in French-speaking preschool-age children in Yaoundé, the capital city of Cameroon. A total of 460 participants aged 3-5 years were recruited from the 7 communes of Yaoundé using a 2-stage cluster sampling method. Speech and language assessment was undertaken using a standardized speech and language test, the Evaluation du Langage Oral (Khomsi, 2001), which was purposefully renormed on the sample. A predetermined cutoff of 2 SDs below the normative mean was applied to identify articulation, expressive language, and receptive language disorders. Fluency and voice disorders were identified using clinical judgment by a speech-language pathologist. Overall prevalence was calculated as follows: speech disorders, 14.7%; language disorders, 4.3%; and speech and language disorders, 17.1%. In terms of disorders, prevalence findings were as follows: articulation disorders, 3.6%; expressive language disorders, 1.3%; receptive language disorders, 3%; fluency disorders, 8.4%; and voice disorders, 3.6%. Prevalence figures are higher than those reported for other countries and emphasize the urgent need to develop speech and language services for the Cameroonian population.
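The cutoff rule described in this abstract (a score more than 2 SDs below the normative mean counts as a disorder) can be sketched as follows. The scores, normative mean, and SD below are invented purely for illustration and are not taken from the study; the function name is hypothetical.

```python
def prevalence_below_cutoff(scores, norm_mean, norm_sd, n_sd=2.0):
    """Return the cutoff score and the percentage of scores falling below it.

    A score is flagged when it is strictly below norm_mean - n_sd * norm_sd.
    """
    cutoff = norm_mean - n_sd * norm_sd
    flagged = [s for s in scores if s < cutoff]
    return cutoff, 100.0 * len(flagged) / len(scores)


# Hypothetical standard scores for ten children (not study data).
example_scores = [100, 95, 68, 110, 72, 88, 65, 104, 99, 91]
cutoff, prevalence_pct = prevalence_below_cutoff(example_scores, norm_mean=100, norm_sd=15)
print(f"cutoff = {cutoff}, prevalence = {prevalence_pct:.1f}%")
```

Because the study renormed the test on its own sample, the normative mean and SD in practice would come from that sample rather than from published norms.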
On the Perception of Speech Sounds as Biologically Significant Signals

PubMed Central

Pisoni, David B.

2012-01-01

This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200
Visual Feedback of Tongue Movement for Novel Speech Sound Learning

PubMed Central

Katz, William F.; Mehta, Sonya

2015-01-01

Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Audiovisual cues and perceptual learning of spectrally distorted speech.

PubMed

Pilling, Michael; Thomas, Sharon

2011-12-01

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight-channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch. Experiment 1 found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues with training which gave written feedback. These two methods did not significantly differ in the pattern of training they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
Effect of hearing loss on semantic access by auditory and audiovisual speech in children.

PubMed

Jerger, Susan; Tye-Murray, Nancy; Damian, Markus F; Abdi, Hervé

2013-01-01

This research studied whether the mode of input (auditory versus audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI). Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with the authors' new multimodal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related versus unrelated to the pictures (e.g., picture distractor of dog-bear versus dog-cheese, respectively). In children with NH, picture-naming times were slower in the presence of semantically related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response (the lexical selection by competition hypothesis). Recently, a modification of the lexical selection by competition hypothesis, called the competition threshold (CT) hypothesis, proposed that (1) the competition between the picture-distractor entries is determined by a threshold, and (2) distractors with experimentally reduced fidelity cannot reach the CT. Thus, semantically related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically related versus unrelated distractors). Facilitation occurs because the activation level of the semantically related distractor with reduced fidelity (1) is not sufficient to exceed the CT and produce interference but (2) is sufficient to activate its concept, which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children. 
Our multimodal picture word task allowed us to (1) quantify picture naming results in the presence of auditory speech distractors and (2) probe whether the addition of visual speech enriched the fidelity of the auditory input sufficiently to influence results. In the HI group, the auditory distractors produced no effect or a facilitative effect, in agreement with proposals of the CT hypothesis. In contrast, the audiovisual distractors produced the normal semantic interference effect. Results in the HI versus NH groups differed significantly for the auditory mode, but not for the audiovisual mode. This research indicates that the lower fidelity auditory speech associated with HI affects the normalcy of semantic access by children. Further, adding visual speech enriches the lower fidelity auditory input sufficiently to produce the semantic interference effect typical of children with NH.
Speech rate and fluency in children with phonological disorder.

PubMed

Novaes, Priscila Maronezi; Nicolielo-Carrilho, Ana Paola; Lopes-Herrera, Simone Aparecida

2015-01-01

To identify and describe the speech rate and fluency of children with phonological disorder (PD) with and without speech-language therapy. Thirty children, aged 5-8 years old, of both genders, were divided into three groups: experimental group 1 (G1), 10 children with PD in intervention; experimental group 2 (G2), 10 children with PD without intervention; and control group (CG), 10 children with typical development. Speech samples were collected and analyzed according to the parameters of a specific protocol. The children in CG produced a higher number of words per minute than those in G1, who, in turn, performed better in this aspect than the children in G2. Regarding the number of syllables per minute, the CG showed the best result. In this aspect, the children in G1 showed better results than those in G2. Comparing the groups' performance on the tests, the children with PD in intervention produced longer speech samples and had an adequate speech rate, which may be indicative of greater auditory monitoring of their own speech as a result of the intervention.
Influence of auditory fatigue on masked speech intelligibility

NASA Technical Reports Server (NTRS)

Parker, D. E.; Martens, W. L.; Johnston, P. A.

1980-01-01

Intelligibility of PB word lists embedded in simultaneous masking noise was evaluated before and after fatiguing-noise exposure; intelligibility was determined by observing the number of words correctly repeated during a shadowing task. Both the speech signal and the masking noise were filtered to a 2825-3185-Hz band. Masking-noise levels were varied from 0 to 90 dB SL. Fatigue was produced by a 1500-3000-Hz octave band of noise at 115 dB (re 20 µPa) presented continuously for 5 min. The results of three experiments indicated that speech intelligibility was reduced when the speech was presented against a background of silence but that the fatiguing-noise exposure had no effect on intelligibility when the speech was made more intense and embedded in masking noise of 40-90 dB SL. These observations are interpreted by considering the recruitment produced by fatigue and masking noise.
Brief Report: Predicting Inner Speech Use amongst Children with Autism Spectrum Disorder (ASD)--The Roles of Verbal Ability and Cognitive Profile

ERIC Educational Resources Information Center

Williams, David M.; Jarrold, Christopher

2010-01-01

Studies of inner speech use in ASD have produced conflicting results. Lidstone et al., J "Autism Dev Disord" (2009) hypothesised that Cognitive Profile (i.e., "discrepancy" between non-verbal and verbal abilities) is a predictor of inner speech use amongst children with ASD. They suggested other, contradictory results might be explained in terms…

Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech

ERIC Educational Resources Information Center

Saikachi, Yoko; Stevens, Kenneth N.; Hillman, Robert E.

2009-01-01

Purpose: Current electrolarynx (EL) devices produce a mechanical speech quality that has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, in the present study the authors aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on…
Voice-stress measure of mental workload

NASA Technical Reports Server (NTRS)

Alpert, Murray; Schneider, Sid J.

1988-01-01

In a planned experiment, male subjects between the ages of 18 and 50 will be required to produce speech while performing various tasks. Analysis of the speech produced should reveal which aspects of voice prosody are associated with increased workloads. Preliminary results with two female subjects suggest a possible trend for voice frequency and amplitude to be higher and the variance of the voice frequency to be lower in the high workload condition.
Assessing Disfluencies in School-Age Children Who Stutter: How Much Speech Is Enough?

ERIC Educational Resources Information Center

Gregg, Brent A.; Sawyer, Jean

2015-01-01

The question of what size speech sample is sufficient to accurately identify stuttering and its myriad characteristics is a valid one. Short samples have a risk of over- or underrepresenting disfluency types or characteristics. In recent years, there has been a trend toward using shorter samples because they are less time-consuming for…
Population responses in primary auditory cortex simultaneously represent the temporal envelope and periodicity features in natural speech.

PubMed

Abrams, Daniel A; Nicol, Trent; White-Schwoch, Travis; Zecker, Steven; Kraus, Nina

2017-05-01

Speech perception relies on a listener's ability to simultaneously resolve multiple temporal features in the speech signal. Little is known regarding neural mechanisms that enable the simultaneous coding of concurrent temporal features in speech. Here we show that two categories of temporal features in speech, the low-frequency speech envelope and periodicity cues, are processed by distinct neural mechanisms within the same population of cortical neurons. We measured population activity in primary auditory cortex of anesthetized guinea pig in response to three variants of a naturally produced sentence. Results show that the envelope of population responses closely tracks the speech envelope, and this cortical activity more closely reflects wider bandwidths of the speech envelope compared to narrow bands. Additionally, neuronal populations represent the fundamental frequency of speech robustly with phase-locked responses. Importantly, these two temporal features of speech are simultaneously observed within neuronal ensembles in auditory cortex in response to clear, conversational, and compressed speech exemplars. Results show that auditory cortical neurons are adept at simultaneously resolving multiple temporal features in extended speech sentences using discrete coding mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.
Cognitive Load in Voice Therapy Carry-Over Exercises.

PubMed

Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

2017-01-01

The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.
Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

PubMed Central

Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon

2013-01-01

Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
Standardization of pitch-range settings in voice acoustic analysis.

PubMed

Vogel, Adam P; Maruff, Paul; Snyder, Peter J; Mundt, James C

2009-05-01

Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files, using standard techniques (case-by-case hand analysis), taking roughly 10 work weeks of personnel time to complete. The results were compared with the analytic output of several automated analysis scripts that made use of preset pitch-range parameters. After pitch windows were selected to appropriately account for sex differences, the automated analysis scripts reduced processing time of the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
Modifying Speech to Children based on their Perceived Phonetic Accuracy

PubMed Central

Julien, Hannah M.; Munson, Benjamin

2014-01-01

Purpose: We examined the relationship between adults' perception of the accuracy of children's speech, and acoustic detail in their subsequent productions to children. Methods: Twenty-two adults participated in a task in which they rated the accuracy of 2- and 3-year-old children's word-initial /s/ and /ʃ/ using a visual analog scale (VAS), then produced a token of the same word as if they were responding to the child whose speech they had just rated. Result: The duration of adults' fricatives varied as a function of their perception of the accuracy of children's speech: longer fricatives were produced following productions that they rated as inaccurate. This tendency to modify duration in response to perceived inaccurate tokens was mediated by measures of self-reported experience interacting with children. However, speakers did not increase the spectral distinctiveness of their fricatives following the perception of inaccurate tokens. Conclusion: These results suggest that adults modify temporal features of their speech in response to perceiving children's inaccurate productions. These longer fricatives are potentially both enhanced input to children, and an error-corrective signal. PMID:22744140
Sensory-Motor Networks Involved in Speech Production and Motor Control: An fMRI Study

PubMed Central

Behroozmand, Roozbeh; Shebek, Rachel; Hansen, Daniel R.; Oya, Hiroyuki; Robin, Donald A.; Howard, Matthew A.; Greenlee, Jeremy D.W.

2015-01-01

Speaking is one of the most complex motor behaviors developed to facilitate human communication. The underlying neural mechanisms of speech involve sensory-motor interactions that incorporate feedback information for online monitoring and control of produced speech sounds. In the present study, we adopted an auditory feedback pitch perturbation paradigm and combined it with functional magnetic resonance imaging (fMRI) recordings in order to identify brain areas involved in speech production and motor control. Subjects underwent fMRI scanning while they produced a steady vowel sound /a/ (speaking) or listened to the playback of their own vowel production (playback). During each condition, the auditory feedback from vowel production was either normal (no perturbation) or perturbed by an upward (+600 cents) pitch shift stimulus randomly. Analysis of BOLD responses during speaking (with and without shift) vs. rest revealed activation of a complex network including bilateral superior temporal gyrus (STG), Heschl's gyrus, precentral gyrus, supplementary motor area (SMA), Rolandic operculum, postcentral gyrus and right inferior frontal gyrus (IFG). Performance correlation analysis showed that the subjects produced compensatory vocal responses that significantly correlated with BOLD response increases in bilateral STG and left precentral gyrus. However, during playback, the activation network was limited to cortical auditory areas including bilateral STG and Heschl's gyrus. Moreover, the contrast between speaking vs. playback highlighted a distinct functional network that included bilateral precentral gyrus, SMA, IFG, postcentral gyrus and insula. These findings suggest that speech motor control involves feedback error detection in sensory (e.g. auditory) cortices that subsequently activate motor-related areas for the adjustment of speech parameters during speaking. PMID:25623499
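The pitch perturbation in this abstract is specified in cents, a logarithmic unit in which 1200 cents equal one octave. As a minimal sketch, the standard conversion from cents to a frequency ratio is 2^(cents/1200); the 200 Hz baseline below is a hypothetical value chosen for illustration, not a figure from the study.

```python
def cents_to_ratio(cents: float) -> float:
    """Convert a pitch interval in cents to a frequency ratio (1200 cents = 1 octave)."""
    return 2 ** (cents / 1200)


def shift_frequency(f0_hz: float, cents: float) -> float:
    """Return the frequency obtained by shifting f0_hz by the given number of cents."""
    return f0_hz * cents_to_ratio(cents)


# +600 cents is half an octave, i.e. a ratio equal to the square root of 2.
ratio = cents_to_ratio(600)
shifted = shift_frequency(200.0, 600)  # 200 Hz is a hypothetical baseline F0
print(f"ratio = {ratio:.4f}, shifted F0 = {shifted:.2f} Hz")
```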
Analysis of facial motion patterns during speech using a matrix factorization algorithm

PubMed Central

Lucero, Jorge C.; Munhall, Kevin G.

2008-01-01

This paper presents an analysis of facial motion during speech to identify linearly independent kinematic regions. The data consists of three-dimensional displacement records of a set of markers located on a subject’s face while producing speech. A QR factorization with column pivoting algorithm selects a subset of markers with independent motion patterns. The subset is used as a basis to fit the motion of the other facial markers, which determines facial regions of influence of each of the linearly independent markers. Those regions constitute kinematic “eigenregions” whose combined motion produces the total motion of the face. Facial animations may be generated by driving the independent markers with collected displacement records. PMID:19062866
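The marker-selection step in this abstract can be illustrated with a small sketch. It uses Gram-Schmidt orthogonalization with column pivoting as a stand-in that, in exact arithmetic, yields the same greedy column order as a column-pivoted QR factorization; it is not the authors' implementation. The marker data are invented, and each column is assumed to hold one marker's displacement record over time.

```python
import math


def select_independent_columns(columns, rel_tol=1e-9):
    """Greedy column selection in the spirit of QR with column pivoting.

    columns: list of column vectors (lists of floats), one per marker.
    Returns the indices of the selected columns in pivot order. At each
    step the column with the largest remaining norm is chosen, and its
    direction is projected out of all columns not yet selected.
    """
    residuals = [list(col) for col in columns]
    remaining = list(range(len(columns)))
    selected = []
    threshold = None
    while remaining:
        norms = {j: math.sqrt(sum(v * v for v in residuals[j])) for j in remaining}
        pivot = max(remaining, key=lambda j: norms[j])
        if threshold is None:
            threshold = rel_tol * norms[pivot]  # scale tolerance by the largest column
        if norms[pivot] <= threshold:
            break  # everything left is (numerically) dependent on the selection
        q = [v / norms[pivot] for v in residuals[pivot]]
        selected.append(pivot)
        remaining.remove(pivot)
        for j in remaining:
            dot = sum(qv * rv for qv, rv in zip(q, residuals[j]))
            residuals[j] = [rv - dot * qv for qv, rv in zip(q, residuals[j])]
    return selected


# Hypothetical displacement records for four markers over four time samples.
marker_0 = [1.0, 0.0, 0.0, 1.0]
marker_1 = [0.0, 1.0, 0.0, 1.0]
marker_2 = [a + b for a, b in zip(marker_0, marker_1)]  # dependent on markers 0 and 1
marker_3 = [0.0, 0.0, 1.0, 0.0]

chosen = select_independent_columns([marker_0, marker_1, marker_2, marker_3])
print("independent markers (pivot order):", chosen)
```

In this toy case only three of the four markers are kept, because one record is an exact linear combination of two others; the remaining markers could then be fitted as linear combinations of the selected ones, as the abstract describes.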
Building an Interdepartmental Major in Speech Communication.

ERIC Educational Resources Information Center

Litterst, Judith K.

This paper describes a popular and innovative major program of study in speech communication at St. Cloud University in Minnesota: the Speech Communication Interdepartmental Major. The paper provides background on the program, discusses overall program requirements, presents sample student options, identifies ingredients for program success,…
Auditory color constancy: calibration to reliable spectral properties across nonspeech context and targets.

PubMed

Stilp, Christian E; Alexander, Joshua M; Kiefte, Michael; Kluender, Keith R

2010-02-01

Brief experience with reliable spectral characteristics of a listening context can markedly alter perception of subsequent speech sounds, and parallels have been drawn between auditory compensation for listening context and visual color constancy. In order to better evaluate such an analogy, the generality of acoustic context effects for sounds with spectral-temporal compositions distinct from speech was investigated. Listeners identified nonspeech sounds (extensively edited samples produced by a French horn and a tenor saxophone) following either resynthesized speech or a short passage of music. Preceding contexts were "colored" by spectral envelope difference filters, which were created to emphasize differences between French horn and saxophone spectra. Listeners were more likely to report hearing a saxophone when the stimulus followed a context filtered to emphasize spectral characteristics of the French horn, and vice versa. Despite clear changes in apparent acoustic source, the auditory system calibrated to relatively predictable spectral characteristics of filtered context, differentially affecting perception of subsequent target nonspeech sounds. This calibration to listening context and relative indifference to acoustic sources operates much like visual color constancy, for which reliable properties of the spectrum of illumination are factored out of perception of color.
Describing Phonological Paraphasias in Three Variants of Primary Progressive Aphasia.

PubMed

Dalton, Sarah Grace Hudspeth; Shultz, Christine; Henry, Maya L; Hillis, Argye E; Richardson, Jessica D

2018-03-01

The purpose of this study was to describe the linguistic environment of phonological paraphasias in 3 variants of primary progressive aphasia (semantic, logopenic, and nonfluent) and to describe the profiles of paraphasia production for each of these variants. Discourse samples of 26 individuals diagnosed with primary progressive aphasia were investigated for phonological paraphasias using the criteria established for the Philadelphia Naming Test (Moss Rehabilitation Research Institute, 2013). Phonological paraphasias were coded for paraphasia type, part of speech of the target word, target word frequency, type of segment in error, word position of consonant errors, type of error, and degree of change in consonant errors. Eighteen individuals across the 3 variants produced phonological paraphasias. Most paraphasias were nonword, followed by formal, and then mixed, with errors primarily occurring on nouns and verbs, with relatively few on function words. Most errors were substitutions, followed by addition and deletion errors, and few sequencing errors. Errors were evenly distributed across vowels, consonant singletons, and clusters, with more errors occurring in initial and medial positions of words than in the final position of words. Most consonant errors consisted of only a single-feature change, with few 2- or 3-feature changes. Importantly, paraphasia productions by variant differed from these aggregate results, with unique production patterns for each variant. These results suggest that a system where paraphasias are coded as present versus absent may be insufficient to adequately distinguish between the 3 subtypes of PPA. The 3 variants demonstrate patterns that may be used to improve phenotyping and diagnostic sensitivity. These results should be integrated with recent findings on phonological processing and speech rate. 
Future research should attempt to replicate these results in a larger sample of participants with longer speech samples and varied elicitation tasks. https://doi.org/10.23641/asha.5558107.
Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

PubMed Central

Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

2017-01-01

Speech unit segmentation is an important preprocessing step in the analysis of cleft palate speech. In Mandarin, a syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method requires no training stage and performs well for both voiced and unvoiced speech. Then, the syllables are classified as having either “quasi-unvoiced” or “quasi-voiced” initials, and a separate initial/final segmentation method is proposed for each of these two types of syllables. Moreover, the segmentation is carried out in two steps: the rough locations of syllable and initial/final boundaries are refined in the second segmentation step to improve the robustness of the segmentation. The experiments show that initial/final segmentation accuracy is higher for syllables with quasi-unvoiced initials than for syllables with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%. PMID:28926572
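The two figures of merit quoted in this abstract, mean time error and P30, can be illustrated with a short sketch. The abstract does not define P30; the code below assumes it is the percentage of detected boundaries that fall within 30 ms of a manually labelled reference boundary. The function names and the boundary times are hypothetical and are not taken from the paper.

```python
from statistics import mean


def boundary_errors_ms(predicted_ms, reference_ms):
    """Absolute time error (ms) for each predicted/reference boundary pair."""
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must have the same length")
    return [abs(p - r) for p, r in zip(predicted_ms, reference_ms)]


def mean_time_error(predicted_ms, reference_ms):
    """Mean absolute boundary error in milliseconds."""
    return mean(boundary_errors_ms(predicted_ms, reference_ms))


def tolerance_accuracy(predicted_ms, reference_ms, tolerance_ms=30.0):
    """Percentage of boundaries within tolerance_ms of the reference.

    With tolerance_ms=30 this is the assumed reading of "P30".
    """
    errors = boundary_errors_ms(predicted_ms, reference_ms)
    hits = sum(1 for e in errors if e <= tolerance_ms)
    return 100.0 * hits / len(errors)


# Hypothetical initial/final boundary times in milliseconds.
reference = [120.0, 480.0, 910.0, 1300.0]
predicted = [124.0, 470.0, 950.0, 1302.0]

print("mean time error (ms):", mean_time_error(predicted, reference))
print("P30 (%):", tolerance_accuracy(predicted, reference))
```

Reporting both numbers is useful because a mean error can be dominated by a few large misses, while a tolerance-based percentage shows how many boundaries are usable at a fixed precision.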
Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

PubMed

He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

2017-01-01

Speech unit segmentation is an important preprocessing step in the analysis of cleft palate speech. In Mandarin, a syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method requires no training stage and performs well for both voiced and unvoiced speech. Then, the syllables are classified as having either "quasi-unvoiced" or "quasi-voiced" initials, and a separate initial/final segmentation method is proposed for each of these two types of syllables. Moreover, the segmentation is carried out in two steps: the rough locations of syllable and initial/final boundaries are refined in the second segmentation step to improve the robustness of the segmentation. The experiments show that initial/final segmentation accuracy is higher for syllables with quasi-unvoiced initials than for syllables with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%.
Auditory Selective Attention to Speech Modulates Activity in the Visual Word Form Area

PubMed Central

Yoncheva, Yuliya N.; Zevin, Jason D.; Maurer, Urs

2010-01-01

Selective attention to speech versus nonspeech signals in complex auditory input could produce top-down modulation of cortical regions previously linked to perception of spoken, and even visual, words. To isolate such top-down attentional effects, we contrasted 2 equally challenging active listening tasks, performed on the same complex auditory stimuli (words overlaid with a series of 3 tones). Instructions required selectively attending to either the speech signals (in service of rhyme judgment) or the melodic signals (tone-triplet matching). Selective attention to speech, relative to attention to melody, was associated with blood oxygenation level–dependent (BOLD) increases during functional magnetic resonance imaging (fMRI) in left inferior frontal gyrus, temporal regions, and the visual word form area (VWFA). Further investigation of the activity in visual regions revealed overall deactivation relative to baseline rest for both attention conditions. Topographic analysis demonstrated that while attending to melody drove deactivation equivalently across all fusiform regions of interest examined, attending to speech produced a regionally specific modulation: deactivation of all fusiform regions, except the VWFA. Results indicate that selective attention to speech can topographically tune extrastriate cortex, leading to increased activity in VWFA relative to surrounding regions, in line with the well-established connectivity between areas related to spoken and visual word perception in skilled readers. PMID:19571269
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.

PubMed

Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M

2016-07-01

This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.
Pitch perception and production in congenital amusia: Evidence from Cantonese speakers

PubMed Central

Liu, Fang; Chan, Alice H. D.; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C. M.

2016-01-01

This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production. PMID:27475178
Suprasegmental Characteristics of Spontaneous Speech Produced in Good and Challenging Communicative Conditions by Talkers Aged 9-14 Years.

PubMed

Hazan, Valerie; Tuomainen, Outi; Pettinato, Michèle

2016-12-01

This study investigated the acoustic characteristics of spontaneous speech by talkers aged 9-14 years and their ability to adapt these characteristics to maintain effective communication when intelligibility was artificially degraded for their interlocutor. Recordings were made for 96 children (50 female participants, 46 male participants) engaged in a problem-solving task with a same-sex friend; recordings for 20 adults were used as reference. The task was carried out in good listening conditions (normal transmission) and in degraded transmission conditions. Articulation rate, median fundamental frequency (f0), f0 range, and relative energy in the 1- to 3-kHz range were analyzed. With increasing age, children significantly reduced their median f0 and f0 range, became faster talkers, and reduced their mid-frequency energy in spontaneous speech. Children produced similar clear speech adaptations (in degraded transmission conditions) as adults, but only children aged 11-14 years increased their f0 range, an unhelpful strategy not transmitted via the vocoder. Changes made by children were consistent with a general increase in vocal effort. Further developments in speech production take place during later childhood. Children use clear speech strategies to benefit an interlocutor facing intelligibility problems but may not be able to attune these strategies to the same degree as adults.
Longitudinal development of communication in children with cerebral palsy between 24 and 53 months: Predicting speech outcomes.

PubMed

Hustad, Katherine C; Allison, Kristen M; Sakash, Ashley; McFadd, Emily; Broman, Aimee Teo; Rathouz, Paul J

2017-08-01

This study aimed to determine whether communication at 2 years predicted communication at 4 years in children with cerebral palsy (CP), and whether the age at which a child first produces words imitatively predicts change in speech production. Thirty children (15 males) with CP participated and were seen 5 times at 6-month intervals between 24 and 53 months (mean age at time 1 = 26.9 months, SD 1.9). Variables were communication classification at 24 and 53 months, age at which children were first able to produce words imitatively, single-word intelligibility, and longest utterance produced. Communication at 24 months was highly predictive of abilities at 53 months. Speaking earlier led to faster gains in intelligibility and length of utterance and better outcomes at 53 months than speaking later. Inability to speak at 24 months indicates greater speech and language difficulty at 53 months and a strong need for early communication intervention.

Echolalic and Spontaneous Phrase Speech in Autistic Children.

ERIC Educational Resources Information Center

Howlin, Patricia

1982-01-01

Investigates the syntactical level of spontaneous and echolalic utterances of 26 autistic boys at different stages of phrase speech development. Speech samples were collected over a 90-minute period in unstructured settings in participants' homes. Imitations were not deliberately elicited, and only unprompted, noncommunicative echoes were…
Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

PubMed

Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

2017-09-18

Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .01) and with mean length of utterance produced during a written picture description (r = .96, p < .01). Correlations between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
A new method to sample stuttering in preschool children.

PubMed

O'Brian, Sue; Jones, Mark; Pilowsky, Rachel; Onslow, Mark; Packman, Ann; Menzies, Ross

2010-06-01

This study reports a new method for sampling the speech of preschool stuttering children outside the clinic environment. Twenty parents engaged their stuttering children in an everyday play activity in the home with a telephone handset nearby. A remotely located researcher telephoned the parent and recorded the play session with a phone-recording jack attached to a digital audio recorder at the remote location. The parent placed an audio recorder near the child for comparison purposes. Children as young as 2 years complied with the remote method of speech sampling. The quality of the remote recordings was superior to that of the in-home recordings. There was no difference in means or reliability of stutter-count measures made from the remote recordings compared with those made in-home. Advantages of the new method include: (1) cost efficiency of real-time measurement of percent syllables stuttered in naturalistic situations, (2) reduction of bias associated with parent-selected timing of home recordings, (3) standardization of speech sampling procedures, (4) improved parent compliance with sampling procedures, (5) clinician or researcher on-line control of the acoustic and linguistic quality of recordings, and (6) elimination of the need to lend equipment to parents for speech sampling.
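The stutter-count measure named in this abstract, percent syllables stuttered, is a simple ratio. The sketch below shows the arithmetic only; the function name and the counts are hypothetical, and the abstract does not describe how syllables were counted in real time.

```python
def percent_syllables_stuttered(stuttered_syllables, total_syllables):
    """Return stuttered syllables as a percentage of all syllables spoken."""
    if total_syllables <= 0:
        raise ValueError("total_syllables must be positive")
    if not 0 <= stuttered_syllables <= total_syllables:
        raise ValueError("stuttered_syllables must lie between 0 and total_syllables")
    return 100.0 * stuttered_syllables / total_syllables


# Hypothetical counts from one remote and one in-home recording of the same session.
remote_pss = percent_syllables_stuttered(12, 300)
in_home_pss = percent_syllables_stuttered(13, 310)
print(remote_pss, in_home_pss)
```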
Phonation offset in tracheoesophageal speech.

PubMed

Searl, Jeff; Ousley, Teri

2004-01-01

Tracheoesophageal (TE) speakers often have difficulty producing the voiced-voiceless distinction. Phonation offset (POff) as a TE speaker transitions from a vowel to a stop consonant may be altered, possibly contributing to listener misperceptions. The purposes of this study were to: (1) compare the duration of POff in TE versus laryngeal speakers, and (2) compare POff between TE productions that were accurately versus inaccurately perceived. Phonation offset and offset duration as a proportion of the stop gap (%POff) were greater for the TE versus the laryngeal samples. There was no difference in POff or %POff when comparing accurately to inaccurately perceived TE samples. Tracheoesophageal speakers may have less ability to halt neoglottal vibration compared to laryngeal speakers' ability to stop glottal vibration. Comparable POff for accurately and inaccurately perceived TE samples suggests that POff may not be a particularly salient acoustic feature to the voicing distinction, at least for stop consonants. Learning outcomes: As a result of this activity, participants will be able to describe (1) what phonation offset is relative to the voicing distinction, (2) phonation offset in tracheoesophageal speakers relative to laryngeal speakers, and (3) whether phonation offset in tracheoesophageal speech has perceptual saliency for listeners.
Some articulatory details of emotional speech

NASA Astrophysics Data System (ADS)

Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

2005-09-01

Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]
Irregular vocal fold dynamics incited by asymmetric fluid loading in a model of recurrent laryngeal nerve paralysis

NASA Astrophysics Data System (ADS)

Sommer, David; Erath, Byron D.; Zanartu, Matias; Peterson, Sean D.

2011-11-01

Voiced speech is produced by dynamic fluid-structure interactions in the larynx. Traditionally, reduced order models of speech have relied upon simplified inviscid flow solvers to prescribe the fluid loadings that drive vocal fold motion, neglecting viscous flow effects that occur naturally in voiced speech. Viscous phenomena, such as skewing of the intraglottal jet, have the most pronounced effect on voiced speech in cases of vocal fold paralysis where one vocal fold loses some, or all, muscular control. The impact of asymmetric intraglottal flow in pathological speech is captured in a reduced order two-mass model of speech by coupling a boundary-layer estimation of the asymmetric pressures with asymmetric tissue parameters that are representative of recurrent laryngeal nerve paralysis. Nonlinear analysis identifies the emergence of irregular and chaotic vocal fold dynamics at values representative of pathological speech conditions.
Children's perception of their synthetically corrected speech production.

PubMed

Strömbergsson, Sofia; Wengelin, Asa; House, David

2014-06-01

We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.
Intelligibility of Clear Speech: Effect of Instruction

ERIC Educational Resources Information Center

Lam, Jennifer; Tjaden, Kris

2013-01-01

Purpose: The authors investigated how clear speech instructions influence sentence intelligibility. Method: Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis…
Prosodic Contrasts in Ironic Speech

ERIC Educational Resources Information Center

Bryant, Gregory A.

2010-01-01

Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…
Discourse Analysis of the Political Speeches of the Ousted Arab Presidents during the Arab Spring Revolution Using Halliday and Hasan's Framework of Cohesion

ERIC Educational Resources Information Center

Al-Majali, Wala'

2015-01-01

This study is designed to explore the salient linguistic features of the political speeches of the ousted Arab presidents during the Arab Spring Revolution. The sample of the study is composed of seven political speeches delivered by the ousted Arab presidents during the period from December 2010 to December 2012. Three speeches were delivered by…
The Prompt Book for...Teaching the Art of Speech and Drama To Children: A Resource Guide for Teachers of Children in the Art of Speech and Drama.

ERIC Educational Resources Information Center

Dugger, Anita; And Others

Providing for individual differences in ability, interest, and cultural values among students, this guide contains resources, goals, objectives, sample lesson plans, and activities for teaching speech and drama to elementary school students. The first section of the guide offers advice on the organization of a speech arts curriculum, approaches to…
A social feedback loop for speech development and its reduction in autism

PubMed Central

Warlaumont, Anne S.; Richards, Jeffrey A.; Gilkerson, Jill; Oller, D. Kimbrough

2014-01-01

We analyze the microstructure of child-adult interaction during naturalistic, daylong, automatically labeled audio recordings (13,836 hours total) of children (8- to 48-month-olds) with and without autism. We find that adult responses are more likely when child vocalizations are speech-related. In turn, a child vocalization is more likely to be speech-related if the previous speech-related child vocalization received an immediate adult response. Taken together, these results are consistent with the idea that there is a social feedback loop between child and caregiver that promotes speech-language development. Although this feedback loop applies in both typical development and autism, children with autism produce proportionally fewer speech-related vocalizations and the responses they receive are less contingent on whether their vocalizations are speech-related. We argue that such differences will diminish the strength of the social feedback loop with cascading effects on speech development over time. Differences related to socioeconomic status are also reported. PMID:24840717
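The two contingencies described in this abstract can be written as conditional rates over a sequence of labelled child vocalizations. The sketch below is only an illustration of that bookkeeping: the record type, the field names, and the toy sequence are hypothetical, and it does not reproduce the automatic labelling or the statistical models used in the study.

```python
from dataclasses import dataclass


@dataclass
class ChildVocalization:
    speech_related: bool   # was the vocalization speech-related?
    adult_responded: bool  # did an adult respond immediately?


def response_rate(events, speech_related):
    """Rate of adult response given the type of child vocalization."""
    subset = [e for e in events if e.speech_related == speech_related]
    if not subset:
        return None
    return sum(e.adult_responded for e in subset) / len(subset)


def next_speech_rate(events, previous_responded):
    """Rate at which a vocalization is speech-related, given whether the most
    recent earlier speech-related vocalization received an adult response."""
    outcomes = []
    last_speech = None
    for e in events:
        if last_speech is not None and last_speech.adult_responded == previous_responded:
            outcomes.append(e.speech_related)
        if e.speech_related:
            last_speech = e
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical sequence: (speech_related, adult_responded)
events = [ChildVocalization(s, r) for s, r in [
    (True, True), (True, False), (False, False),
    (True, True), (False, False), (True, True),
]]

print(response_rate(events, True), response_rate(events, False))
print(next_speech_rate(events, True), next_speech_rate(events, False))
```

A feedback loop of the kind described would show up as a higher response rate for speech-related vocalizations and a higher speech-related rate after a response than after no response.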
Low-dimensional recurrent neural network-based Kalman filter for speech enhancement.

PubMed

Xia, Youshen; Wang, Jun

2015-07-01

This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of the speech signal, modeled as an autoregressive process, are first estimated using the proposed recurrent neural network, and the speech signal is then recovered by Kalman filtering. The proposed recurrent neural network is globally asymptotically stable to the noise-constrained estimate. Because the noise-constrained estimate has a robust performance against non-Gaussian noise, the proposed recurrent neural network-based speech enhancement algorithm can minimize the estimation error of Kalman filter parameters in non-Gaussian noise. Furthermore, having a low-dimensional model feature, the proposed neural network-based speech enhancement algorithm is much faster than two existing recurrent neural network-based speech enhancement algorithms. Simulation results show that the proposed recurrent neural network-based speech enhancement algorithm can produce a good performance with fast computation and noise reduction.
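The second stage described in this abstract, recovering a signal modeled as an autoregressive (AR) process by Kalman filtering, can be sketched as follows. This is a minimal textbook Kalman filter in companion-matrix form, assuming the AR coefficients and both noise variances are already known. It does not implement the paper's recurrent neural network or its noise-constrained least squares estimate, and the function name, parameters, and synthetic data are all hypothetical.

```python
import numpy as np


def kalman_enhance(noisy, ar_coeffs, process_var, noise_var):
    """Estimate an AR(p) signal from noisy samples with a Kalman filter.

    State x_k = [s_k, s_{k-1}, ..., s_{k-p+1}], observation y_k = s_k + v_k.
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    F = np.zeros((p, p))          # companion (state transition) matrix
    F[0, :] = a
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)
    h = np.zeros(p)               # observation vector picks out s_k
    h[0] = 1.0
    Q = np.zeros((p, p))          # excitation noise enters the first state only
    Q[0, 0] = process_var

    x = np.zeros(p)               # initial state estimate
    P = np.eye(p)                 # initial error covariance
    out = np.empty(len(noisy))
    for k, y in enumerate(noisy):
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step
        innovation_var = h @ P @ h + noise_var
        K = (P @ h) / innovation_var
        x = x + K * (y - h @ x)
        P = P - np.outer(K, h @ P)
        out[k] = x[0]
    return out


# Synthetic AR(2) signal (assumed, stable coefficients) in white Gaussian noise.
rng = np.random.default_rng(0)
a = [1.5, -0.7]
n = 2000
clean = np.zeros(n)
for k in range(2, n):
    clean[k] = a[0] * clean[k - 1] + a[1] * clean[k - 2] + rng.normal(0.0, 1.0)
noisy = clean + rng.normal(0.0, 2.0, size=n)

enhanced = kalman_enhance(noisy, a, process_var=1.0, noise_var=4.0)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_enhanced = float(np.mean((enhanced - clean) ** 2))
print("MSE before:", mse_noisy, "MSE after:", mse_enhanced)
```

In practice the AR coefficients are unknown and must be estimated from the noisy signal, which is the step the abstract assigns to the recurrent neural network; the filter above only shows what is done with those parameters once they are available.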
Speech Intelligibility in Severe Adductor Spasmodic Dysphonia

ERIC Educational Resources Information Center

Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.

2004-01-01

This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…
Listeners' Perceptions of Speech and Language Disorders

ERIC Educational Resources Information Center

Allard, Emily R.; Williams, Dale F.

2008-01-01

Using semantic differential scales with nine trait pairs, 445 adults rated five audio-taped speech samples, one depicting an individual without a disorder and four portraying communication disorders. Statistical analyses indicated that the no disorder sample was rated higher with respect to the trait of employability than were the articulation,…
Giving speech a hand: gesture modulates activity in auditory cortex during speech perception.

PubMed

Hubbard, Amy L; Wilson, Stephen M; Callan, Daniel E; Dapretto, Mirella

2009-03-01

Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture (a fundamental type of hand gesture that marks speech prosody) might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.
Giving Speech a Hand: Gesture Modulates Activity in Auditory Cortex During Speech Perception

PubMed Central

Hubbard, Amy L.; Wilson, Stephen M.; Callan, Daniel E.; Dapretto, Mirella

2008-01-01

Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture – a fundamental type of hand gesture that marks speech prosody – might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions. PMID:18412134
Fluency variation in adolescents.

PubMed

Furquim de Andrade, Claudia Regina; de Oliveira Martins, Vanessa

2007-10-01

The speech fluency profiles of fluent adolescent speakers of Brazilian Portuguese were examined with respect to gender and neurolinguistic variations. Speech samples of 130 male and female adolescents, aged between 12;0 and 17;11 years, were gathered. They were analysed according to type of speech disruption, speech rate, and frequency of speech disruptions. Statistical analysis did not find significant differences between genders for the variables studied. However, regarding the phases of adolescence (early: 12;0-14;11 years; late: 15;0-17;11 years), statistical differences were observed for all of the variables. As for neurolinguistic maturation, a decrease in the number of speech disruptions and an increase in speech rate occurred during the final phase of adolescence, indicating that the maturation of the motor and linguistic processes exerted an influence over the fluency profile of speech.
Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging.

PubMed

Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S

2017-04-14

Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. Analysis of apraxic speech errors within a dynamic systems framework is provided and the nature of pathomechanisms of apraxic speech discussed. One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Real-time MRI and accompanying analytical methods capture and quantify many features of apraxic speech that have been previously observed using other modalities while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
A social feedback loop for speech development and its reduction in autism.

PubMed

Warlaumont, Anne S; Richards, Jeffrey A; Gilkerson, Jill; Oller, D Kimbrough

2014-07-01

We analyzed the microstructure of child-adult interaction during naturalistic, daylong, automatically labeled audio recordings (13,836 hr total) of children (8- to 48-month-olds) with and without autism. We found that an adult was more likely to respond when the child's vocalization was speech related rather than not speech related. In turn, a child's vocalization was more likely to be speech related if the child's previous speech-related vocalization had received an immediate adult response rather than no response. Taken together, these results are consistent with the idea that there is a social feedback loop between child and caregiver that promotes speech development. Although this feedback loop applies in both typical development and autism, children with autism produced proportionally fewer speech-related vocalizations, and the responses they received were less contingent on whether their vocalizations were speech related. We argue that such differences will diminish the strength of the social feedback loop and have cascading effects on speech development over time. Differences related to socioeconomic status are also reported. © The Author(s) 2014.

Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.

PubMed

Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc

2016-10-01

The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
Impairments of speech fluency in Lewy body spectrum disorder.

PubMed

Ash, Sharon; McMillan, Corey; Gross, Rachel G; Cook, Philip; Gunawardena, Delani; Morgan, Brianna; Boller, Ashley; Siderowf, Andrew; Grossman, Murray

2012-03-01

Few studies have examined connected speech in demented and non-demented patients with Parkinson's disease (PD). We assessed the speech production of 35 patients with Lewy body spectrum disorder (LBSD), including non-demented PD patients, patients with PD dementia (PDD), and patients with dementia with Lewy bodies (DLB), in a semi-structured narrative speech sample in order to characterize impairments of speech fluency and to determine the factors contributing to reduced speech fluency in these patients. Both demented and non-demented PD patients exhibited reduced speech fluency, characterized by reduced overall speech rate and long pauses between sentences. Reduced speech rate in LBSD correlated with measures of between-utterance pauses, executive functioning, and grammatical comprehension. Regression analyses related non-fluent speech, grammatical difficulty, and executive difficulty to atrophy in frontal brain regions. These findings indicate that multiple factors contribute to slowed speech in LBSD, and this is mediated in part by disease in frontal brain regions. Copyright © 2011 Elsevier Inc. All rights reserved.
On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception

PubMed Central

Tremblay, Pascale; Small, Steven L.

2011-01-01

What is the nature of the interface between speech perception and production, where auditory and motor representations converge? One set of explanations suggests that during perception, the motor circuits involved in producing a perceived action are in some way enacting the action without actually causing movement (covert simulation) or sending along the motor information to be used to predict its sensory consequences (i.e., efference copy). Other accounts either reject entirely the involvement of motor representations in perception, or explain their role as being more supportive than integral, and not employing the identical circuits used in production. Using fMRI, we investigated whether there are brain regions that are conjointly active for both speech perception and production, and whether these regions are sensitive to articulatory (syllabic) complexity during both processes, which is predicted by a covert simulation account. A group of healthy young adults (1) observed a female speaker produce a set of familiar words (perception), and (2) observed and then repeated the words (production). There were two types of words, varying in articulatory complexity, as measured by the presence or absence of consonant clusters. The simple words contained no consonant cluster (e.g. “palace”), while the complex words contained one to three consonant clusters (e.g. “planet”). Results indicate that the left ventral premotor cortex (PMv) was significantly active during speech perception and speech production but that activation in this region was scaled to articulatory complexity only during speech production, revealing an incompletely specified efferent motor signal during speech perception. The right planum temporale (PT) was also active during speech perception and speech production, and activation in this region was scaled to articulatory complexity during both production and perception.
These findings are discussed in the context of current theories of speech perception, with particular attention to accounts that include an explanatory role for mirror neurons. PMID:21664275
45 CFR 1308.9 - Eligibility criteria: Speech or language impairments.

Code of Federal Regulations, 2010 CFR

2010-10-01

... HUMAN DEVELOPMENT SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES THE ADMINISTRATION FOR CHILDREN... language impairments. (a) A speech or language impairment means a communication disorder such as stuttering... language disorder may be characterized by difficulty in understanding and producing language, including...
Revisiting speech rate and utterance length manipulations in stuttering speakers.

PubMed

Blomgren, Michael; Goberman, Alexander M

2008-01-01

The goal of this study was to evaluate stuttering frequency across a multidimensional (2x2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22 non-stuttering speakers). Participants were audio and video recorded while producing a spontaneous speech task and four different experimental speaking tasks. The four experimental speaking tasks involved reading a list of 45 words and a list of 45 phrases two times each. One reading of each list involved speaking at a steady habitual rate (habitual rate tasks) and another reading involved producing each list at a variable speaking rate (variable rate tasks). For the variable rate tasks, participants were directed to produce words or phrases at randomly ordered slow, habitual, and fast rates. The stuttering speakers exhibited significantly more stuttering on the variable rate tasks than on the habitual rate tasks. In addition, the stuttering speakers exhibited significantly more stuttering on the first word of the phrase length tasks compared to the single word tasks. Overall, the results indicated that varying levels of both utterance length and temporal complexity function to modulate stuttering frequency in adult stuttering speakers. Discussion focuses on issues of speech performance according to stuttering severity and possible clinical implications. The reader will learn about and be able to: (1) describe the mediating effects of length of utterance and speech rate on the frequency of stuttering in stuttering speakers; (2) understand the rationale behind multidimensional skill performance matrices; and (3) describe possible applications of motor skill performance matrices to stuttering therapy.
Sensory-motor networks involved in speech production and motor control: an fMRI study.

PubMed

Behroozmand, Roozbeh; Shebek, Rachel; Hansen, Daniel R; Oya, Hiroyuki; Robin, Donald A; Howard, Matthew A; Greenlee, Jeremy D W

2015-04-01

Speaking is one of the most complex motor behaviors developed to facilitate human communication. The underlying neural mechanisms of speech involve sensory-motor interactions that incorporate feedback information for online monitoring and control of produced speech sounds. In the present study, we adopted an auditory feedback pitch perturbation paradigm and combined it with functional magnetic resonance imaging (fMRI) recordings in order to identify brain areas involved in speech production and motor control. Subjects underwent fMRI scanning while they produced a steady vowel sound /a/ (speaking) or listened to the playback of their own vowel production (playback). During each condition, the auditory feedback from vowel production was either normal (no perturbation) or perturbed, at random, by an upward (+600 cents) pitch-shift stimulus. Analysis of BOLD responses during speaking (with and without shift) vs. rest revealed activation of a complex network including bilateral superior temporal gyrus (STG), Heschl's gyrus, precentral gyrus, supplementary motor area (SMA), Rolandic operculum, postcentral gyrus and right inferior frontal gyrus (IFG). Performance correlation analysis showed that the subjects produced compensatory vocal responses that significantly correlated with BOLD response increases in bilateral STG and left precentral gyrus. However, during playback, the activation network was limited to cortical auditory areas including bilateral STG and Heschl's gyrus. Moreover, the contrast between speaking and playback highlighted a distinct functional network that included bilateral precentral gyrus, SMA, IFG, postcentral gyrus and insula. These findings suggest that speech motor control involves feedback error detection in sensory (e.g. auditory) cortices that subsequently activate motor-related areas for the adjustment of speech parameters during speaking. Copyright © 2015 Elsevier Inc. All rights reserved.
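As an illustration of the size of the perturbation described above, the following minimal sketch converts a pitch shift in cents to a frequency ratio, using the standard definition that 1200 cents equal one octave. The function names and the 200 Hz baseline are hypothetical and chosen only for illustration; they do not come from the study.

```python
def cents_to_ratio(cents: float) -> float:
    """Convert a pitch interval in cents to a frequency ratio (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def shift_f0(f0_hz: float, cents: float) -> float:
    """Return the fundamental frequency after shifting it by the given number of cents."""
    return f0_hz * cents_to_ratio(cents)


# Hypothetical baseline fundamental frequency of 200 Hz (an assumption, not a value from the study).
ratio = cents_to_ratio(600.0)      # half an octave: the square root of 2
shifted = shift_f0(200.0, 600.0)   # about 282.8 Hz
```

Under this definition a +600 cent shift raises the fundamental frequency by a factor of about 1.414, that is, half an octave.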
White noise speech illusion and psychosis expression: An experimental investigation of psychosis liability.

PubMed

Pries, Lotta-Katrin; Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J P; Rutten, Bart P F; van Os, Jim

2017-01-01

An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in nonclinical populations have been conflicting. The aim of this study was to examine the association between white noise speech illusion and subclinical expression of psychotic symptoms in a nonclinical sample. Findings were compared to previous results to investigate potential methodology dependent differences. In a general population adolescent and young adult twin sample (n = 704), the association between white noise speech illusion and subclinical psychotic experiences, using the Structured Interview for Schizotypy-Revised (SIS-R) and the Community Assessment of Psychic Experiences (CAPE), was analyzed using multilevel logistic regression analyses. Perception of any white noise speech illusion was not associated with either positive or negative schizotypy in the general population twin sample, using the method by Galdos et al. (2011) (positive: ORadjusted: 0.82, 95% CI: 0.6-1.12, p = 0.217; negative: ORadjusted: 0.75, 95% CI: 0.56-1.02, p = 0.065) and the method by Catalan et al. (2014) (positive: ORadjusted: 1.11, 95% CI: 0.79-1.57, p = 0.557). No association was found between CAPE scores and speech illusion (ORadjusted: 1.25, 95% CI: 0.88-1.79, p = 0.220). For the Catalan et al. (2014) but not the Galdos et al. (2011) method, a negative association was apparent between positive schizotypy and speech illusion with positive or negative affective valence (ORadjusted: 0.44, 95% CI: 0.24-0.81, p = 0.008). Contrary to findings in clinical populations, white noise speech illusion may not be associated with psychosis proneness in nonclinical populations.
Automated recognition of helium speech. Phase I: Investigation of microprocessor based analysis/synthesis system

NASA Astrophysics Data System (ADS)

Jelinek, H. J.

1986-01-01

This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real-time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer-based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self-contained real-time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were completed: (1) a signal processing scheme for converting helium speech to normal-sounding speech was generated; (2) the signal processing scheme was simulated on a general-purpose (LAMBDA) computer, with actual helium speech supplied to the simulation and the converted speech generated; (3) an IBM PC-based 14-bit data input/output board was designed and built; and (4) a bibliography of references on speech processing was generated.
Infants’ brain responses to speech suggest Analysis by Synthesis

PubMed Central

Kuhl, Patricia K.; Ramírez, Rey R.; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

2014-01-01

Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding. PMID:25024207
Infants' brain responses to speech suggest analysis by synthesis.

PubMed

Kuhl, Patricia K; Ramírez, Rey R; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

2014-08-05

Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.
Speech-language pathology program for reading comprehension and orthography: effects on the spelling of dyslexic individuals.

PubMed

Nogueira, Débora Manzano; Cárnio, Maria Silvia

2018-01-01

Purpose: Prepare a Speech-language Pathology Program for Reading Comprehension and Orthography and verify its effects on the reading comprehension and spelling of students with Developmental Dyslexia. Methods: The study sample was composed of eleven individuals (eight males), diagnosed with Developmental Dyslexia, aged 9-11 years. All participants underwent a Speech-language Pathology Program in Reading Comprehension and Orthography comprising 16 individual weekly sessions. In each session, tasks of reading comprehension of texts and orthography were developed. At the beginning and end of the Program, the participants were submitted to a specific assessment (pre- and post-test). Results: The individuals presented difficulty in reading comprehension, but the Cloze technique proved to be a useful remediation tool, and significant improvement in their performance was observed in the post-test evaluation. The dyslexic individuals showed poor performance for their educational level in the spelling assessment. At the end of the program, their performance had improved, but it remained below the expected level, showing the same error pattern at the pre- and post-tests, with errors in both natural and arbitrary spelling. Conclusion: The proposed Speech-language Pathology Program for Reading Comprehension and Orthography produced positive effects on the reading comprehension, spelling, and motivation for reading and writing of the participants. This study presents an unprecedented contribution by proposing joint stimulation of reading and writing by means of a program that is easy to apply and analyze in individuals with Developmental Dyslexia.
Measuring what matters: Effectively predicting language and literacy in children with cochlear implants

PubMed Central

Nittrouer, Susan; Caldwell, Amanda; Holloman, Christopher

2012-01-01

Objective: To evaluate how well various language measures typically used with very young children after they receive cochlear implants predict language and literacy skills as they enter school. Methods: Subjects were 50 children who had just completed kindergarten and were 6 or 7 years of age. All had previously participated in a longitudinal study from 12 to 48 months of age. 27 children had severe-to-profound hearing loss and wore cochlear implants, 8 had moderate hearing loss and wore hearing aids, and 15 had normal hearing. A latent variable of language/literacy skill was constructed from scores on six kinds of measures: (1) language comprehension; (2) expressive vocabulary; (3) phonological awareness; (4) literacy; (5) narrative skill; and (6) processing speed. Five kinds of language measures obtained at six-month intervals from 12 to 48 months of age were used as predictor variables in correlational analyses: (1) language comprehension; (2) expressive vocabulary; (3) syntactic structure of productive speech; (4) form and (5) function of language used in language samples. Results: Outcomes quantified how much variance in kindergarten language/literacy performance was explained by each predictor variable, at each earlier age of testing. Comprehension measures consistently predicted roughly 25 to 50 percent of the variance in kindergarten language/literacy performance, and were the only effective predictors before 24 months of age. Vocabulary and syntactic complexity were strong predictors after roughly 36 months of age. Amount of speech produced in language samples and number of answers to parental queries explained moderate amounts of variance in performance after 24 months of age. Number of manual gestures and nonspeech vocalizations produced in language samples explained little to no variance before 24 months of age, and after that were negatively correlated with kindergarten performance.
The number of imitations produced in language samples at 24 months of age explained about 10 percent of variance in kindergarten performance, but was otherwise not correlated or negatively correlated with kindergarten outcomes. Conclusions: Before 24 months of age, the best predictor of later language success is language comprehension. In general, measures that index a child’s cognitive processing of language are the most sensitive predictors of school-age language abilities. PMID:22648088
Intelligibility of foreign-accented speech: Effects of listening condition, listener age, and listener hearing status

NASA Astrophysics Data System (ADS)

Ferguson, Sarah Hargus

2005-09-01

It is well known that, for listeners with normal hearing, speech produced by non-native speakers of the listener's first language is less intelligible than speech produced by native speakers. Intelligibility is well correlated with listeners' ratings of talker comprehensibility and accentedness, which have been shown to be related to several talker factors, including age of second language acquisition and level of similarity between the talker's native and second language phoneme inventories. Relatively few studies have focused on factors extrinsic to the talker. The current project explored the effects of listener and environmental factors on the intelligibility of foreign-accented speech. Specifically, monosyllabic English words previously recorded from two talkers, one a native speaker of American English and the other a native speaker of Spanish, were presented to three groups of listeners (young listeners with normal hearing, elderly listeners with normal hearing, and elderly listeners with hearing impairment; n=20 each) in three different listening conditions (undistorted words in quiet, undistorted words in 12-talker babble, and filtered words in quiet). Data analysis will focus on interactions between talker accent, listener age, listener hearing status, and listening condition. [Project supported by American Speech-Language-Hearing Association AARC Award.]
Production Variability and Single Word Intelligibility in Aphasia and Apraxia of Speech

ERIC Educational Resources Information Center

Haley, Katarina L.; Martin, Gwenyth

2011-01-01

This study was designed to estimate test-retest reliability of orthographic speech intelligibility testing in speakers with aphasia and AOS and to examine its relationship to the consistency of speaker and listener responses. Monosyllabic single word speech samples were recorded from 13 speakers with coexisting aphasia and AOS. These words were…
Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model

ERIC Educational Resources Information Center

Loh, Marco; Schmid, Gabriele; Deco, Gustavo; Ziegler, Wolfram

2010-01-01

Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult…
Speech Recognition for A Digital Video Library.

ERIC Educational Resources Information Center

Witbrock, Michael J.; Hauptmann, Alexander G.

1998-01-01

Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…
Acute stress reduces speech fluency.

PubMed

Buchanan, Tony W; Laures-Gore, Jacqueline S; Duff, Melissa C

2014-03-01

People often report word-finding difficulties and other language disturbances when put in a stressful situation. There is, however, scant empirical evidence to support the claim that stress affects speech productivity. To address this issue, we measured speech and language variables during a stressful Trier Social Stress Test (TSST) as well as during a less stressful "placebo" TSST (Het et al., 2009). Compared to the non-stressful speech, participants showed higher word productivity during the TSST. By contrast, participants paused more during the stressful TSST, an effect that was especially pronounced in participants who produced a larger cortisol and heart rate response to the stressor. Findings support anecdotal evidence of stress-impaired speech production abilities. Copyright © 2014 Elsevier B.V. All rights reserved.
Perception of speech rhythm in second language: the case of rhythmically similar L1 and L2

PubMed Central

Ordin, Mikhail; Polyanskaya, Leona

2015-01-01

We investigated the perception of developmental changes in timing patterns that happen in the course of second language (L2) acquisition, provided that the native and the target languages of the learner are rhythmically similar (German and English). It was found that speech rhythm in L2 English produced by German learners becomes increasingly stress-timed as acquisition progresses. This development is captured by the tempo-normalized rhythm measures of durational variability. Advanced learners also deliver speech at a faster rate. However, when native speakers have to classify the timing patterns characteristic of L2 English of German learners at different proficiency levels, they attend to speech rate cues and ignore the differences in speech rhythm. PMID:25859228
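The abstract above does not name the tempo-normalized measures it used. As one illustration of what such a measure can look like, the sketch below computes the normalized Pairwise Variability Index (nPVI), a widely used rate-normalized index of durational variability; treating it as representative of the study's measures is an assumption, and the function name and interval durations are hypothetical.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index for a sequence of interval durations.

    Each pairwise difference is divided by the mean of the two durations,
    which makes the index insensitive to overall speech rate.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vowel durations in seconds (illustrative values only).
slow = [0.20, 0.08, 0.24, 0.10]
fast = [d * 0.5 for d in slow]  # same rhythm delivered at twice the rate

# Both sequences give the same nPVI because the index is rate-normalized.
same_rhythm = abs(npvi(slow) - npvi(fast)) < 1e-9
```

Because the index is unchanged when every duration is scaled by the same factor, it separates rhythm from speech rate, which is the distinction the abstract draws between what the measures capture and what listeners attend to.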
Acoustic Event Detection and Classification

NASA Astrophysics Data System (ADS)

Temko, Andrey; Nadeu, Climent; Macho, Dušan; Malkin, Robert; Zieger, Christian; Omologo, Maurizio

The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the determination of both the identity of sounds and their position in time may help to detect and describe that human activity. Indeed, speech is usually the most informative sound, but other kinds of AEs may also carry useful information, for example, clapping or laughing inside a speech, a strong yawn in the middle of a lecture, a chair moving or a door slam when the meeting has just started. Additionally, detection and classification of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition.
Speech transformations based on a sinusoidal representation

NASA Astrophysics Data System (ADS)

Quatieri, T. E.; McAulay, R. J.

1986-05-01

A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformation including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system originally was designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interference such as noise and musical backgrounds.
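The idea of a sinusoidal representation can be sketched as a sum of sinusoids, each with its own amplitude, frequency, and phase. The minimal example below is not the authors' system: it uses fixed parameters for a single frame, whereas the technique described above tracks time-varying parameters, and all names and values are hypothetical.

```python
import math


def synthesize(partials, sample_rate, n_samples):
    """Sum of sinusoids: each partial is an (amplitude, frequency_hz, phase_rad) triple."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(
            sum(a * math.cos(2.0 * math.pi * f * t + p) for a, f, p in partials)
        )
    return samples


def scale_frequencies(partials, factor):
    """Frequency scaling: multiply every partial's frequency by a constant factor."""
    return [(a, f * factor, p) for a, f, p in partials]


# Hypothetical harmonic partials of a 100 Hz voiced frame (illustrative values only).
frame = [(1.0, 100.0, 0.0), (0.5, 200.0, 0.0), (0.25, 300.0, 0.0)]
original = synthesize(frame, sample_rate=8000, n_samples=160)
raised = synthesize(scale_frequencies(frame, 1.2), sample_rate=8000, n_samples=160)
```

Once a signal is expressed through such parameters, modifications like frequency scaling reduce to operations on the parameter set rather than on the waveform itself.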

Decoding spectrotemporal features of overt and covert speech from the human cortex

PubMed Central

Martin, Stéphanie; Brunner, Peter; Holdgraf, Chris; Heinze, Hans-Jochen; Crone, Nathan E.; Rieger, Jochem; Schalk, Gerwin; Knight, Robert T.; Pasley, Brian N.

2014-01-01

Auditory perception and auditory imagery have been shown to activate overlapping brain regions. We hypothesized that these phenomena also share a common underlying neural representation. To assess this, we used electrocorticography intracranial recordings from epileptic patients performing an out loud or a silent reading task. In these tasks, short stories scrolled across a video screen in two conditions: subjects read the same stories both aloud (overt) and silently (covert). In a control condition the subject remained in a resting state. We first built a high gamma (70–150 Hz) neural decoding model to reconstruct spectrotemporal auditory features of self-generated overt speech. We then evaluated whether this same model could reconstruct auditory speech features in the covert speech condition. Two speech models were tested: a spectrogram and a modulation-based feature space. For the overt condition, reconstruction accuracy was evaluated as the correlation between original and predicted speech features, and was significant in each subject (p < 10^-5; paired two-sample t-test). For the covert speech condition, dynamic time warping was first used to realign the covert speech reconstruction with the corresponding original speech from the overt condition. Reconstruction accuracy was then evaluated as the correlation between original and reconstructed speech features. Covert reconstruction accuracy was compared to the accuracy obtained from reconstructions in the baseline control condition. Reconstruction accuracy for the covert condition was significantly better than for the control condition (p < 0.005; paired two-sample t-test). The superior temporal gyrus, pre- and post-central gyrus provided the highest reconstruction information. The relationship between overt and covert speech reconstruction depended on anatomy.
These results provide evidence that auditory representations of covert speech can be reconstructed from models that are built from an overt speech data set, supporting a partially shared neural substrate. PMID:24904404
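The evaluation step described in this abstract (realign with dynamic time warping, then correlate) can be illustrated with a minimal sketch. It assumes a single 1-D feature track and an absolute-difference cost, whereas the study worked with spectrotemporal features; the function names `dtw_path`, `pearson`, and `aligned_correlation` are hypothetical and are not taken from the paper.

```python
import math

def dtw_path(a, b):
    """Optimal alignment path between 1-D sequences a and b (classic DTW,
    absolute-difference cost; an illustrative choice, not the paper's)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell to recover the aligned index pairs.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

def aligned_correlation(reference, reconstruction):
    """Warp the reconstruction onto the reference, then correlate."""
    path = dtw_path(reference, reconstruction)
    ref = [reference[i] for i, _ in path]
    rec = [reconstruction[j] for _, j in path]
    return pearson(ref, rec)
```

A reconstruction that is only a time-stretched copy of the reference scores close to 1 after alignment, which is the reason for warping before correlating when the two signals are not time-locked.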
High-frequency energy in singing and speech

NASA Astrophysics Data System (ADS)

Monson, Brian Bruce

While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
Comparing speech and nonspeech context effects across timescales in coarticulatory contexts.

PubMed

Viswanathan, Navin; Kelty-Stephen, Damian G

2018-02-01

Context effects are ubiquitous in speech perception and reflect the ability of human listeners to successfully perceive highly variable speech signals. In the study of how listeners compensate for coarticulatory variability, past studies have used similar effects of speech and tone analogues of speech as strong support for speech-neutral, general auditory mechanisms for compensation for coarticulation. In this manuscript, we revisit compensation for coarticulation by replacing standard button-press responses with mouse-tracking responses and examining both standard geometric measures of uncertainty and newer information-theoretic measures that separate fast from slow mouse movements. We found that when our analyses were restricted to end-state responses, tone and speech contexts appeared to produce similar effects. However, a more detailed time-course analysis revealed systematic differences between speech and tone contexts such that listeners' responses to speech contexts, but not to tone contexts, changed across the experimental session. Analyses of the time course of effects within trials using mouse tracking indicated that speech contexts elicited fewer x-position flips but more area under the curve (AUC) and maximum deviation (MD), and they did so in the slower portions of mouse-tracking movements. Our results indicate critical differences between the time course of speech and nonspeech context effects and suggest that general auditory explanations, motivated by their apparent similarity, should be reexamined.
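The geometric mouse-tracking measures named in this abstract (x-position flips, AUC, MD) are not defined in it. The sketch below uses definitions that are common in the mouse-tracking literature and should be read as assumptions: x-flips as reversals of horizontal direction, MD as the largest perpendicular distance from the straight start-to-end line, and AUC as the net area enclosed between the trajectory and that line. All function names are hypothetical.

```python
import math

def x_flips(xs):
    """Count reversals of movement direction along the x-axis."""
    flips, prev_sign = 0, 0
    for a, b in zip(xs, xs[1:]):
        step = b - a
        if step == 0:
            continue  # no horizontal movement, so no direction information
        sign = 1 if step > 0 else -1
        if prev_sign and sign != prev_sign:
            flips += 1
        prev_sign = sign
    return flips

def max_deviation(points):
    """Largest perpendicular distance from the start-to-end straight line."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
               for x, y in points)

def area_under_curve(points):
    """Net area between the trajectory and the straight line (shoelace formula)."""
    closed = points + [points[0]]
    twice = sum(xa * yb - xb * ya
                for (xa, ya), (xb, yb) in zip(closed, closed[1:]))
    return abs(twice) / 2
```

For a trajectory that detours around three sides of a unit square, `[(0, 0), (1, 0), (1, 1), (0, 1)]`, these functions give one x-flip, an MD of 1, and an AUC of 1.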
Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production

PubMed Central

Liu, Xiaoluan; Xu, Yi

2015-01-01

This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study firstly supports previous perception studies on relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production in which motor mechanisms play a crucial role. PMID:26217252
Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments

PubMed Central

Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim

2016-01-01

Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348
Extending and Applying the EPIC Architecture for Human Cognition and Performance: Auditory and Spatial Components

DTIC Science & Technology

2013-03-20

Wakefield of the University of Michigan as Co-PI. This extended activity produced a large number of products and accomplishments; however, this report...speech communication will be expanded to provide a robust modeling and prediction capability for tasks involving speech production and speech and non...preparations made to move to the newer Cocoa API instead of the previous Carbon API. In the following sections, an extended treatment will be
Determining stability in connected speech in primary progressive aphasia and Alzheimer's disease.

PubMed

Beales, Ashleigh; Whitworth, Anne; Cartwright, Jade; Panegyres, Peter K; Kane, Robert T

2018-03-08

Using connected speech to assess progressive language disorders is confounded by uncertainty around whether connected speech is stable over successive sampling, and therefore representative of an individual's performance, and whether some contexts and/or language behaviours show greater stability than others. A repeated measure, within groups, research design was used to investigate stability of a range of behaviours in the connected speech of six individuals with primary progressive aphasia and three individuals with Alzheimer's disease. Stability was evaluated, at a group and individual level, across three samples, collected over 3 weeks, involving everyday monologue, narrative and picture description, and analysed for lexical content, fluency and communicative informativeness and efficiency. Excellent and significant stability was found on the majority of measures, at a group and individual level, across all genres, with isolated measures (e.g. noun use, communicative efficiency) showing good stability, but greater variability, within one of the three genres. Findings provide evidence of stability on measures of lexical content, fluency and communicative informativeness and efficiency. While preliminary evidence suggests that task selection is influential when considering stability of particular connected speech measures, replication over a larger sample is necessary to reproduce findings.
"Non-Vocalization": A Phonological Error Process in the Speech of Severely and Profoundly Hearing Impaired Adults, from the Point of View of the Theory of Phonology as Human Behaviour

ERIC Educational Resources Information Center

Halpern, Orly; Tobin, Yishai

2008-01-01

"Non-vocalization" (N-V) is a newly described phonological error process in hearing impaired speakers. In N-V the hearing impaired person actually articulates the phoneme but without producing a voice. The result is an error process looking as if it is produced but sounding as if it is omitted. N-V was discovered by video recording the speech of…
Speech-Enabled Interfaces for Travel Information Systems with Large Grammars

NASA Astrophysics Data System (ADS)

Zhao, Baoli; Allen, Tony; Bargiela, Andrzej

This paper introduces three grammar-segmentation methods capable of handling the large grammar issues associated with producing a real-time speech-enabled VXML bus travel application for London. Large grammars tend to produce relatively slow recognition interfaces and this work shows how this limitation can be successfully addressed. Comparative experimental results show that the novel last-word recognition based grammar segmentation method described here achieves an optimal balance between recognition rate, speed of processing and naturalness of interaction.
Breath-Group Intelligibility in Dysarthria: Characteristics and Underlying Correlates

ERIC Educational Resources Information Center

Yunusova, Yana; Weismer, Gary; Kent, Ray D.; Rusche, Nicole M.

2005-01-01

Purpose: This study was designed to determine whether within-speaker fluctuations in speech intelligibility occurred among speakers with dysarthria who produced a reading passage, and, if they did, whether selected linguistic and acoustic variables predicted the variations in speech intelligibility. Method: Participants with dysarthria included a…
A study on the co- and adjacent channel protection requirements for mobile satellite ACSSB modulation

NASA Technical Reports Server (NTRS)

Sydor, John T.

1988-01-01

Samples of speech modulated by narrowband frequency modulation (NBFM) (cellular) and amplitude companded single sideband (ACSSB) radios were subjected to simulated co- and adjacent channel interference environments typical of proposed frequency division multiple access (FDMA) mobile satellite systems. These samples were then listened to by a group of evaluators whose subjective responses to the samples were used to produce a series of graphs showing the relationship between subjective acceptability, carrier to noise density (C/No), carrier to interference ratio (C/I), and frequency offset. The results show that in a mobile satellite environment, ACSSB deteriorates more slowly than NBFM. The co- and adjacent channel protection ratios for both modulation techniques were roughly the same, even though the mechanism for signal deterioration is different.
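The quantities plotted in this study (C/No, C/I) are related by standard decibel arithmetic, sketched below. The conversions themselves are textbook link-budget identities; the bandwidth and level values in the usage note are hypothetical examples and are not taken from the report.

```python
import math

def cn_db(c_no_dbhz, bandwidth_hz):
    """Carrier-to-noise ratio (dB) from carrier-to-noise density (dB-Hz)
    for a given noise bandwidth in Hz: C/N = C/No - 10*log10(B)."""
    return c_no_dbhz - 10 * math.log10(bandwidth_hz)

def c_over_n_plus_i_db(cn, ci):
    """Combine C/N and C/I (both in dB) into C/(N+I) in dB by summing
    the noise and interference powers relative to the carrier."""
    return -10 * math.log10(10 ** (-cn / 10) + 10 ** (-ci / 10))
```

For instance, a C/No of 50 dB-Hz in an assumed 10 kHz noise bandwidth corresponds to a C/N of 10 dB, and adding interference at a C/I of 10 dB lowers the combined ratio by about 3 dB.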
Post-treatment speech naturalness of comprehensive stuttering program clients and differences in ratings among listener groups.

PubMed

Teshima, Shelli; Langevin, Marilyn; Hagler, Paul; Kully, Deborah

2010-03-01

The purposes of this study were to investigate naturalness of the post-treatment speech of Comprehensive Stuttering Program (CSP) clients and differences in naturalness ratings by three listener groups. Listeners were 21 student speech-language pathologists, 9 community members, and 15 listeners who stutter. Listeners rated perceptually fluent speech samples of CSP clients obtained immediately post-treatment (Post) and at 5 years follow-up (F5), and speech samples of matched typically fluent (TF) speakers. A 9-point interval rating scale was used. A 3 (listener group) × 2 (time) × 2 (speaker) mixed ANOVA was used to test for differences among mean ratings. The difference between CSP Post and F5 mean ratings was statistically significant. The F5 mean rating was within the range reported for typically fluent speakers. Student speech-language pathologists were found to be less critical than community members and listeners who stutter in rating naturalness; however, there were no significant differences in ratings made by community members and listeners who stutter. Results indicate that the naturalness of post-treatment speech of CSP clients improves in the post-treatment period and that it is possible for clients to achieve levels of naturalness that appear to be acceptable to adults who stutter and that are within the range of naturalness ratings given to typically fluent speakers. Readers will be able to (a) summarize key findings of studies that have investigated naturalness ratings, and (b) interpret the naturalness ratings of Comprehensive Stuttering Program speaker samples and the ratings made by the three listener groups in this study.
Speech Discrimination Difficulties in High-Functioning Autism Spectrum Disorder Are Likely Independent of Auditory Hypersensitivity

PubMed Central

Dunlop, William A.; Enticott, Peter G.; Rajan, Ramesh

2016-01-01

Autism Spectrum Disorder (ASD), characterized by impaired communication skills and repetitive behaviors, can also result in differences in sensory perception. Individuals with ASD often perform normally in simple auditory tasks but poorly compared to typically developed (TD) individuals on complex auditory tasks like discriminating speech from complex background noise. A common trait of individuals with ASD is hypersensitivity to auditory stimulation. No studies to our knowledge consider whether hypersensitivity to sounds is related to differences in speech-in-noise discrimination. We provide novel evidence that individuals with high-functioning ASD show poor performance compared to TD individuals in a speech-in-noise discrimination task with an attentionally demanding background noise, but not in a purely energetic noise. Further, we demonstrate in our small sample that speech-hypersensitivity does not appear to predict performance in the speech-in-noise task. The findings support the argument that an attentional deficit, rather than a perceptual deficit, affects the ability of individuals with ASD to discriminate speech from background noise. Finally, we piloted a novel questionnaire that measures difficulty hearing in noisy environments, and sensitivity to non-verbal and verbal sounds. Psychometric analysis using 128 TD participants provided novel evidence for a difference in sensitivity to non-verbal and verbal sounds, and these findings were reinforced by participants with ASD who also completed the questionnaire. The study was limited by a small and high-functioning sample of participants with ASD. Future work could test larger sample sizes and include lower-functioning ASD participants. PMID:27555814
Direct cortical stimulation of inferior frontal cortex disrupts both speech and music production in highly trained musicians.

PubMed

Leonard, Matthew K; Desai, Maansi; Hungate, Dylan; Cai, Ruofan; Singhal, Nilika S; Knowlton, Robert C; Chang, Edward F

2018-05-22

Music and speech are human-specific behaviours that share numerous properties, including the fine motor skills required to produce them. Given these similarities, previous work has suggested that music and speech may at least partially share neural substrates. To date, much of this work has focused on perception, and has not investigated the neural basis of production, particularly in trained musicians. Here, we report two rare cases of musicians undergoing neurosurgical procedures, where it was possible to directly stimulate the left hemisphere cortex during speech and piano/guitar music production tasks. We found that stimulation to left inferior frontal cortex, including pars opercularis and ventral pre-central gyrus, caused slowing and arrest for both speech and music, and note sequence errors for music. Stimulation to posterior superior temporal cortex only caused production errors during speech. These results demonstrate partially dissociable networks underlying speech and music production, with a shared substrate in frontal regions.
Difficulty understanding speech in noise by the hearing impaired: underlying causes and technological solutions.

PubMed

Healy, Eric W; Yoho, Sarah E

2016-08-01

A primary complaint of hearing-impaired individuals involves poor speech understanding when background noise is present. Hearing aids and cochlear implants often allow good speech understanding in quiet backgrounds. But hearing-impaired individuals are highly noise intolerant, and existing devices are not very effective at combating background noise. As a result, speech understanding in noise is often quite poor. In accord with the significance of the problem, considerable effort has been expended toward understanding and remedying this issue. Fortunately, our understanding of the underlying issues is reasonably good. In sharp contrast, effective solutions have remained elusive. One solution that seems promising involves a single-microphone machine-learning algorithm to extract speech from background noise. Data from our group indicate that the algorithm is capable of producing vast increases in speech understanding by hearing-impaired individuals. This paper will first provide an overview of the speech-in-noise problem and outline why hearing-impaired individuals are so noise intolerant. An overview of our approach to solving this problem will follow.
Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

PubMed Central

Toutios, Asterios; Narayanan, Shrikanth S.

2016-01-01

Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development. PMID:27833745
Speech fluency profile on different tasks for individuals with Parkinson's disease.

PubMed

Juste, Fabiola Staróbole; Andrade, Claudia Regina Furquim de

2017-07-20

To characterize the speech fluency profile of patients with Parkinson's disease. Study participants were 40 individuals of both genders aged 40 to 80 years divided into 2 groups: Research Group - RG (20 individuals with diagnosis of Parkinson's disease) and Control Group - CG (20 individuals with no communication or neurological disorders). For all of the participants, three speech samples involving different tasks were collected: monologue, individual reading, and automatic speech. The RG presented a significantly larger number of speech disruptions, both stuttering-like and typical dysfluencies, and a higher percentage of speech discontinuity in the monologue and individual reading tasks compared with the CG. Both groups presented a reduced number of speech disruptions (stuttering-like and typical dysfluencies) in the automatic speech task; the groups presented similar performance in this task. Regarding speech rate, individuals in the RG presented a lower number of words and syllables per minute compared with those in the CG in all speech tasks. Participants of the RG presented altered parameters of speech fluency compared with those of the CG; however, this change in fluency cannot be considered a stuttering disorder.
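The rate and discontinuity measures reported in this abstract can be illustrated with a small sketch. The abstract does not give formulas, so two assumptions are made here: speech rate is counted per minute of sample duration, and the percentage of speech discontinuity is the number of disruptions per 100 syllables. The function name `fluency_profile` and its parameters are hypothetical.

```python
def fluency_profile(words, syllables, disruptions, duration_s):
    """Per-minute speech rate and percentage of discontinuity for one sample.

    Assumes discontinuity = disruptions per 100 syllables (not stated in
    the abstract) and a sample duration given in seconds.
    """
    minutes = duration_s / 60
    return {
        "words_per_min": words / minutes,
        "syllables_per_min": syllables / minutes,
        "pct_discontinuity": 100 * disruptions / syllables,
    }
```

A 90-second sample with 120 words, 200 syllables, and 10 disruptions would yield 80 words per minute and 5% discontinuity under these assumptions.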
Two Different Communication Genres and Implications for Vocabulary Development and Learning to Read

ERIC Educational Resources Information Center

Massaro, Dominic W.

2015-01-01

This study examined potential differences in vocabulary found in picture books and adults' speech to children and to other adults. Using a small sample of various sources of speech and print, Hayes observed that print had a more extensive vocabulary than speech. The current analyses of two different spoken language databases and an assembled…

School-Based Speech-Language Pathologists' Use of iPads

ERIC Educational Resources Information Center

Romane, Garvin Philippe

2017-01-01

This study explored school-based speech-language pathologists' (SLPs') use of iPads and apps for speech and language instruction, specifically for articulation, language, and vocabulary goals. A mostly quantitative-based survey was administered to approximately 2,800 SLPs in a K-12 setting; the final sample consisted of 189 licensed SLPs. Overall,…
The Measurement of the Oral and Nasal Sound Pressure Levels of Speech

ERIC Educational Resources Information Center

Clarke, Wayne M.

1975-01-01

A nasal separator was used to measure the oral and nasal components in the speech of a normal adult Australian population. Results indicated no difference in oral and nasal sound pressure levels for read versus spontaneous speech samples; however, females tended to have a higher nasal component than did males. (Author/TL)
Effects of Culture and Gender in Comprehension of Speech Acts of Indirect Request

ERIC Educational Resources Information Center

Shams, Rabe'a; Afghari, Akbar

2011-01-01

This study investigates the comprehension of the indirect request speech act used by Iranian people in daily communication. The study is an attempt to find out whether different cultural backgrounds and the gender of the speakers affect the comprehension of the indirect request speech act. The sample includes thirty males and females in Gachsaran…
Phonological Memory, Attention Control, and Musical Ability: Effects of Individual Differences on Rater Judgments of Second Language Speech

ERIC Educational Resources Information Center

Isaacs, Talia; Trofimovich, Pavel

2011-01-01

This study examines how listener judgments of second language speech relate to individual differences in listeners' phonological memory, attention control, and musical ability. Sixty native English listeners (30 music majors, 30 nonmusic majors) rated 40 nonnative speech samples for accentedness, comprehensibility, and fluency. The listeners were…
The Influence of Social Class and Race on Language Test Performance and Spontaneous Speech of Preschool Children.

ERIC Educational Resources Information Center

Johnson, Dale L.

This investigation compares child language obtained with standardized tests and samples of spontaneous speech obtained in natural settings. It was hypothesized that differences would exist between social class and racial groups on the unfamiliar standard tests, but such differences would not be evident on spontaneous speech measures. Also, higher…
Advances in natural language processing.

PubMed

Hirschberg, Julia; Manning, Christopher D

2015-07-17

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.
From the analysis of verbal data to the analysis of organizations: organizing as a dialogical process.

PubMed

Lorino, Philippe

2014-12-01

The analysis of conversational turn-taking and its implications on time (the speaker cannot completely anticipate the future effects of her/his speech) and sociality (the speech is co-produced by the various speakers rather than by the speaking individual) can provide a useful basis to analyze complex organizing processes and collective action: the actor cannot completely anticipate the future effects of her/his acts and the act is co-produced by multiple actors. This translation from verbal to broader classes of interaction stresses the performativity of speeches, the importance of the situation, the role of semiotic mediations to make temporally and spatially distant "ghosts" present in the dialog, and the dissymmetrical relationship between successive conversational turns, due to temporal irreversibility.
Developmental relationships between speech and writing: is verb-phrase anaphora production a special case?

PubMed

Donaldson, Morag L; Cooper, Lynn S M

2013-09-01

Young children's speech is typically more linguistically sophisticated than their writing. However, there are grounds for asking whether production of cohesive devices, such as verb-phrase anaphora (VPA), might represent an exception to this developmental pattern, as cohesive devices are generally more important in writing than in speech and so might be expected to be more frequent in children's writing than in their speech. The study reported herein aims to compare the frequency of children's production of VPA constructions (e.g., Mary is eating an apple and so is John) between a written and a spoken task. Forty-eight children participated from each of two age groups: 7-year-olds and 10-year-olds. All the children received both a spoken and a written sentence completion task designed to elicit production of VPA. Task order was counterbalanced. VPA production was significantly more frequent in speech than in writing and when the spoken task was presented first. Surprisingly, the 7-year-olds produced VPA constructions more frequently than the 10-year-olds. Despite the greater importance of cohesion in writing than in speech, children's production of VPA is similar to their production of most other aspects of language in that more sophisticated constructions are used more frequently in speech than in writing. Children's written production of cohesive devices could probably be enhanced by presenting spoken tasks immediately before written tasks. The lower frequency of VPA production in the older children may reflect syntactic priming effects or a belief that they should produce sentences that are as fully specified as possible. © 2012 The British Psychological Society.
Speech disorders in neurofibromatosis type 1: a sample survey.

PubMed

Cosyns, Marjan; Vandeweghe, Lies; Mortier, Geert; Janssens, Sandra; Van Borsel, John

2010-01-01

Neurofibromatosis type 1 (NF1) is an autosomal-dominant neurocutaneous disorder with an estimated prevalence of two to three cases per 10,000 population. While the physical characteristics have been well documented, speech disorders have not been fully characterized in NF1 patients. This study serves as a pilot to identify key issues in the speech of NF1 patients. In particular, the aim is to explore further the occurrence and nature of problems associated with speech as perceived by the patients themselves. A questionnaire was sent to 149 patients with NF1 registered at the Department of Genetics, Ghent University Hospital. The questionnaire inquired about articulation, hearing, breathing, voice, resonance and fluency. Sixty individuals ranging in age from 4.5 to 61.3 years returned completed questionnaires and these served as the database for the study. The results of this sample survey were compared with data of the normal population. About two-thirds of participants experienced at least one speech or speech-related problem of any type. Compared with the normal population, the NF1 group indicated more articulation difficulties, hearing impairment, abnormalities in loudness, and stuttering. The results indicate that speech difficulties are an area of interest in the NF1 population. Further research to elucidate these findings is needed.
Relationships among psychoacoustic judgments, speech understanding ability and self-perceived handicap in tinnitus subjects.

PubMed

Newman, C W; Wharton, J A; Shivapuja, B G; Jacobson, G P

1994-01-01

Tinnitus is often a disturbing symptom which affects 6-20% of the population. Relationships among tinnitus pitch and loudness judgments, audiometric speech understanding measures and self-perceived handicap were evaluated in a sample of subjects with tinnitus and hearing loss (THL). Data obtained from the THL sample on the audiometric speech measures were compared to the performance of an age-matched hearing loss only (HL) group. Both groups had normal hearing through 1 kHz with a sloping configuration of ≤ 20 dB/octave between 2-12 kHz. The THL subjects performed more poorly on the low predictability items of the Speech Perception in Noise Test, suggesting that tinnitus may interfere with the perception of speech signals having reduced linguistic redundancy. The THL subjects rated their tinnitus as annoying at relatively low sensation levels using the pitch-match frequency as the reference tone. Further, significant relationships were found between loudness judgment measures and self-rated annoyance. No predictable relationships were observed between the audiometric speech measures and perceived handicap using the Tinnitus Handicap Questionnaire. These findings support the use of self-report measures in tinnitus patients in that audiometric speech tests alone may be insufficient in describing an individual's reaction to his/her communication breakdowns.
Intelligibility assessment in developmental phonological disorders: accuracy of caregiver gloss.

PubMed

Kwiatkowski, J; Shriberg, L D

1992-10-01

Fifteen caregivers each glossed a simultaneously videotaped and audiotaped sample of their child with speech delay engaged in conversation with a clinician. One of the authors generated a reference gloss for each sample, aided by (a) prior knowledge of the child's speech-language status and error patterns, (b) glosses from the child's clinician and the child's caregiver, (c) unlimited replays of the taped sample, and (d) the information gained from completing a narrow phonetic transcription of the sample. Caregivers glossed an average of 78% of the utterances and 81% of the words. A comparison of their glosses to the reference glosses suggested that they accurately understood an average of 58% of the utterances and 73% of the words. Discussion considers the implications of such findings for methodological and theoretical issues underlying children's moment-to-moment intelligibility breakdowns during speech-language processing.
Are the Literacy Difficulties That Characterize Developmental Dyslexia Associated with a Failure to Integrate Letters and Speech Sounds?

ERIC Educational Resources Information Center

Nash, Hannah M.; Gooch, Debbie; Hulme, Charles; Mahajan, Yatin; McArthur, Genevieve; Steinmetzger, Kurt; Snowling, Margaret J.

2017-01-01

The "automatic letter-sound integration hypothesis" (Blomert, 2011) proposes that dyslexia results from a failure to fully integrate letters and speech sounds into automated audio-visual objects. We tested this hypothesis in a sample of English-speaking children with dyslexic difficulties (N = 13) and samples of…
Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech.

PubMed

Khalighinejad, Bahar; Cruzatto da Silva, Guilherme; Mesgarani, Nima

2017-02-22

Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT: We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for dynamic processing of speech sounds in the auditory pathway. Copyright © 2017 Khalighinejad et al.
Attitudes toward speech disorders: sampling the views of Cantonese-speaking Americans.

PubMed

Bebout, L; Arthur, B

1997-01-01

Speech-language pathologists who serve clients from cultural backgrounds that are not familiar to them may encounter culturally influenced attitudinal differences. A questionnaire with statements about 4 speech disorders (dysfluency, cleft palate, speech of the deaf, and misarticulations) was given to a focus group of Chinese Americans and a comparison group of non-Chinese Americans. The focus group was much more likely to believe that persons with speech disorders could improve their own speech by "trying hard," was somewhat more likely to say that people who use deaf speech and people with cleft palates might be "emotionally disturbed," and generally more likely to view deaf speech as a limitation. The comparison group was more pessimistic about stuttering children's acceptance by their peers than was the focus group. The two subject groups agreed about other items, such as the likelihood that older children with articulation problems are "less intelligent" than their peers.
Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech.

PubMed

Venezia, Jonathan H; Fillmore, Paul; Matchin, William; Isenberg, A Lisette; Hickok, Gregory; Fridriksson, Julius

2016-02-01

Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. Copyright © 2015 Elsevier Inc. All rights reserved.
Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech

PubMed Central

Venezia, Jonathan H.; Fillmore, Paul; Matchin, William; Isenberg, A. Lisette; Hickok, Gregory; Fridriksson, Julius

2015-01-01

Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. PMID:26608242
Speech endpoint detection with non-language speech sounds for generic speech processing applications

NASA Astrophysics Data System (ADS)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
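As a rough illustration of the conventional endpoint detection step mentioned in this abstract, the following minimal sketch marks speech regions by short-time energy alone. It is not the authors' HMM-based method. The function name `detect_endpoints`, the frame length, the energy threshold, and the minimum region duration are all assumptions chosen for illustration. A detector of this kind cannot tell a cough or breath from a word, which is the limitation the abstract describes.

```python
def detect_endpoints(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Return (start, end) sample indices of regions whose mean frame energy
    exceeds `threshold` for at least `min_frames` consecutive frames.

    All parameter values are illustrative assumptions, not values from the abstract.
    """
    regions = []
    start = None
    n_frames = len(samples) // frame_len
    # One extra iteration acts as a sentinel that closes a region still open at the end.
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            energy = sum(s * s for s in frame) / frame_len
            active = energy > threshold
        else:
            active = False
        if active and start is None:
            start = i  # a candidate region begins at this frame
        elif not active and start is not None:
            if i - start >= min_frames:  # discard regions that are too short
                regions.append((start * frame_len, i * frame_len))
            start = None
    return regions


# Synthetic signal: silence, a sustained loud segment, then silence again.
signal = [0.0] * 800 + [0.5] * 800 + [0.0] * 800
regions = detect_endpoints(signal)
```

The minimum-duration rule drops very short bursts such as a single click, but longer non-language sounds would still be passed on as speech, which is why the abstract argues for a dedicated NLSS classifier.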
38 CFR 8.0 - Definitions of terms used in connection with title 38 CFR, part 8, National Service Life Insurance.

Code of Federal Regulations, 2010 CFR

2010-07-01

..., through the normal organs of speech if the loss is caused by physical changes in such organs. The fact that some speech can be produced through the use of artificial appliance or other organs of the body...
Grammar without Speech Production: The Case of Labrador Inuttitut Heritage Receptive Bilinguals

ERIC Educational Resources Information Center

Sherkina-Lieber, Marina; Perez-Leroux, Ana T.; Johns, Alana

2011-01-01

We examine morphosyntactic knowledge of Labrador Inuttitut by Inuit receptive bilinguals (RBs)--heritage speakers who are capable of comprehension, but produce little or no speech. A grammaticality judgment study suggests that RBs possess sensitivity to morphosyntactic violations, though to a lesser degree than fluent bilinguals. Low-proficiency…
Talker Identification across Source Mechanisms: Experiments with Laryngeal and Electrolarynx Speech

ERIC Educational Resources Information Center

Perrachione, Tyler K.; Stepp, Cara E.; Hillman, Robert E.; Wong, Patrick C. M.

2014-01-01

Purpose: The purpose of this study was to determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. Method: Healthy adult control listeners learned to identify…

Speech Accommodation without Priming: The Case of Pitch

ERIC Educational Resources Information Center

Gijssels, Tom; Casasanto, Laura Staum; Jasmin, Kyle; Hagoort, Peter; Casasanto, Daniel

2016-01-01

People often accommodate to each other's speech by aligning their linguistic production with their partner's. According to an influential theory, the Interactive Alignment Model, alignment is the result of priming. When people perceive an utterance, the corresponding linguistic representations are primed and become easier to produce. Here we…
Multichannel Compression, Temporal Cues, and Audibility.

ERIC Educational Resources Information Center

Souza, Pamela E.; Turner, Christopher W.

1998-01-01

The effect of the reduction of the temporal envelope produced by multichannel compression on recognition was examined in 16 listeners with hearing loss, with particular focus on audibility of the speech signal. Multichannel compression improved speech recognition when superior audibility was provided by a two-channel compression system over linear…
Bite Block Vowel Production in Apraxia of Speech

ERIC Educational Resources Information Center

Jacks, Adam

2008-01-01

Purpose: This study explored vowel production and adaptation to articulatory constraints in adults with acquired apraxia of speech (AOS) plus aphasia. Method: Five adults with acquired AOS plus aphasia and 5 healthy control participants produced the vowels [iota], [epsilon], and [ash] in four word-length conditions in unconstrained and bite block…
The Role of the Listener's State in Speech Perception

ERIC Educational Resources Information Center

Viswanathan, Navin

2009-01-01

Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…
Longitudinal decline in speech production in Parkinson's disease spectrum disorders.

PubMed

Ash, Sharon; Jester, Charles; York, Collin; Kofman, Olga L; Langey, Rachel; Halpin, Amy; Firn, Kim; Dominguez Perez, Sophia; Chahine, Lama; Spindler, Meredith; Dahodwala, Nabila; Irwin, David J; McMillan, Corey; Weintraub, Daniel; Grossman, Murray

2017-08-01

We examined narrative speech production longitudinally in non-demented (n=15) and mildly demented (n=8) patients with Parkinson's disease spectrum disorder (PDSD), and we related increasing impairment to structural brain changes in specific language and motor regions. Patients provided semi-structured speech samples, describing a standardized picture at two time points (mean±SD interval=38±24 months). The recorded speech samples were analyzed for fluency, grammar, and informativeness. PDSD patients with dementia exhibited significant decline in their speech, unrelated to changes in overall cognitive or motor functioning. Regression analysis in a subset of patients with MRI scans (n=11) revealed that impaired language performance at Time 2 was associated with reduced gray matter (GM) volume at Time 1 in regions of interest important for language functioning but not with reduced GM volume in motor brain areas. These results dissociate language and motor systems and highlight the importance of non-motor brain regions for declining language in PDSD. Copyright © 2017 Elsevier Inc. All rights reserved.
Implementation of Three Text to Speech Systems for Kurdish Language

NASA Astrophysics Data System (ADS)

Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir

Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for all-purpose systems. In this paper, we implemented three synthesis systems for the Kurdish language based on the syllable, allophone, and diphone and compared their quality using subjective testing.
Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation.

PubMed

Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo

2017-09-01

Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially in computing WLP models with a hard-limiting weighting function. A sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted based on its past as well as future samples, thereby utilizing the available number of samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach as well as natural speech utterances show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
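The idea of predicting each sample from both its past and its future can be sketched with a generic weighted forward-backward least-squares predictor. This is only a sketch of the general technique under stated assumptions, not the authors' QCP-FB algorithm: the function names, the shared coefficient vector for both directions, and the 0/1 weighting pattern used in the example are all hypothetical, and the actual QCP weighting function is derived from glottal closure instants, which this sketch does not model.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def weighted_fb_lp(x, order, weights=None):
    """Coefficients a[0..order-1] minimising the weighted sum of squared
    forward errors  x[t] - sum_k a[k] * x[t-1-k]  and
    backward errors x[t] - sum_k a[k] * x[t+1+k].

    `weights[t]` scales the error of sample t (hypothetical weighting).
    """
    n = len(x)
    if weights is None:
        weights = [1.0] * n
    R = [[0.0] * order for _ in range(order)]  # weighted normal-equation matrix
    r = [0.0] * order                          # weighted right-hand side

    def accumulate(target, context, w):
        for i in range(order):
            r[i] += w * target * context[i]
            for j in range(order):
                R[i][j] += w * context[i] * context[j]

    for t in range(order, n):        # forward prediction from past samples
        accumulate(x[t], [x[t - 1 - k] for k in range(order)], weights[t])
    for t in range(0, n - order):    # backward prediction from future samples
        accumulate(x[t], [x[t + 1 + k] for k in range(order)], weights[t])
    return solve_linear(R, r)


# A pure sinusoid obeys x[t] = 2*cos(w)*x[t-1] - x[t-2] in both time directions.
w = 0.3
signal = [math.cos(w * t) for t in range(64)]
hard_weights = [1.0 if t % 8 < 5 else 0.0 for t in range(64)]  # illustrative 0/1 weights
coeffs = weighted_fb_lp(signal, 2, hard_weights)
```

With a hard-limiting weight, samples given zero weight contribute no forward equation, so adding the backward equations roughly doubles the number of usable constraints per frame, which is the motivation stated in the abstract.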
Iconic gestures prime words: comparison of priming effects when gestures are presented alone and when they are accompanying speech

PubMed Central

So, Wing-Chee; Yi-Feng, Alvan Low; Yap, De-Fu; Kheng, Eugene; Yap, Ju-Min Melvin

2013-01-01

Previous studies have shown that iconic gestures presented in an isolated manner prime visually presented semantically related words. Since gestures and speech are almost always produced together, this study examined whether iconic gestures accompanying speech would prime words and compared the priming effect of iconic gestures with speech to that of iconic gestures presented alone. Adult participants (N = 180) were randomly assigned to one of three conditions in a lexical decision task: Gestures-Only (the primes were iconic gestures presented alone); Speech-Only (the primes were auditory tokens conveying the same meaning as the iconic gestures); Gestures-Accompanying-Speech (the primes were the simultaneous coupling of iconic gestures and their corresponding auditory tokens). Our findings revealed significant priming effects in all three conditions. However, the priming effect in the Gestures-Accompanying-Speech condition was comparable to that in the Speech-Only condition and was significantly weaker than that in the Gestures-Only condition, suggesting that the facilitatory effect of iconic gestures accompanying speech may be constrained by the level of language processing required in the lexical decision task, where linguistic processing of word forms is more dominant than semantic processing. Hence, the priming effect afforded by the co-speech iconic gestures was weakened. PMID:24155738
Potential interactions among linguistic, autonomic, and motor factors in speech.

PubMed

Kleinow, Jennifer; Smith, Anne

2006-05-01

Though anecdotal reports link certain speech disorders to increases in autonomic arousal, few studies have described the relationship between arousal and speech processes. Additionally, it is unclear how increases in arousal may interact with other cognitive-linguistic processes to affect speech motor control. In this experiment we examine potential interactions between autonomic arousal, linguistic processing, and speech motor coordination in adults and children. Autonomic responses (heart rate, finger pulse volume, tonic skin conductance, and phasic skin conductance) were recorded simultaneously with upper and lower lip movements during speech. The lip aperture variability (LA variability index) across multiple repetitions of sentences that varied in length and syntactic complexity was calculated under low- and high-arousal conditions. High arousal conditions were elicited by performance of the Stroop color word task. Children had significantly higher lip aperture variability index values across all speaking tasks, indicating more variable speech motor coordination. Increases in syntactic complexity and utterance length were associated with increases in speech motor coordination variability in both speaker groups. There was a significant effect of Stroop task, which produced increases in autonomic arousal and increased speech motor variability in both adults and children. These results provide novel evidence that high arousal levels can influence speech motor control in both adults and children. (c) 2006 Wiley Periodicals, Inc.
Effect of "developmental speech and language training through music" on speech production in children with autism spectrum disorders.

PubMed

Lim, Hayoung A

2010-01-01

The study compared the effect of music training, speech training and no training on the verbal production of children with Autism Spectrum Disorders (ASD). Participants were 50 children with ASD, age range 3 to 5 years, who had previously been evaluated on standard tests of language and level of functioning. They were randomly assigned to one of three 3-day conditions. Participants in music training (n = 18) watched a music video containing 6 songs and pictures of the 36 target words; those in speech training (n = 18) watched a speech video containing 6 stories and pictures, and those in the control condition (n = 14) received no treatment. Participants' verbal production including semantics, phonology, pragmatics, and prosody was measured by an experimenter-designed verbal production evaluation scale. Results showed that participants in both music and speech training significantly increased their pre- to posttest verbal production. Results also indicated that both high and low functioning participants improved their speech production after receiving either music or speech training; however, low functioning participants showed a greater improvement after the music training than the speech training. Children with ASD perceive important linguistic information embedded in music stimuli organized by principles of pattern perception, and produce functional speech.
Speech effort measurement and stuttering: investigating the chorus reading effect.

PubMed

Ingham, Roger J; Warner, Allison; Byrd, Anne; Cotton, John

2006-06-01

The purpose of this study was to investigate chorus reading's (CR's) effect on speech effort during oral reading by adult stuttering speakers and control participants. The effect of a speech effort measurement highlighting strategy was also investigated. Twelve persistent stuttering (PS) adults and 12 normally fluent control participants completed 1-min base rate readings (BR-nonchorus) and CRs within a BR/CR/BR/CR/BR experimental design. Participants self-rated speech effort using a 9-point scale after each reading trial. Stuttering frequency, speech rate, and speech naturalness measures were also obtained. Instructions highlighting speech effort ratings during BR and CR phases were introduced after the first CR. CR improved speech effort ratings for the PS group, but the control group showed a reverse trend. Both groups' effort ratings were not significantly different during CR phases but were significantly poorer than the control group's effort ratings during BR phases. The highlighting strategy did not significantly change effort ratings. The findings show that CR will produce not only stutter-free and natural sounding speech but also reliable reductions in speech effort. However, these reductions do not reach effort levels equivalent to those achieved by normally fluent speakers, thereby conditioning its use as a gold standard of achievable normal fluency by PS speakers.
Expressed parental concern regarding childhood stuttering and the Test of Childhood Stuttering.

PubMed

Tumanova, Victoria; Choi, Dahye; Conture, Edward G; Walden, Tedra A

The purpose of the present study was to determine whether the Test of Childhood Stuttering observational rating scales (TOCS; Gillam et al., 2009) (1) differed between parents who did versus did not express concern (independent from the TOCS) about their child's speech fluency; (2) correlated with children's frequency of stuttering measured during a child-examiner conversation; and (3) correlated with the length and complexity of children's utterances, as indexed by mean length of utterance (MLU). Participants were 183 young children ages 3:0-5:11. Ninety-one had parents who reported concern about their child's stuttering (65 boys, 26 girls) and 92 had parents who reported no such concern (50 boys, 42 girls). Participants' conversational speech during a child-examiner conversation was analyzed for (a) frequency of occurrence of stuttered and non-stuttered disfluencies, and (b) MLU. Besides expressing concern or lack thereof about their child's speech fluency, parents completed the TOCS observational rating scales documenting how often they observe different disfluency types in the speech of their children, as well as disfluency-related consequences. There were three main findings. First, parents who expressed concern (independently from the TOCS) about their child's stuttering reported significantly higher scores on the TOCS Speech Fluency and Disfluency-Related Consequences rating scales. Second, children whose parents rated them higher on the TOCS Speech Fluency rating scale produced more stuttered disfluencies during a child-examiner conversation. Third, children with higher scores on the TOCS Disfluency-Related Consequences rating scale had shorter MLU during child-examiner conversation, across age and level of language ability. Findings support the use of the TOCS observational rating scales as one documentable, objective means to determine parental perception of and concern about their child's stuttering. Findings also support the notion that parents are reasonably accurate, if not reliable, judges of the quantity and quality (i.e., stuttered vs. non-stuttered) of their child's speech disfluencies. Lastly, findings that some children may decrease their verbal output in attempts to minimize instances of stuttering - as indexed by relatively low MLU and high TOCS Disfluency-Related Consequences scores - provide strong support for sampling young children's speech and language across various situations to obtain the most representative index possible of the child's MLU and associated instances of stuttering. Copyright © 2018 Elsevier Inc. All rights reserved.
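The mean length of utterance (MLU) used as an index in this abstract can be illustrated with a short sketch. The abstract does not say how utterances were segmented or counted, so the sketch assumes MLU counted in whitespace-separated words. MLU is conventionally counted in morphemes, which would need a morphological analysis that is not shown here. The function name and the sample utterances are hypothetical.

```python
def mean_length_of_utterance(utterances):
    """Mean number of whitespace-separated words per non-empty utterance.

    Assumption: length is counted in words; counting morphemes would
    require a morphological segmenter that this sketch does not include.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("at least one non-empty utterance is required")
    return sum(lengths) / len(lengths)


# Hypothetical transcript of three child utterances.
sample = ["I want juice", "no", "the dog is big"]
mlu = mean_length_of_utterance(sample)
```

A child who shortens utterances to avoid stuttering would show a lower value on this measure, which is the pattern the abstract reports for children with higher Disfluency-Related Consequences scores.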
The Atlanta Motor Speech Disorders Corpus: Motivation, Development, and Utility.

PubMed

Laures-Gore, Jacqueline; Russell, Scott; Patel, Rupal; Frankel, Michael

2016-01-01

This paper describes the design and collection of a comprehensive spoken language dataset from speakers with motor speech disorders in Atlanta, Ga., USA. This collaborative project aimed to gather a spoken database consisting of nonmainstream American English speakers residing in the Southeastern US in order to provide a more diverse perspective of motor speech disorders. Ninety-nine adults with an acquired neurogenic disorder resulting in a motor speech disorder were recruited. Stimuli include isolated vowels, single words, sentences with contrastive focus, sentences with emotional content and prosody, sentences with acoustic and perceptual sensitivity to motor speech disorders, as well as 'The Caterpillar' and 'The Grandfather' passages. Utility of this data in understanding the potential interplay of dialect and dysarthria was demonstrated with a subset of the speech samples existing in the database. The Atlanta Motor Speech Disorders Corpus will enrich our understanding of motor speech disorders through the examination of speech from a diverse group of speakers. © 2016 S. Karger AG, Basel.
Speech and pause characteristics in multiple sclerosis: A preliminary study of speakers with high and low neuropsychological test performance

PubMed Central

Feenaughty, Lynda; Tjaden, Kris; Benedict, Ralph H.B.; Weinstock-Guttman, Bianca

2017-01-01

This preliminary study investigated how cognitive-linguistic status in multiple sclerosis (MS) is reflected in two speech tasks (i.e. oral reading, narrative) that differ in cognitive-linguistic demand. Twenty individuals with MS were selected to comprise High and Low performance groups based on clinical tests of executive function and information processing speed and efficiency. Ten healthy controls were included for comparison. Speech samples were audio-recorded and measures of global speech timing were obtained. Results indicated predicted differences in global speech timing (i.e. speech rate and pause characteristics) for speech tasks differing in cognitive-linguistic demand, but the magnitude of these task-related differences was similar for all speaker groups. Findings suggest that assumptions concerning the cognitive-linguistic demands of reading aloud as compared to spontaneous speech may need to be re-considered for individuals with cognitive impairment. Qualitative trends suggest that additional studies investigating the association between cognitive-linguistic and speech motor variables in MS are warranted. PMID:23294227
Brainstem Encoding of Aided Speech in Hearing Aid Users with Cochlear Dead Region(s).

PubMed

Hassaan, Mohammad Ramadan; Ibraheem, Ola Abdallah; Galhom, Dalia Helal

2016-07-01

Neural encoding of speech begins with the analysis of the signal as a whole broken down into its sinusoidal components in the cochlea, which has to be conserved up to the higher auditory centers. Some of these components target the dead regions of the cochlea causing little or no excitation. Measuring aided speech-evoked auditory brainstem response elicited by speech stimuli with different spectral maxima can give insight into the brainstem encoding of aided speech with spectral maxima at these dead regions. This research aims to study the impact of dead regions of the cochlea on speech processing at the brainstem level after a long period of hearing aid use. This study comprised 30 ears without dead regions and 46 ears with dead regions at low, mid, or high frequencies. For all ears, we measured the aided speech-evoked auditory brainstem response using speech stimuli of low, mid, and high spectral maxima. Aided speech-evoked auditory brainstem response was producible in all subjects. Responses evoked by stimuli with spectral maxima at dead regions had longer latencies and smaller amplitudes when compared with the control group or the responses of other stimuli. The presence of cochlear dead regions affects brainstem encoding of speech with spectral maxima perpendicular to these regions. Brainstem neuroplasticity and the extrinsic redundancy of speech can minimize the impact of dead regions in chronic hearing aid users.
The speech perception skills of children with and without speech sound disorder.

PubMed

Hearnshaw, Stephanie; Baker, Elise; Munro, Natalie

To investigate whether Australian-English speaking children with and without speech sound disorder (SSD) differ in their overall speech perception accuracy. Additionally, to investigate differences in the perception of specific phonemes and the association between speech perception and speech production skills. Twenty-five Australian-English speaking children aged 48-60 months participated in this study. The SSD group included 12 children and the typically developing (TD) group included 13 children. Children completed routine speech and language assessments in addition to an experimental Australian-English lexical and phonetic judgement task based on Rvachew's Speech Assessment and Interactive Learning System (SAILS) program (Rvachew, 2009). This task included eight words across four word-initial phonemes: /k, ɹ, ʃ, s/. Children with SSD showed significantly poorer perceptual accuracy on the lexical and phonetic judgement task compared with TD peers. The phonemes /ɹ/ and /s/ were most frequently perceived in error across both groups. Additionally, the phoneme /ɹ/ was most commonly produced in error. There was also a positive correlation between overall speech perception and speech production scores. Children with SSD perceived speech less accurately than their typically developing peers. The findings suggest that an Australian-English variation of a lexical and phonetic judgement task similar to the SAILS program is promising and worthy of a larger scale study. Copyright © 2017 Elsevier Inc. All rights reserved.
"The caterpillar": a novel reading passage for assessment of motor speech disorders.

PubMed

Patel, Rupal; Connaghan, Kathryn; Franco, Diana; Edsall, Erika; Forgit, Dory; Olsen, Laura; Ramage, Lianna; Tyler, Emily; Russell, Scott

2013-02-01

A review of the salient characteristics of motor speech disorders and common assessment protocols revealed the need for a novel reading passage tailored specifically to differentiate between and among the dysarthrias (DYSs) and apraxia of speech (AOS). "The Caterpillar" passage was designed to provide a contemporary, easily read, contextual speech sample with specific tasks (e.g., prosodic contrasts, words of increasing length and complexity) targeted to inform the assessment of motor speech disorders. Twenty-two adults, 15 with DYS or AOS and 7 healthy controls (HC), were recorded reading "The Caterpillar" passage to demonstrate its utility in examining motor speech performance. Analysis of performance across a subset of segmental and prosodic variables illustrated that "The Caterpillar" passage showed promise for extracting individual profiles of impairment that could augment current assessment protocols and inform treatment planning in motor speech disorders.
Perceptual Measures of Speech from Individuals with Parkinson's Disease and Multiple Sclerosis: Intelligibility and beyond

ERIC Educational Resources Information Center

Sussman, Joan E.; Tjaden, Kris

2012-01-01

Purpose: The primary purpose of this study was to compare percent correct word and sentence intelligibility scores for individuals with multiple sclerosis (MS) and Parkinson's disease (PD) with scaled estimates of speech severity obtained for a reading passage. Method: Speech samples for 78 talkers were judged, including 30 speakers with MS, 16…
Do Native Speakers of North American and Singapore English Differentially Perceive Comprehensibility in Second Language Speech?

ERIC Educational Resources Information Center

Saito, Kazuya; Shintani, Natsuko

2016-01-01

The current study examined the extent to which native speakers of North American and Singapore English differentially perceive the comprehensibility (ease of understanding) of second language (L2) speech. Spontaneous speech samples elicited from 50 Japanese learners of English with various proficiency levels were first rated by 10 Canadian and 10…
Assessing Children's Home Language Environments Using Automatic Speech Recognition Technology

ERIC Educational Resources Information Center

Greenwood, Charles R.; Thiemann-Bourque, Kathy; Walker, Dale; Buzhardt, Jay; Gilkerson, Jill

2011-01-01

The purpose of this research was to replicate and extend some of the findings of Hart and Risley using automatic speech processing instead of human transcription of language samples. The long-term goal of this work is to make the current approach to speech processing possible by researchers and clinicians working on a daily basis with families and…

Music and Speech Perception in Children Using Sung Speech

PubMed Central

Nie, Yingjiu; Galvin, John J.; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

2018-01-01

This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners. PMID:29609496
Examining explanations for fundamental frequency's contribution to speech intelligibility in noise

NASA Astrophysics Data System (ADS)

Schlauch, Robert S.; Miller, Sharon E.; Watson, Peter J.

2005-09-01

Laures and Weismer [JSLHR, 42, 1148 (1999)] reported that speech with natural variation in fundamental frequency (F0) is more intelligible in noise than speech with a flattened F0 contour. Cognitive-linguistic explanations have been offered to account for this drop in intelligibility for the flattened condition, but a lower-level mechanism related to auditory streaming may be responsible. Numerous psychoacoustic studies have demonstrated that modulating a tone enables a listener to segregate it from background sounds. To test these rival hypotheses, speech recognition in noise was measured for sentences with six different F0 contours: unmodified, flattened at the mean, natural but exaggerated, reversed, and frequency modulated (rates of 2.5 and 5.0 Hz). The 180 stimulus sentences were produced by five talkers (30 sentences per condition). Speech recognition results for fifteen listeners replicate earlier findings showing that flattening the F0 contour results in a roughly 10% reduction in recognition of key words compared with the natural condition. Although the exaggerated condition produced results comparable to those of the flattened condition, the other conditions with unnatural F0 contours all yielded significantly poorer performance than the flattened condition. These results support the cognitive-linguistic explanations for the reduction in performance.
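The six F0 conditions named in this abstract can be illustrated on a toy contour. The following is a minimal sketch, not the authors' stimulus-generation procedure: the abstract does not say how "reversed" was implemented, so mirroring the contour about its mean is an assumption here, and the exaggeration factor, modulation depth, frame step, and contour values are all hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def flatten(f0):
    """Replace every frame with the contour mean (flattened at the mean)."""
    m = mean(f0)
    return [m for _ in f0]

def exaggerate(f0, factor=1.5):
    """Scale deviations from the mean; factor is a hypothetical value."""
    m = mean(f0)
    return [m + factor * (v - m) for v in f0]

def invert(f0):
    """Mirror the contour about its mean (assumed reading of 'reversed')."""
    m = mean(f0)
    return [2 * m - v for v in f0]

def frequency_modulate(f0, rate_hz, frame_s=0.01, depth_hz=10.0):
    """Sinusoidal F0 around the mean at rate_hz; depth and frame step are hypothetical."""
    m = mean(f0)
    return [m + depth_hz * math.sin(2 * math.pi * rate_hz * i * frame_s)
            for i in range(len(f0))]

# Toy F0 values in Hz for six voiced frames (illustrative only).
contour = [110.0, 120.0, 135.0, 125.0, 105.0, 95.0]

variants = {
    "unmodified": list(contour),
    "flattened": flatten(contour),
    "exaggerated": exaggerate(contour),
    "reversed": invert(contour),
    "fm_2.5Hz": frequency_modulate(contour, 2.5),
    "fm_5.0Hz": frequency_modulate(contour, 5.0),
}

for name, values in variants.items():
    print(name, [round(v, 1) for v in values])
```

Every variant except the unmodified one is built around the same mean F0, so in this sketch the conditions differ only in how F0 moves around that mean.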
Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

PubMed

Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J

2018-01-01

Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
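Presenting speech at a fixed signal-to-background ratio amounts to scaling the background before adding it to the speech. The following is a minimal sketch under stated assumptions: levels are measured as plain RMS (the study matched loudness with a loudness model, which is not reproduced here), the signals are synthetic stand-ins, and the two SBR values are hypothetical because the abstract does not report them.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_sbr(speech, background, sbr_db):
    """Scale the background so speech RMS / background RMS equals sbr_db, then add."""
    gain = rms(speech) / (rms(background) * 10 ** (sbr_db / 20))
    scaled = [gain * b for b in background]
    mixture = [s + b for s, b in zip(speech, scaled)]
    return mixture, scaled

def measured_sbr_db(speech, background):
    """Signal-to-background ratio in dB computed from RMS levels."""
    return 20 * math.log10(rms(speech) / rms(background))

rng = random.Random(0)
# Stand-in signals: a 150 Hz tone for "speech", Gaussian noise for the background.
speech = [math.sin(2 * math.pi * 150 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0, 1) for _ in range(8000)]

for target in (0.0, -5.0):  # hypothetical SBRs in dB
    mixture, scaled = mix_at_sbr(speech, noise, target)
    print(target, round(measured_sbr_db(speech, scaled), 6))
```

The same scaling applies whether the background is speech-shaped noise or a competing talker; only the `background` samples change.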
Electrophysiological evidence of functional integration between the language and motor systems in the brain: a study of the speech Bereitschaftspotential.

PubMed

McArdle, J J; Mari, Z; Pursley, R H; Schulz, G M; Braun, A R

2009-02-01

We investigated whether the Bereitschaftspotential (BP), an event related potential believed to reflect motor planning, would be modulated by language-related parameters prior to speech. We anticipated that articulatory complexity would produce effects on the BP distribution similar to those demonstrated for complex limb movements. We also hypothesized that lexical semantic operations would independently impact the BP. Eighteen participants performed 3 speech tasks designed to differentiate lexical semantic and articulatory contributions to the BP. EEG epochs were time-locked to the earliest source of speech movement per trial. Lip movements were assessed using EMG recordings. Doppler imaging was used to determine the onset of tongue movement during speech, providing a means of identification and elimination of potential artifact. Compared to simple repetition, complex articulations produced an anterior shift in the maximum midline BP. Tasks requiring lexical search and selection augmented these effects and independently elicited a left lateralized asymmetry in the frontal distribution. The findings indicate that the BP is significantly modulated by linguistic processing, suggesting that the premotor system might play a role in lexical access. These novel findings support the notion that the motor systems may play a significant role in the formulation of language.
Quality ratings of frequency-compressed speech by participants with extensive high-frequency dead regions in the cochlea

PubMed Central

Salorio-Corbetto, Marina; Baer, Thomas; Moore, Brian C. J.

2017-01-01

Objective: The objective was to assess the degradation of speech sound quality produced by frequency compression for listeners with extensive high-frequency dead regions (DRs). Design: Quality ratings were obtained using values of the starting frequency (Sf) of the frequency compression both below and above the estimated edge frequency, fe, of each DR. Thus, the value of Sf often fell below the lowest value currently used in clinical practice. Several compression ratios (CRs) were used for each value of Sf. Stimuli were sentences processed via a prototype hearing aid based on Phonak Exélia Art P. Study sample: Five participants (eight ears) with extensive high-frequency DRs were tested. Results: Reductions in sound quality produced by frequency compression were small to moderate. Ratings decreased significantly with decreasing Sf and increasing CR. The mean ratings were lowest for the lowest Sf and highest CR. Ratings varied across participants, with one participant rating frequency compression lower than no frequency compression even when Sf was above fe. Conclusions: Frequency compression degraded sound quality somewhat for this small group of participants with extensive high-frequency DRs. The degradation was greater for lower values of Sf relative to fe, and for greater values of CR. Results varied across participants. PMID:27724057
TOEFL iBT Speaking Test Scores as Indicators of Oral Communicative Language Proficiency

ERIC Educational Resources Information Center

Bridgeman, Brent; Powers, Donald; Stone, Elizabeth; Mollaun, Pamela

2012-01-01

Scores assigned by trained raters and by an automated scoring system (SpeechRater[TM]) on the speaking section of the TOEFL iBT[TM] were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language…
An Analysis of the Use and Structure of Logic in Japanese Argument.

ERIC Educational Resources Information Center

Hazen, Michael David

A study was conducted to determine if the Japanese use logic and argument in different ways than do Westerners. The study analyzed sample rebuttal speeches (in English) of 14 Japanese debaters using the Toulmin model of argument. In addition, it made comparisons with a sample of speeches made by 5 American high school debaters. Audiotapes of the…
Risk and Protective Factors Associated with Speech and Language Impairment in a Nationally Representative Sample of 4- to 5-Year-Old Children

ERIC Educational Resources Information Center

Harrison, Linda J.; McLeod, Sharynne

2010-01-01

Purpose: To determine risk and protective factors for speech and language impairment in early childhood. Method: Data are presented for a nationally representative sample of 4,983 children participating in the Longitudinal Study of Australian Children (described in McLeod & Harrison, 2009). Thirty-one child, parent, family, and community…
Articulatory movements modulate auditory responses to speech

PubMed Central

Agnew, Z.K.; McGettigan, C.; Banks, B.; Scott, S.K.

2013-01-01

Production of actions is highly dependent on concurrent sensory information. In speech production, for example, movement of the articulators is guided by both auditory and somatosensory input. It has been demonstrated in non-human primates that self-produced vocalizations and those of others are differentially processed in the temporal cortex. The aim of the current study was to investigate how auditory and motor responses differ for self-produced and externally produced speech. Using functional neuroimaging, subjects were asked to produce sentences aloud, to silently mouth while listening to a different speaker producing the same sentence, to passively listen to sentences being read aloud, or to read sentences silently. We show that separate regions of the superior temporal cortex display distinct response profiles to speaking aloud, mouthing while listening, and passive listening. Responses in anterior superior temporal cortices in both hemispheres are greater for passive listening compared with both mouthing while listening and speaking aloud. This is the first demonstration that articulation, whether or not it has auditory consequences, modulates responses of the dorsolateral temporal cortex. In contrast, posterior regions of the superior temporal cortex are recruited during both articulation conditions. In dorsal regions of the posterior superior temporal gyrus, responses to mouthing and reading aloud were equivalent, and in more ventral posterior superior temporal sulcus, responses were greater for reading aloud compared with mouthing while listening. These data demonstrate an anterior–posterior division of superior temporal regions where anterior fields are suppressed during motor output, potentially for the purpose of enhanced detection of the speech of others. We suggest posterior fields are engaged in auditory processing for the guidance of articulation by auditory information. PMID:22982103
Cued Speech Transliteration: Effects of Accuracy and Lag Time on Message Intelligibility

ERIC Educational Resources Information Center

Krause, Jean C.; Lopez, Katherine A.

2017-01-01

This paper is the second in a series concerned with the level of access afforded to students who use educational interpreters. The first paper (Krause & Tessler, 2016) focused on factors affecting accuracy of messages produced by Cued Speech (CS) transliterators (expression). In this study, factors affecting intelligibility (reception by deaf…
Planning and Articulation in Incremental Word Production: Syllable-Frequency Effects in English

ERIC Educational Resources Information Center

Cholin, Joana; Dell, Gary S.; Levelt, Willem J. M.

2011-01-01

We investigated the role of syllables during speech planning in English by measuring syllable-frequency effects. So far, syllable-frequency effects in English have not been reported. English has poorly defined syllable boundaries, and thus the syllable might not function as a prominent unit in English speech production. Speakers produced either…
Camperdown Program for Adults Who Stutter: A Student Training Clinic Phase I Trial

ERIC Educational Resources Information Center

Cocomazzo, Nadia; Block, Susan; Carey, Brenda; O'Brian, Sue; Onslow, Mark; Packman, Ann; Iverach, Lisa

2012-01-01

Objectives: During speech pathology professional preparation there is a need for adequate student instruction with speech-restructuring treatments for adults. An important part of that clinical educational experience is to participate in a clinical setting that produces outcomes equivalent to those attained during clinical trials. A previous…
Reference in Action: Links between Pointing and Language

ERIC Educational Resources Information Center

Cooperrider, Kensy Andrew

2011-01-01

When referring to things in the world, speakers produce utterances that are composites of speech and action. Pointing gestures are a pervasive part of such composite utterances, but many questions remain about exactly how pointing is integrated with speech. In this dissertation I present three strands of research that investigate relations of…
Videofluoroscopic Investigation of Body Position on Articulatory Positioning

ERIC Educational Resources Information Center

Bae, Youkyung; Perry, Jamie L.; Kuehn, David P.

2014-01-01

Purpose: To quantitatively examine the effects of body position on the positioning of the epiglottis, tongue, and velum at rest and during speech. Method: Videofluoroscopic data were obtained from 12 healthy adults in the supine and upright positions at rest and during speech while the participants produced 12 VCV sequences. The effects of body…
The Female-to-Male Transsexual Voice: Physiology vs. Performance in Production

ERIC Educational Resources Information Center

Papp, Viktoria

2011-01-01

Results of the three studies on the speech production of female-to-male transgender individuals (transmen) present phonetic evidence that transmen produce speech by what I termed triple decoupling. Transmen successfully decouple gender from biological sex. The results of the longitudinal studies exemplified that speakers born and raised…
Linguistic Flexibility Modulates Speech Planning for Causative Motion Events: A Cross-Linguistic Study of Mandarin and English

ERIC Educational Resources Information Center

Zheng, Chun

2017-01-01

Producing a sensible utterance requires speakers to select conceptual content, lexical items, and syntactic structures almost instantaneously during speech planning. Each language offers its speakers flexibility in the selection of lexical and syntactic options to talk about the same scenarios involving movement. Languages also vary typologically…
Responses to Intensity-Shifted Auditory Feedback during Running Speech

ERIC Educational Resources Information Center

Patel, Rupal; Reilly, Kevin J.; Archibald, Erin; Cai, Shanqing; Guenther, Frank H.

2015-01-01

Purpose: Responses to intensity perturbation during running speech were measured to understand whether prosodic features are controlled in an independent or integrated manner. Method: Nineteen English-speaking healthy adults (age range = 21-41 years) produced 480 sentences in which emphatic stress was placed on either the 1st or 2nd word. One…
The Listener: No Longer the Silent Partner in Reduced Intelligibility

ERIC Educational Resources Information Center

Zielinski, Beth W.

2008-01-01

In this study I investigate the impact of different characteristics of the L2 speech signal on the intelligibility of L2 speakers of English to native listeners. Three native listeners were observed and questioned as they orthographically transcribed utterances taken from connected conversational speech produced by three L2 speakers from different…
Machine Learning Methods for Articulatory Data

ERIC Educational Resources Information Center

Berry, Jeffrey James

2012-01-01

Humans make use of more than just the audio signal to perceive speech. Behavioral and neurological research has shown that a person's knowledge of how speech is produced influences what is perceived. With methods for collecting articulatory data becoming more ubiquitous, methods for extracting useful information are needed to make this data…

Do Parents Lead Their Children by the Hand?

ERIC Educational Resources Information Center

Ozcaliskan, Seyda; Goldin-Meadow, Susan

2005-01-01

The types of gesture+speech combinations children produce during the early stages of language development change over time. This change, in turn, predicts the onset of two-word speech and thus might reflect a cognitive transition that the child is undergoing. An alternative, however, is that the change merely reflects changes in the types of…
Auditory-Visual Speech Integration by Adults with and without Language-Learning Disabilities

ERIC Educational Resources Information Center

Norrix, Linda W.; Plante, Elena; Vance, Rebecca

2006-01-01

Auditory and auditory-visual (AV) speech perception skills were examined in adults with and without language-learning disabilities (LLD). The AV stimuli consisted of congruent consonant-vowel syllables (auditory and visual syllables matched in terms of syllable being produced) and incongruent McGurk syllables (auditory syllable differed from…
Talking Wheelchair

NASA Technical Reports Server (NTRS)

1981-01-01

Communication is made possible for disabled individuals by means of an electronic system, developed at Stanford University's School of Medicine, which produces highly intelligible synthesized speech. The system is familiarly known as the "talking wheelchair" and formally as the Versatile Portable Speech Prosthesis (VPSP). Wheelchair-mounted system consists of a word processor, a video screen, a voice synthesizer and a computer program which instructs the synthesizer how to produce intelligible sounds in response to user commands. Computer's memory contains 925 words plus a number of common phrases and questions. Memory can also store several thousand other words of the user's choice. Message units are selected by operating a simple switch, joystick or keyboard. Completed message appears on the video screen, then user activates speech synthesizer, which generates a voice with a somewhat mechanical tone. With the keyboard, an experienced user can construct messages as rapidly as 30 words per minute.
Effects of a computer-based intervention program on the communicative functions of children with autism.

PubMed

Hetzroni, Orit E; Tannous, Juman

2004-04-01

This study investigated the use of computer-based intervention for enhancing communication functions of children with autism. The software program was developed based on daily life activities in the areas of play, food, and hygiene. The following variables were investigated: delayed echolalia, immediate echolalia, irrelevant speech, relevant speech, and communicative initiations. Multiple-baseline design across settings was used to examine the effects of the exposure of five children with autism to activities in a structured and controlled simulated environment on the communication manifested in their natural environment. Results indicated that after exposure to the simulations, all children produced fewer sentences with delayed and irrelevant speech. Most of the children engaged in fewer sentences involving immediate echolalia and increased the number of communication intentions and the amount of relevant speech they produced. Results indicated that after practicing in a controlled and structured setting that provided the children with opportunities to interact in play, food, and hygiene activities, the children were able to transfer their knowledge to the natural classroom environment. Implications and future research directions are discussed.
White noise speech illusion and psychosis expression: An experimental investigation of psychosis liability

PubMed Central

Guloksuz, Sinan; Menne-Lothmann, Claudia; Decoster, Jeroen; van Winkel, Ruud; Collip, Dina; Delespaul, Philippe; De Hert, Marc; Derom, Catherine; Thiery, Evert; Jacobs, Nele; Wichers, Marieke; Simons, Claudia J. P.; Rutten, Bart P. F.; van Os, Jim

2017-01-01

Background: An association between white noise speech illusion and psychotic symptoms has been reported in patients and their relatives. This supports the theory that bottom-up and top-down perceptual processes are involved in the mechanisms underlying perceptual abnormalities. However, findings in nonclinical populations have been conflicting. Objectives: The aim of this study was to examine the association between white noise speech illusion and subclinical expression of psychotic symptoms in a nonclinical sample. Findings were compared to previous results to investigate potential methodology-dependent differences. Methods: In a general population adolescent and young adult twin sample (n = 704), the association between white noise speech illusion and subclinical psychotic experiences, using the Structured Interview for Schizotypy-Revised (SIS-R) and the Community Assessment of Psychic Experiences (CAPE), was analyzed using multilevel logistic regression analyses. Results: Perception of any white noise speech illusion was not associated with either positive or negative schizotypy in the general population twin sample, using the method by Galdos et al. (2011) (positive: ORadjusted: 0.82, 95% CI: 0.6–1.12, p = 0.217; negative: ORadjusted: 0.75, 95% CI: 0.56–1.02, p = 0.065) and the method by Catalan et al. (2014) (positive: ORadjusted: 1.11, 95% CI: 0.79–1.57, p = 0.557). No association was found between CAPE scores and speech illusion (ORadjusted: 1.25, 95% CI: 0.88–1.79, p = 0.220). For the Catalan et al. (2014) but not the Galdos et al. (2011) method, a negative association was apparent between positive schizotypy and speech illusion with positive or negative affective valence (ORadjusted: 0.44, 95% CI: 0.24–0.81, p = 0.008). Conclusion: Contrary to findings in clinical populations, white noise speech illusion may not be associated with psychosis proneness in nonclinical populations. PMID:28832672
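The reported odds ratios, confidence intervals, and p-values can be cross-checked against one another. The following is a minimal sketch, assuming each 95% CI is a Wald interval symmetric on the log-odds scale; the study's multilevel models may not satisfy this exactly, and the published bounds are rounded, so only approximate agreement should be expected. The function name `p_from_or_ci` is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate z and two-sided p from an odds ratio and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# (label, OR, CI low, CI high, reported p), copied from the abstract.
reported = [
    ("Galdos positive", 0.82, 0.60, 1.12, 0.217),
    ("Galdos negative", 0.75, 0.56, 1.02, 0.065),
    ("Catalan positive", 1.11, 0.79, 1.57, 0.557),
    ("CAPE", 1.25, 0.88, 1.79, 0.220),
    ("Catalan affective valence", 0.44, 0.24, 0.81, 0.008),
]

for label, or_, lo, hi, p_reported in reported:
    z, p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: z = {z:.2f}, recovered p = {p:.3f}, reported p = {p_reported}")
```

Under this assumption the recovered p-values fall within about 0.01 of the reported ones. Only the last interval excludes an odds ratio of 1, which matches it being the only association reported as significant.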
Reflections on mirror neurons and speech perception.

PubMed

Lotto, Andrew J; Hickok, Gregory S; Holt, Lori L

2009-03-01

The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one to one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoretical and empirical reasons to temper enthusiasm about the explanatory role mirror neurons might have for speech perception. In fact, rather than providing support for MT, mirror neurons are actually inconsistent with the central tenets of MT.
Developing a Weighted Measure of Speech Sound Accuracy

PubMed Central

Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.

2010-01-01

Purpose: The purpose is to develop a system for numerically quantifying a speaker’s phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, we describe a system for differentially weighting speech sound errors based on various levels of phonetic accuracy with a Weighted Speech Sound Accuracy (WSSA) score. We then evaluate the reliability and validity of this measure. Method: Phonetic transcriptions are analyzed from several samples of child speech, including preschoolers and young adolescents with and without speech sound disorders and typically developing toddlers. The new measure of phonetic accuracy is compared to existing measures, is used to discriminate typical and disordered speech production, and is evaluated to determine whether it is sensitive to changes in phonetic accuracy over time. Results: Initial psychometric data indicate that WSSA scores correlate with other measures of phonetic accuracy as well as listeners’ judgments of severity of a child’s speech disorder. The measure separates children with and without speech sound disorders. WSSA scores also capture growth in phonetic accuracy in toddlers’ speech over time. Conclusion: Results provide preliminary support for the WSSA as a valid and reliable measure of phonetic accuracy in children’s speech. PMID:20699344
The role of hearing ability and speech distortion in the facilitation of articulatory motor cortex.

PubMed

Nuttall, Helen E; Kennedy-Higgins, Daniel; Devlin, Joseph T; Adank, Patti

2017-01-08

Excitability of articulatory motor cortex is facilitated when listening to speech in challenging conditions. Beyond this, however, we have little knowledge of what listener-specific and speech-specific factors engage articulatory facilitation during speech perception. For example, it is unknown whether speech motor activity is independent or dependent on the form of distortion in the speech signal. It is also unknown if speech motor facilitation is moderated by hearing ability. We investigated these questions in two experiments. We applied transcranial magnetic stimulation (TMS) to the lip area of primary motor cortex (M1) in young, normally hearing participants to test if lip M1 is sensitive to the quality (Experiment 1) or quantity (Experiment 2) of distortion in the speech signal, and if lip M1 facilitation relates to the hearing ability of the listener. Experiment 1 found that lip motor evoked potentials (MEPs) were larger during perception of motor-distorted speech that had been produced using a tongue depressor, and during perception of speech presented in background noise, relative to natural speech in quiet. Experiment 2 did not find evidence of motor system facilitation when speech was presented in noise at signal-to-noise ratios where speech intelligibility was at 50% or 75%, which were significantly less severe noise levels than used in Experiment 1. However, there was a significant interaction between noise condition and hearing ability, which indicated that when speech stimuli were correctly classified at 50%, speech motor facilitation was observed in individuals with better hearing, whereas individuals with relatively worse but still normal hearing showed more activation during perception of clear speech. These findings indicate that the motor system may be sensitive to the quantity, but not quality, of degradation in the speech signal. 
Data support the notion that motor cortex complements auditory cortex during speech perception, and point to a role for the motor cortex in compensating for differences in hearing ability. Copyright © 2016 Elsevier Ltd. All rights reserved.
Speech deficits in serious mental illness: a cognitive resource issue?

PubMed

Cohen, Alex S; McGovern, Jessica E; Dinzeo, Thomas J; Covington, Michael A

2014-12-01

Speech deficits, notably those involved in psychomotor retardation, blunted affect, alogia and poverty of content of speech, are pronounced in a wide range of serious mental illnesses (e.g., schizophrenia, unipolar depression, bipolar disorders). The present project evaluated the degree to which these deficits manifest as a function of cognitive resource limitations. We examined natural speech from 52 patients meeting criteria for serious mental illnesses (i.e., severe functional deficits with a concomitant diagnosis of schizophrenia, unipolar and/or bipolar affective disorders) and 30 non-psychiatric controls using a range of objective, computer-based measures tapping speech production ("alogia"), variability ("blunted vocal affect") and content ("poverty of content of speech"). Subjects produced natural speech during a baseline condition and while engaging in an experimentally-manipulated cognitively-effortful task. For correlational analysis, cognitive ability was measured using a standardized battery. Generally speaking, speech deficits did not differ as a function of SMI diagnosis. However, every speech production and content measure was significantly abnormal in SMI versus control groups. Speech variability measures generally did not differ between groups. For both patients and controls as a group, speech during the cognitively-effortful task was sparser and less rich in content. Relative to controls, patients were abnormal under cognitive load with respect only to average pause length. Correlations between the speech variables and cognitive ability were only significant for this same variable: average pause length. Results suggest that certain speech deficits, notably involving pause length, may manifest as a function of cognitive resource limitations. Implications for treatment, research and assessment are discussed. Copyright © 2014 Elsevier B.V. All rights reserved.
Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

PubMed

Davidow, Jason H; Grossman, Heather L; Edge, Robin L

2018-05-01

Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic.

PubMed

van der Ham, Sabine; de Boer, Bart

2015-10-01

When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults' generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants' reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.
STI: An objective measure for the performance of voice communication systems

NASA Astrophysics Data System (ADS)

Houtgast, T.; Steeneken, H. J. M.

1981-06-01

A measuring device was developed for determining the quality of speech communication systems. It comprises two parts, a signal source which replaces the talker, producing an artificial speech-like signal, and an analysis part which replaces the listener, by which the signal at the receiving end of the system under test is evaluated. Each single measurement results in an index (ranging from 0-100%) which indicates the effect of that communication system on speech intelligibility. The index is called STI (Speech Transmission Index). A careful design of the characteristics of the test signal and of the type of signal analysis makes the present approach widely applicable. It was verified experimentally that a given STI implies a given effect on speech intelligibility, irrespective of the nature of the actual disturbance (noise interference, band-pass limiting, peak clipping, etc.).
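The abstract above does not state how the index is computed. As a rough illustration only, the following Python sketch uses the formulation commonly given in later descriptions of the STI: each measured modulation index m is converted to an apparent signal-to-noise ratio, clipped to ±15 dB, and rescaled to the range 0 to 1. The function names, the equal weighting of all measurements, and the assumption that the 1981 device worked exactly this way are ours, not the authors'.

```python
import math


def transmission_index(m):
    """Map one modulation index m (0..1) to a transmission index (0..1).

    Assumed formulation: apparent SNR = 10*log10(m / (1 - m)),
    clipped to [-15, +15] dB, then shifted and scaled to 0..1.
    """
    m = min(max(m, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


def sti_percent(modulation_indices):
    """Average the transmission indices and express the result in percent.

    Equal weighting is a simplification; band-specific weights are omitted.
    """
    values = [transmission_index(m) for m in modulation_indices]
    return 100.0 * sum(values) / len(values)
```

Under these assumptions, a system that halves every modulation depth (m = 0.5 throughout) corresponds to an apparent SNR of 0 dB and lands in the middle of the scale.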
Cognitive Bias for Learning Speech Sounds From a Continuous Signal Space Seems Nonlinguistic

PubMed Central

de Boer, Bart

2015-01-01

When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories. PMID:27648212
Overreliance on auditory feedback may lead to sound/syllable repetitions: simulations of stuttering and fluency-inducing conditions with a neural model of speech production

PubMed Central

Civier, Oren; Tasko, Stephen M.; Guenther, Frank H.

2010-01-01

This paper investigates the hypothesis that stuttering may result in part from impaired readout of feedforward control of speech, which forces persons who stutter (PWS) to produce speech with a motor strategy that is weighted too much toward auditory feedback control. Over-reliance on feedback control leads to production errors which, if they grow large enough, can cause the motor system to “reset” and repeat the current syllable. This hypothesis is investigated using computer simulations of a “neurally impaired” version of the DIVA model, a neural network model of speech acquisition and production. The model’s outputs are compared to published acoustic data from PWS’ fluent speech, and to combined acoustic and articulatory movement data collected from the dysfluent speech of one PWS. The simulations mimic the errors observed in the PWS subject’s speech, as well as the repairs of these errors. Additional simulations were able to account for enhancements of fluency gained by slowed/prolonged speech and masking noise. Together these results support the hypothesis that many dysfluencies in stuttering are due to a bias away from feedforward control and toward feedback control. PMID:20831971
Vowels in clear and conversational speech: Talker differences in acoustic characteristics and intelligibility for normal-hearing listeners

NASA Astrophysics Data System (ADS)

Hargus Ferguson, Sarah; Kewley-Port, Diane

2002-05-01

Several studies have shown that when a talker is instructed to speak as though talking to a hearing-impaired person, the resulting "clear" speech is significantly more intelligible than typical conversational speech. Recent work in this lab suggests that talkers vary in how much their intelligibility improves when they are instructed to speak clearly. The few studies examining acoustic characteristics of clear and conversational speech suggest that these differing clear speech effects result from different acoustic strategies on the part of individual talkers. However, only two studies to date have directly examined differences among talkers producing clear versus conversational speech, and neither included acoustic analysis. In this project, clear and conversational speech was recorded from 41 male and female talkers aged 18-45 years. A listening experiment demonstrated that for normal-hearing listeners in noise, vowel intelligibility varied widely among the 41 talkers for both speaking styles, as did the magnitude of the speaking style effect. Acoustic analyses using stimuli from a subgroup of talkers shown to have a range of speaking style effects will be used to assess specific acoustic correlates of vowel intelligibility in clear and conversational speech. [Work supported by NIHDCD-02229.]
A voice-input voice-output communication aid for people with severe speech impairment.

PubMed

Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

2013-01-01

A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment-the voice-input voice-output communication aid (VIVOCA)-is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data, was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria and confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.
Loudness perception and speech intensity control in Parkinson's disease.

PubMed

Clark, Jenna P; Adams, Scott G; Dykstra, Allyson D; Moodie, Shane; Jog, Mandar

2014-01-01

The aim of this study was to examine loudness perception in individuals with hypophonia and Parkinson's disease. The participants included 17 individuals with hypophonia related to Parkinson's disease (PD) and 25 age-equivalent controls. The three loudness perception tasks included a magnitude estimation procedure involving a sentence spoken at 60, 65, 70, 75 and 80 dB SPL, an imitation task involving a sentence spoken at 60, 65, 70, 75 and 80 dB SPL, and a magnitude production procedure involving the production of a sentence at five different loudness levels (habitual, two and four times louder and two and four times quieter). The participants with PD produced a significantly different pattern and used a more restricted range than the controls in their perception of speech loudness, imitation of speech intensity, and self-generated estimates of speech loudness. The results support a speech loudness perception deficit in PD involving an abnormal perception of externally generated and self-generated speech intensity. Readers will recognize that individuals with hypophonia related to Parkinson's disease may demonstrate a speech loudness perception deficit involving the abnormal perception of externally generated and self-generated speech intensity. Copyright © 2014 Elsevier Inc. All rights reserved.
The effect of bone conduction microphone placement on intensity and spectrum of transmitted speech items.

PubMed

Tran, Phuong K; Letowski, Tomasz R; McBride, Maranda E

2013-06-01

Speech signals can be converted into electrical audio signals using either a conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talker's head and 1 on the collar bone, were investigated. The speech sounds were three vowels (/u/, /a/, /i/) and two consonants (/m/, /ʃ/). The sounds were produced by 12 talkers. Each sound was recorded simultaneously with two BC microphones and an AC microphone. Analyzed spectral data showed that the BC recordings made at the forehead of the talker were the most similar to the AC recordings, whereas the collar bone recordings were most different. Comparison of the spectral data with speech intelligibility data collected in another study revealed a strong negative relationship between BC speech intelligibility and the degree of deviation of the BC speech spectrum from the AC spectrum. In addition, the head locations that resulted in the highest speech intelligibility were associated with the lowest output signals among all tested locations. Implications of these findings for BC communication are discussed.
Connected word recognition using a cascaded neuro-computational model

NASA Astrophysics Data System (ADS)

Hoya, Tetsuya; van Leeuwen, Cees

2016-10-01

We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels

NASA Astrophysics Data System (ADS)

Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.

1982-04-01

This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
Perceptual learning for speech in noise after application of binary time-frequency masks

PubMed Central

Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.

2013-01-01

Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56 and 0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners, with the greatest improvement observed for the same materials used in training. PMID:23464038
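To make the role of the threshold criterion concrete, here is a minimal Python sketch of an ideal binary TF mask as it is commonly defined: a time-frequency unit is kept when its local speech-to-noise ratio exceeds a threshold in dB. The function names, the nested-list representation of the power spectrograms, and the 0 dB default are illustrative assumptions; the abstract does not give the study's actual criteria or signal processing.

```python
import math


def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR exceeds threshold_db.

    speech_power and noise_power are equally shaped lists of rows
    (frequency channels) of per-frame power values. Both must be known
    separately, which is what makes the mask 'ideal'.
    """
    mask = []
    for speech_row, noise_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_row, noise_row):
            if s <= 0.0:
                keep = False          # no speech energy in this unit
            elif n <= 0.0:
                keep = True           # speech with no noise at all
            else:
                keep = 10.0 * math.log10(s / n) > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_power, mask):
    """Zero out the mixture's time-frequency units rejected by the mask."""
    return [[p * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(mixture_power, mask)]
```

Raising the threshold rejects more units and lowering it retains more of the noisy mixture, which is one way a mask can be made to leave intelligibility below ceiling.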
Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions.

PubMed

Eckert, Mark A; Teubner-Rhodes, Susan; Vaden, Kenneth I

2016-01-01

This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. The authors propose that the behavioral economics or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance.
Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions

PubMed Central

Eckert, Mark A.; Teubner-Rhodes, Susan; Vaden, Kenneth I.

2016-01-01

This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. We propose that the behavioral economics and/or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance. PMID:27355759
Speech after Radial Forearm Free Flap Reconstruction of the Tongue: A Longitudinal Acoustic Study of Vowel and Diphthong Sounds

ERIC Educational Resources Information Center

Laaksonen, Juha-Pertti; Rieger, Jana; Happonen, Risto-Pekka; Harris, Jeffrey; Seikaly, Hadi

2010-01-01

The purpose of this study was to use acoustic analyses to describe speech outcomes over the course of 1 year after radial forearm free flap (RFFF) reconstruction of the tongue. Eighteen Canadian English-speaking females and males with reconstruction for oral cancer had speech samples recorded (pre-operative, and 1 month, 6 months, and 1 year…
Speech Sound Disorders in Preschool Children: Correspondence between Clinical Diagnosis and Teacher and Parent Report

ERIC Educational Resources Information Center

Harrison, Linda J.; McLeod, Sharynne; McAllister, Lindy; McCormack, Jane

2017-01-01

This study sought to assess the level of correspondence between parent and teacher report of concern about young children's speech and specialist assessment of speech sound disorders (SSD). A sample of 157 children aged 4-5 years was recruited in preschools and long day care centres in Victoria and New South Wales (NSW). SSD was assessed…
Regular/Irregular is Not the Whole Story: The Role of Frequency and Generalization in the Acquisition of German Past Participle Inflection

ERIC Educational Resources Information Center

Szagun, Gisela

2011-01-01

The acquisition of German participle inflection was investigated using spontaneous speech samples from six children between 1;4 and 3;8 and ten children between 1;4 and 2;10 recorded longitudinally at regular intervals. Child-directed speech was also analyzed. In adult and child speech weak participles were significantly more frequent than…
SPEECH PERCEPTION AS A TALKER-CONTINGENT PROCESS

PubMed Central

Nygaard, Lynne C.; Sommers, Mitchell S.; Pisoni, David B.

2011-01-01

To determine how familiarity with a talker’s voice affects perception of spoken words, we trained two groups of subjects to recognize a set of voices over a 9-day period. One group then identified novel words produced by the same set of talkers at four signal-to-noise ratios. Control subjects identified the same words produced by a different set of talkers. The results showed that the ability to identify a talker’s voice improved intelligibility of novel words produced by that talker. The results suggest that speech perception may involve talker-contingent processes whereby perceptual learning of aspects of the vocal source facilitates the subsequent phonetic analysis of the acoustic signal. PMID:21526138
Using genetic algorithms with subjective input from human subjects: implications for fitting hearing aids and cochlear implants.

PubMed

Başkent, Deniz; Eiler, Cheryl L; Edwards, Brent

2007-06-01

To present a comprehensive analysis of the feasibility of genetic algorithms (GA) for finding the best fit of hearing aids or cochlear implants for individual users in clinical or research settings, where the algorithm is solely driven by subjective human input. Due to varying pathology, the best settings of an auditory device differ for each user. It is also likely that listening preferences vary at the same time. The settings of a device customized for a particular user can only be evaluated by the user. When optimization algorithms are used for fitting purposes, this situation poses a difficulty for a systematic and quantitative evaluation of the suitability of the fitting parameters produced by the algorithm. In the present study, an artificial listening environment was generated by distorting speech using a noiseband vocoder. The settings produced by the GA for this listening problem could objectively be evaluated by measuring speech recognition and comparing the performance to the best vocoder condition where speech was least distorted. Nine normal-hearing subjects participated in the study. The parameters to be optimized were the number of vocoder channels, the shift between the input frequency range and the synthesis frequency range, and the compression-expansion of the input frequency range over the synthesis frequency range. The subjects listened to pairs of sentences processed with the vocoder, and entered a preference for the sentence with better intelligibility. The GA modified the solutions iteratively according to the subject preferences. The program converged when the user ranked the same set of parameters as the best in three consecutive steps. The results produced by the GA were analyzed for quality by measuring speech intelligibility, for test-retest reliability by running the GA three times with each subject, and for convergence properties. 
Speech recognition scores averaged across subjects were similar for the best vocoder solution and for the solutions produced by the GA. The average number of iterations was 8 and the average convergence time was 25.5 minutes. The settings produced by different GA runs for the same subject were slightly different; however, speech recognition scores measured with these settings were similar. Individual data from subjects showed that in each run, a small number of GA solutions produced poorer speech intelligibility than for the best setting. This was probably a result of the combination of the inherent randomness of the GA, the convergence criterion used in the present study, and possible errors that the users might have made during the paired comparisons. On the other hand, the effect of these errors was probably small compared to the other two factors, as a comparison between subjective preferences and objective measures showed that for many subjects the two were in good agreement. The results showed that the GA was able to produce good solutions by using listener preferences in a relatively short time. For practical applications, the program can be made more robust by running the GA twice or by not using an automatic stopping criterion, and it can be made faster by optimizing the number of the paired comparisons completed in each iteration.
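The procedure described above (paired comparisons, iterative modification, and stopping once the same parameter set is ranked best in three consecutive steps) can be sketched in Python as follows. Only the three vocoder parameters and the stopping rule come from the abstract. The parameter grids, population size, round-robin ranking, crossover and mutation scheme, and the simulated listener are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
import random

# Hypothetical parameter grids; the abstract names the three parameters
# (channel count, frequency shift, compression-expansion) but not their ranges.
CHANNELS = [2, 4, 6, 8, 12, 16]
SHIFT = [0, 1, 2, 3]
COMPRESSION = [-2, -1, 0, 1, 2]
GRIDS = (CHANNELS, SHIFT, COMPRESSION)


def random_setting(rng):
    """Draw one (channels, shift, compression) setting from the grids."""
    return tuple(rng.choice(grid) for grid in GRIDS)


def mutate(setting, rng, rate=0.3):
    """With probability `rate`, resample one randomly chosen parameter."""
    genes = list(setting)
    if rng.random() < rate:
        k = rng.randrange(len(genes))
        genes[k] = rng.choice(GRIDS[k])
    return tuple(genes)


def quality(setting):
    """Hypothetical stand-in for intelligibility: more channels and less
    spectral distortion are assumed to sound better."""
    channels, shift, compression = setting
    return channels - 2 * abs(shift) - 2 * abs(compression)


def simulated_listener_prefers(a, b):
    """Stand-in for a subject's paired-comparison response (True: prefers a)."""
    return quality(a) >= quality(b)


def run_preference_ga(prefer, population, mutate_fn, rng, patience=3, max_iters=50):
    """Evolve settings from pairwise preferences alone.

    Stops when the same setting is ranked best in `patience` consecutive
    iterations, or after max_iters. Returns (best_setting, iterations_used).
    """
    population = list(population)
    best, streak = None, 0
    for iteration in range(1, max_iters + 1):
        # Round-robin paired comparisons; each win counts toward the ranking.
        wins = [0] * len(population)
        for i in range(len(population)):
            for j in range(i + 1, len(population)):
                if prefer(population[i], population[j]):
                    wins[i] += 1
                else:
                    wins[j] += 1
        order = sorted(range(len(population)), key=lambda k: wins[k], reverse=True)
        ranked = [population[k] for k in order]
        streak = streak + 1 if ranked[0] == best else 1
        best = ranked[0]
        if streak >= patience:
            return best, iteration
        # Keep the preferred half, refill with mutated uniform-crossover children.
        parents = ranked[: max(2, len(ranked) // 2)]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            children.append(mutate_fn(child, rng))
        population = parents + children
    return best, max_iters


if __name__ == "__main__":
    rng = random.Random(0)
    start = [random_setting(rng) for _ in range(6)]
    print(run_preference_ga(simulated_listener_prefers, start, mutate, rng))
```

A full round-robin keeps the sketch simple but costs n(n-1)/2 comparisons per iteration; the abstract's closing remark about optimizing the number of paired comparisons points at exactly this cost, since a real listener must answer every one of them.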
Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.

The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
Do not throw out the baby with the bath water: choosing an effective baseline for a functional localizer of speech processing.

PubMed

Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal

2013-05-01

Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed up by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.
The effect of language experience on perceptual normalization of Mandarin tones and non-speech pitch contours.

PubMed

Luo, Xin; Ashmore, Krista B

2014-06-01

Context-dependent pitch perception helps listeners recognize tones produced by speakers with different fundamental frequencies (f0s). The role of language experience in tone normalization remains unclear. In this cross-language study of tone normalization, native Mandarin and English listeners were asked to recognize Mandarin Tone 1 (high-flat) and Tone 2 (mid-rising) with a preceding Mandarin sentence. To further test whether context-dependent pitch perception is speech-specific or domain-general, both language groups were asked to identify non-speech flat and rising pitch contours with a preceding non-speech flat pitch contour. Results showed that both Mandarin and English listeners made more rising responses with non-speech than with speech stimuli, due to differences in spectral complexity and listening task between the two stimulus types. English listeners made more rising responses than Mandarin listeners with both speech and non-speech stimuli. Contrastive context effects (more rising responses in the high-f0 context than in the low-f0 context) were found with both speech and non-speech stimuli for Mandarin listeners, but not for English listeners. English listeners' lack of tone experience may have caused more rising responses and limited use of context f0 cues. These results suggest that context-dependent pitch perception in tone normalization is domain-general, but influenced by long-term language experience.
Temporal modulations in speech and music.

PubMed

Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David

2017-10-01

Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25h of speech and over 39h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
The Role of Clinical Experience in Speech-Language Pathologists' Perception of Subphonemic Detail in Children's Speech

PubMed Central

Munson, Benjamin; Johnson, Julie M.; Edwards, Jan

2013-01-01

Purpose: This study examined whether experienced speech-language pathologists differ from inexperienced people in their perception of phonetic detail in children's speech. Method: Convenience samples comprising 21 experienced speech-language pathologists and 21 inexperienced listeners participated in a series of tasks in which they made visual-analog scale (VAS) ratings of children's natural productions of target /s/-/θ/, /t/-/k/, and /d/-/ɡ/ in word-initial position. Listeners rated the perceptual distance between individual productions and ideal productions. Results: The experienced listeners' ratings differed from inexperienced listeners' in four ways: they had higher intra-rater reliability, they showed less bias toward a more frequent sound, their ratings were more closely related to the acoustic characteristics of the children's speech, and their responses were related to a different set of predictor variables. Conclusions: Results suggest that experience working as a speech-language pathologist leads to better perception of phonetic detail in children's speech. Limitations and future research are discussed. PMID:22230182
Developmental profile of speech-language and communicative functions in an individual with the Preserved Speech Variant of Rett syndrome

PubMed Central

Marschik, Peter B.; Vollmann, Ralf; Bartl-Pokorny, Katrin D.; Green, Vanessa A.; van der Meer, Larah; Wolin, Thomas; Einspieler, Christa

2018-01-01

Objective: We assessed various aspects of speech-language and communicative functions of an individual with the preserved speech variant (PSV) of Rett syndrome (RTT) to describe her developmental profile over a period of 11 years. Methods: For this study we incorporated the following data resources and methods to assess speech-language and communicative functions during pre-, peri- and post-regressional development: retrospective video analyses, medical history data, parental checklists and diaries, standardized tests on vocabulary and grammar, spontaneous speech samples, and picture stories to elicit narrative competences. Results: Despite achieving speech-language milestones, atypical behaviours were present at all times. We observed a unique developmental speech-language trajectory (including the RTT typical regression) affecting all linguistic and socio-communicative sub-domains in the receptive as well as the expressive modality. Conclusion: Future research should take into consideration a potentially considerable discordance between formal and functional language use by interpreting communicative acts on a more cautionary note. PMID:23870013
Developmental profile of speech-language and communicative functions in an individual with the preserved speech variant of Rett syndrome.

PubMed

Marschik, Peter B; Vollmann, Ralf; Bartl-Pokorny, Katrin D; Green, Vanessa A; van der Meer, Larah; Wolin, Thomas; Einspieler, Christa

2014-08-01

We assessed various aspects of speech-language and communicative functions of an individual with the preserved speech variant of Rett syndrome (RTT) to describe her developmental profile over a period of 11 years. For this study, we incorporated the following data resources and methods to assess speech-language and communicative functions during pre-, peri- and post-regressional development: retrospective video analyses, medical history data, parental checklists and diaries, standardized tests on vocabulary and grammar, spontaneous speech samples and picture stories to elicit narrative competences. Despite achieving speech-language milestones, atypical behaviours were present at all times. We observed a unique developmental speech-language trajectory (including the RTT typical regression) affecting all linguistic and socio-communicative sub-domains in the receptive as well as the expressive modality. Future research should take into consideration a potentially considerable discordance between formal and functional language use by interpreting communicative acts on a more cautionary note.
Hearing impaired speech in noisy classrooms

NASA Astrophysics Data System (ADS)

Shahin, Kimary; McKellin, William H.; Jamieson, Janet; Hodgson, Murray; Pichora-Fuller, M. Kathleen

2005-04-01

Noisy classrooms have been shown to induce among students patterns of interaction similar to those used by hearing impaired people [W. H. McKellin et al., GURT (2003)]. In this research, the speech of children in a noisy classroom setting was investigated to determine if noisy classrooms have an effect on students' speech. Audio recordings were made of the speech of students during group work in their regular classrooms (grades 1-7), and of the speech of the same students in a sound booth. Noise level readings in the classrooms were also recorded. Each student's noisy and quiet environment speech samples were acoustically analyzed for prosodic and segmental properties (f0, pitch range, pitch variation, phoneme duration, vowel formants), and compared. The analysis showed that the students' speech in the noisy classrooms had characteristics of the speech of hearing-impaired persons [e.g., R. O'Halpin, Clin. Ling. and Phon. 15, 529-550 (2001)]. Some educational implications of our findings were identified. [Work supported by the Peter Wall Institute for Advanced Studies, University of British Columbia.]
Missouri Assessment Program, Spring 2002: Social Studies, Grade 8. Released Items [and] Scoring Guide.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This booklet contains sample items from the Missouri social studies test for eighth graders. The first sample is based on a speech delivered by Elizabeth Cady Stanton in the mid-1880s, which proposed a new approach to raising girls. Students are directed to use their own knowledge and the speech excerpt to do three activities. The second sample…
Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

PubMed Central

Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

2015-01-01

Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259
An experiment with spectral analysis of emotional speech affected by orthodontic appliances

NASA Astrophysics Data System (ADS)

Přibil, Jiří; Přibilová, Anna; Ďuračková, Daniela

2012-11-01

The contribution describes the effect of fixed and removable orthodontic appliances on the spectral properties of emotional speech. Spectral changes were analyzed and evaluated using spectrograms and mean Welch's periodograms. This alternative to the standard listening test makes it possible to obtain an objective comparison based on statistical analysis by ANOVA and hypothesis tests. The results of the analysis, performed on short sentences of a female speaker in four emotional states (joyous, sad, angry, and neutral), show that it is primarily the removable orthodontic appliance that affects the spectrograms of the produced speech.
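The mean Welch's periodogram mentioned in this record can be illustrated with a minimal sketch. It is not the authors' implementation: the segment length, the Hann window, the 50% overlap, and the function name `welch_psd` are all assumptions chosen for illustration, and the estimate is left two-sided (no doubling of the positive-frequency bins).

```python
import cmath
import math


def welch_psd(signal, fs, nperseg=64):
    """Average Hann-windowed periodograms over 50%-overlapping segments.

    Hypothetical helper: parameters are illustrative, not taken from the record.
    Returns (frequencies in Hz, power spectral density estimates).
    """
    step = nperseg // 2  # 50% overlap between consecutive segments
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg)
              for n in range(nperseg)]
    scale = fs * sum(w * w for w in window)  # density normalisation
    n_bins = nperseg // 2 + 1                # bins from 0 Hz up to fs/2
    psd = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(signal) - nperseg + 1, step):
        seg = [signal[start + n] * window[n] for n in range(nperseg)]
        for k in range(n_bins):
            # Direct DFT of the windowed segment at bin k
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / nperseg)
                        for n in range(nperseg))
            psd[k] += abs(coeff) ** 2 / scale
        n_segments += 1
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")
    freqs = [k * fs / nperseg for k in range(n_bins)]
    return freqs, [p / n_segments for p in psd]


if __name__ == "__main__":
    fs = 512.0
    # Synthetic 64 Hz tone standing in for a speech recording
    tone = [math.sin(2 * math.pi * 64.0 * n / fs) for n in range(512)]
    freqs, psd = welch_psd(tone, fs)
    print(freqs[psd.index(max(psd))])
```

Averaging over segments trades frequency resolution for a lower-variance spectrum, which is what makes per-condition spectra comparable in a statistical test.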

Selective spatial attention modulates bottom-up informational masking of speech

PubMed Central

Carlile, Simon; Corkhill, Caitlin

2015-01-01

To hear out a conversation against other talkers, listeners overcome energetic and informational masking. Largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributed to a release from informational masking. When across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, the informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations and that this, presumably low-level process, can be modulated by selective spatial attention. PMID:25727100
Selective spatial attention modulates bottom-up informational masking of speech.

PubMed

Carlile, Simon; Corkhill, Caitlin

2015-03-02

To hear out a conversation against other talkers, listeners overcome energetic and informational masking. Largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributed to a release from informational masking. When across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, the informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations and that this, presumably low-level process, can be modulated by selective spatial attention.
Affective Properties of Mothers' Speech to Infants With Hearing Impairment and Cochlear Implants

PubMed Central

Bergeson, Tonya R.; Xu, Huiping; Kitamura, Christine

2015-01-01

Purpose: The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing. Method: Mothers of infants with HI and mothers of infants with normal hearing matched by age (NH-AM) or hearing experience (NH-EM) were recorded playing with their infants during 3 sessions over a 12-month period. Speech samples of 25 s were low-pass filtered, leaving intonation but not speech information intact. Sixty adults rated the stimuli along 5 scales: positive/negative affect and intention to express affection, to encourage attention, to comfort/soothe, and to direct behavior. Results: Low-pass filtered speech to HI and NH-EM groups was rated as more positive, affective, and comforting compared with such speech to the NH-AM group. Speech to infants with HI and with NH-AM was rated as more directive than speech to the NH-EM group. Mothers decreased affective qualities in speech to all infants but increased directive qualities in speech to infants with NH-EM over time. Conclusions: Mothers fine-tune communicative intent in speech to their infant's developmental stage. They adjust affective qualities to infants' hearing experience rather than to chronological age but adjust directive qualities of speech to the chronological age of their infants. PMID:25679195
Speech sound disorders in a community study of preschool children.

PubMed

McLeod, Sharynne; Harrison, Linda J; McAllister, Lindy; McCormack, Jane

2013-08-01

To undertake a community (nonclinical) study to describe the speech of preschool children who had been identified by parents/teachers as having difficulties "talking and making speech sounds" and compare the speech characteristics of those who had and had not accessed the services of a speech-language pathologist (SLP). Stage 1: Parent/teacher concern regarding the speech skills of 1,097 4- to 5-year-old children attending early childhood centers was documented. Stage 2a: One hundred forty-three children who had been identified with concerns were assessed. Stage 2b: Parents returned questionnaires about service access for 109 children. The majority of the 143 children (86.7%) achieved a standard score below the normal range for the percentage of consonants correct (PCC) on the Diagnostic Evaluation of Articulation and Phonology (Dodd, Hua, Crosbie, Holm, & Ozanne, 2002). Consonants produced incorrectly were consistent with the late-8 phonemes (Shriberg, 1993). Common phonological patterns were fricative simplification (82.5%), cluster simplification (49.0%)/reduction (19.6%), gliding (41.3%), and palatal fronting (15.4%). Interdental lisps on /s/ and /z/ were produced by 39.9% of the children, dentalization of other sibilants by 17.5%, and lateral lisps by 13.3%. Despite parent/teacher concern, only 41/109 children had contact with an SLP. These children were more likely to be unintelligible to strangers, to express distress about their speech, and to have a lower PCC and a smaller consonant inventory compared to the children who had no contact with an SLP. A significant number of preschool-age children with speech sound disorders (SSD) have not had contact with an SLP. These children have mild-severe SSD and would benefit from SLP intervention. Integrated SLP services within early childhood communities would enable earlier identification of SSD and access to intervention to reduce potential educational and social impacts affiliated with SSD.
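For readers unfamiliar with the PCC measure named in this record, a minimal sketch of the raw percentage is given below. The function name and the example counts are hypothetical; the standard scores reported in the record come from the published test norms, which this sketch does not reproduce.

```python
def percent_consonants_correct(correct, attempted):
    """Raw percentage of consonants correct (PCC).

    Hypothetical helper for illustration: 'correct' is the number of
    consonants produced correctly, 'attempted' the number of consonants
    targeted in the sample.
    """
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    if not 0 <= correct <= attempted:
        raise ValueError("correct must lie between 0 and attempted")
    return 100.0 * correct / attempted


if __name__ == "__main__":
    # Made-up counts: 45 of 50 target consonants produced correctly
    print(percent_consonants_correct(45, 50))
```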
Brainstem Encoding of Aided Speech in Hearing Aid Users with Cochlear Dead Region(s)

PubMed Central

Hassaan, Mohammad Ramadan; Ibraheem, Ola Abdallah; Galhom, Dalia Helal

2016-01-01

Introduction: Neural encoding of speech begins with the analysis of the signal as a whole broken down into its sinusoidal components in the cochlea, which has to be conserved up to the higher auditory centers. Some of these components target the dead regions of the cochlea causing little or no excitation. Measuring aided speech-evoked auditory brainstem response elicited by speech stimuli with different spectral maxima can give insight into the brainstem encoding of aided speech with spectral maxima at these dead regions. Objective: This research aims to study the impact of dead regions of the cochlea on speech processing at the brainstem level after a long period of hearing aid use. Methods: This study comprised 30 ears without dead regions and 46 ears with dead regions at low, mid, or high frequencies. For all ears, we measured the aided speech-evoked auditory brainstem response using speech stimuli of low, mid, and high spectral maxima. Results: Aided speech-evoked auditory brainstem response was producible in all subjects. Responses evoked by stimuli with spectral maxima at dead regions had longer latencies and smaller amplitudes when compared with the control group or the responses of other stimuli. Conclusion: The presence of cochlear dead regions affects brainstem encoding of speech with spectral maxima perpendicular to these regions. Brainstem neuroplasticity and the extrinsic redundancy of speech can minimize the impact of dead regions in chronic hearing aid users. PMID:27413404
Individual differences in degraded speech perception

NASA Astrophysics Data System (ADS)

Carbonell, Kathy M.

One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due to either examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has 3 aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics both across tasks and across sessions. The third is to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing impaired listeners.
Perception of Sung Speech in Bimodal Cochlear Implant Users.

PubMed

Crew, Joseph D; Galvin, John J; Fu, Qian-Jie

2016-11-11

Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. © The Author(s) 2016.
Developmental changes in brain activation involved in the production of novel speech sounds in children.

PubMed

Hashizume, Hiroshi; Taki, Yasuyuki; Sassa, Yuko; Thyreau, Benjamin; Asano, Michiko; Asano, Kohei; Takeuchi, Hikaru; Nouchi, Rui; Kotozaki, Yuka; Jeong, Hyeonjeong; Sugiura, Motoaki; Kawashima, Ryuta

2014-08-01

Older children are more successful at producing unfamiliar, non-native speech sounds than younger children during the initial stages of learning. To reveal the neuronal underpinning of the age-related increase in the accuracy of non-native speech production, we examined the developmental changes in activation involved in the production of novel speech sounds using functional magnetic resonance imaging. Healthy right-handed children (aged 6-18 years) were scanned while performing an overt repetition task and a perceptual task involving aurally presented non-native and native syllables. Productions of non-native speech sounds were recorded and evaluated by native speakers. The mouth regions in the bilateral primary sensorimotor areas were activated more significantly during the repetition task relative to the perceptual task. The hemodynamic response in the left inferior frontal gyrus pars opercularis (IFG pOp) specific to non-native speech sound production (defined by prior hypothesis) increased with age. Additionally, the accuracy of non-native speech sound production increased with age. These results provide the first evidence of developmental changes in the neural processes underlying the production of novel speech sounds. Our data further suggest that the recruitment of the left IFG pOp during the production of novel speech sounds was possibly enhanced due to the maturation of the neuronal circuits needed for speech motor planning. This, in turn, would lead to improvement in the ability to immediately imitate non-native speech. Copyright © 2014 Wiley Periodicals, Inc.
Perception and analysis of Spanish accents in English speech

NASA Astrophysics Data System (ADS)

Chism, Cori; Lass, Norman

2002-05-01

The purpose of the present study was to determine what relates most closely to the degree of perceived foreign accent in the English speech of native Spanish speakers: intonation, vowel length, stress, voice onset time (VOT), or segmental accuracy. Nineteen native English speaking listeners rated speech samples from 7 native English speakers and 15 native Spanish speakers for comprehensibility and degree of foreign accent. The speech samples were analyzed spectrographically and perceptually to obtain numerical values for each variable. Correlation coefficients were computed to determine the relationship between these values and the average foreign accent scores. Results showed that the average foreign accent scores were statistically significantly correlated with three variables: the length of stressed vowels (r=-0.48, p=0.05), voice onset time (r=-0.62, p=0.01), and segmental accuracy (r=0.92, p=0.001). Implications of these findings and suggestions for future research are discussed.
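The correlation coefficients reported in this record are of the kind computed below. This is a minimal sketch of a Pearson product-moment correlation, assuming that is the coefficient the authors used (the record does not say); the function name and the example values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up values: an acoustic measure per speaker and a mean accent rating
    measure = [12.0, 18.0, 25.0, 31.0, 40.0]
    rating = [7.5, 6.8, 5.1, 4.9, 3.2]
    print(round(pearson_r(measure, rating), 3))
```

A negative coefficient, as for vowel length and VOT in the record, means higher values of the measure go with lower accent scores.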
Speech vs. singing: infants choose happier sounds

PubMed Central

Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

2013-01-01

Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119
The Relationship Between Apraxia of Speech and Oral Apraxia: Association or Dissociation?

PubMed

Whiteside, Sandra P; Dyson, Lucy; Cowell, Patricia E; Varley, Rosemary A

2015-11-01

Acquired apraxia of speech (AOS) is a motor speech disorder that affects the implementation of articulatory gestures and the fluency and intelligibility of speech. Oral apraxia (OA) is an impairment of nonspeech volitional movement. Although many speakers with AOS also display difficulties with volitional nonspeech oral movements, the relationship between the 2 conditions is unclear. This study explored the relationship between speech and volitional nonspeech oral movement impairment in a sample of 50 participants with AOS. We examined levels of association and dissociation between speech and OA using a battery of nonspeech oromotor, speech, and auditory/aphasia tasks. There was evidence of a moderate positive association between the 2 impairments across participants. However, individual profiles revealed patterns of dissociation between the 2 in a few cases, with evidence of double dissociation of speech and oral apraxic impairment. We discuss the implications of these relationships for models of oral motor and speech control. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The speech naturalness of people who stutter speaking under delayed auditory feedback as perceived by different groups of listeners.

PubMed

Van Borsel, John; Eeckhout, Hannelore

2008-09-01

This study investigated listeners' perception of the speech naturalness of people who stutter (PWS) speaking under delayed auditory feedback (DAF), with particular attention to possible listener differences. Three panels of judges consisting of 14 stuttering individuals, 14 speech language pathologists, and 14 naive listeners rated the naturalness of speech samples of stuttering and non-stuttering individuals using a 9-point interval scale. Results clearly indicate that these three groups evaluate naturalness differently. Naive listeners appear to be more severe in their judgements than speech language pathologists and stuttering listeners, and speech language pathologists are apparently more severe than PWS. The three listener groups showed similar trends with respect to the relationship between speech naturalness and speech rate. Results of all three indicated that for PWS, the slower a speaker's rate was, the less natural speech was judged to sound. The three listener groups also showed similar trends with regard to naturalness of the stuttering versus the non-stuttering individuals. All three panels considered the speech of the non-stuttering participants more natural. The reader will be able to: (1) discuss the speech naturalness of people who stutter speaking under delayed auditory feedback, (2) discuss listener differences about the naturalness of people who stutter speaking under delayed auditory feedback, and (3) discuss the importance of speech rate for the naturalness of speech.
Scores on Riley's stuttering severity instrument versions three and four for samples of different length and for different types of speech material.

PubMed

Todd, Helena; Mirawdeli, Avin; Costelloe, Sarah; Cavenagh, Penny; Davis, Stephen; Howell, Peter

2014-12-01

Riley stated that the minimum speech sample length necessary to compute his stuttering severity estimates was 200 syllables. This was investigated. Procedures supplied for the assessment of readers and non-readers were examined to see whether they give equivalent scores. Recordings of spontaneous speech samples from 23 young children (aged between 2 years 8 months and 6 years 3 months) and 31 older children (aged between 10 years 0 months and 14 years 7 months) were made. Riley's severity estimates were scored on extracts of different lengths. The older children provided spontaneous and read samples, which were scored for severity according to reader and non-reader procedures. Analysis of variance supported the use of 200-syllable-long samples as the minimum necessary for obtaining severity scores. There was no significant difference in SSI-3 scores for the older children when the reader and non-reader procedures were used. Samples that are 200-syllables long are the minimum that is appropriate for obtaining stable Riley's severity scores. The procedural variants provide similar severity scores.
A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

PubMed

Magnotti, John F; Beauchamp, Michael S

2017-02-01

Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
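The causal inference step described in this abstract can be illustrated with a generic Bayesian sketch. This is not the CIMS model itself, whose details are not given here. It assumes a one-dimensional representational space, Gaussian sensory noise, and a Gaussian prior over sources; all function names, parameters, and values are hypothetical. The posterior probability of a common source is computed from the audiovisual disparity alone, and the percept is a model-averaged blend of the fused and auditory-only estimates.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance `var`, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def p_common_cause(x_aud, x_vis, var_aud, var_vis, var_prior, prior_common=0.5):
    """Posterior probability that both cues come from one talker.

    One source:  disparity ~ N(0, var_aud + var_vis)
    Two sources: disparity ~ N(0, var_aud + var_vis + 2 * var_prior)
    """
    disparity = x_aud - x_vis
    like_one = normal_pdf(disparity, var_aud + var_vis)
    like_two = normal_pdf(disparity, var_aud + var_vis + 2.0 * var_prior)
    weighted_one = prior_common * like_one
    return weighted_one / (weighted_one + (1.0 - prior_common) * like_two)


def fused_estimate(x_aud, x_vis, var_aud, var_vis):
    """Reliability-weighted average of the cues (source prior ignored for simplicity)."""
    w_aud = (1.0 / var_aud) / (1.0 / var_aud + 1.0 / var_vis)
    return w_aud * x_aud + (1.0 - w_aud) * x_vis


def auditory_percept(x_aud, x_vis, var_aud, var_vis, var_prior, prior_common=0.5):
    """Model-averaged percept: fused estimate weighted by P(common cause)."""
    p = p_common_cause(x_aud, x_vis, var_aud, var_vis, var_prior, prior_common)
    fused = fused_estimate(x_aud, x_vis, var_aud, var_vis)
    return p * fused + (1.0 - p) * x_aud


if __name__ == "__main__":
    # Hypothetical cue positions: congruent (0, 0) versus incongruent (0, 3).
    print(p_common_cause(0.0, 0.0, 1.0, 1.0, 4.0))
    print(p_common_cause(0.0, 3.0, 1.0, 1.0, 4.0))
    print(auditory_percept(0.0, 3.0, 1.0, 1.0, 4.0))
```

In this sketch a larger disparity lowers the probability of a common source, so the percept moves back toward the auditory cue. A disparity-only rule like this is symmetric in the two cues, so on its own it would not account for the difference between AbaVga and AgaVba that the abstract highlights.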
Speech Deficits in Serious Mental Illness: A Cognitive Resource Issue?

PubMed Central

Cohen, Alex S.; McGovern, Jessica E.; Dinzeo, Thomas J.; Covington, Michael A.

2014-01-01

Speech deficits, notably those involved in psychomotor retardation, blunted affect, alogia and poverty of content of speech, are pronounced in a wide range of serious mental illnesses (e.g., schizophrenia, unipolar depression, bipolar disorders). The present project evaluated the degree to which these deficits manifest as a function of cognitive resource limitations. We examined natural speech from 52 patients meeting criteria for serious mental illnesses (i.e., severe functional deficits with a concomitant diagnosis of schizophrenia, unipolar and/or bipolar affective disorders) and 30 non-psychiatric controls using a range of objective, computer-based measures tapping speech production (“alogia”), variability (“blunted vocal affect”) and content (“poverty of content of speech”). Subjects produced natural speech during a baseline condition and while engaging in an experimentally-manipulated cognitively-effortful task. For correlational analysis, cognitive ability was measured using a standardized battery. Generally speaking, speech deficits did not differ as a function of SMI diagnosis. However, every speech production and content measure was significantly abnormal in SMI versus control groups. Speech variability measures generally did not differ between groups. For both patients and controls as a group, speech during the cognitively-effortful task was sparser and less rich in content. Relative to controls, patients were abnormal under cognitive load with respect only to average pause length. Correlations between the speech variables and cognitive ability were only significant for this same variable: average pause length. Results suggest that certain speech deficits, notably involving pause length, may manifest as a function of cognitive resource limitations. Implications for treatment, research and assessment are discussed. PMID:25464920
Simulating SLI: General Cognitive Processing Stressors Can Produce a Specific Linguistic Profile.

ERIC Educational Resources Information Center

Hayiou-Thomas, Marianna E.; Bishop, Dorothy V.M.; Plunkett, Kim

2004-01-01

This study attempted to model specific language impairment (SLI) in a group of 6-year-old children with typically developing language by introducing cognitive stress factors into a grammaticality judgment task. At normal speech rate, all children had near-perfect performance. When the speech signal was compressed to 50% of its original rate, to…
Evidence for a Familial Speech Sound Disorder Subtype in a Multigenerational Study of Oral and Hand Motor Sequencing Ability

ERIC Educational Resources Information Center

Peter, Beate; Raskind, Wendy H.

2011-01-01

Purpose: To evaluate phenotypic expressions of speech sound disorder (SSD) in multigenerational families with evidence of familial forms of SSD. Method: Members of five multigenerational families (N = 36) produced rapid sequences of monosyllables and disyllables and tapped computer keys with repetitive and alternating movements. Results: Measures…
An fMRI Investigation of Covertly and Overtly Produced Mono- and Multisyllabic Words

ERIC Educational Resources Information Center

Shuster, Linda I.; Lemieux, Susan K.

2005-01-01

Studies suggest that the left insula may play an important role in speech motor programming. We used functional magnetic resonance imaging to investigate the role of the left insula in the production of monosyllabic or multisyllabic words during overt and covert speech conditions. The left insula did not show a BOLD response for multisyllabic…
Toward a Systematic Evaluation of Vowel Target Events across Speech Tasks

ERIC Educational Resources Information Center

Kuo, Christina

2011-01-01

The core objective of this study was to examine whether acoustic variability of vowel production in American English, across speaking tasks, is systematic. Ten male speakers who spoke a relatively homogeneous Wisconsin dialect produced eight monophthong vowels (in hVd and CVC contexts) in four speaking tasks, including clear-speech, citation form,…
How Children and Adults Produce and Perceive Uncertainty in Audiovisual Speech

ERIC Educational Resources Information Center

Krahmer, Emiel; Swerts, Marc

2005-01-01

We describe two experiments on signaling and detecting uncertainty in audiovisual speech by adults and children. In the first study, utterances from adult speakers and child speakers (aged 7-8) were elicited and annotated with a set of six audiovisual features. It was found that when adult speakers were uncertain they were more likely to produce…

Readability Statistics of Patient Information Leaflets in a Speech and Language Therapy Department

ERIC Educational Resources Information Center

Pothier, Louise; Day, Rachael; Harris, Catherine; Pothier, David D.

2008-01-01

Background: Information leaflets are commonly used in Speech and Language Therapy Departments. Despite widespread use, they can be of variable quality. Aims: To revise current departmental leaflets using the National Health Service (NHS) Toolkit for Producing Patient Information and to test the effect that this has on the readability scores of the…
Age-Related Changes to Speech Breathing with Increased Vocal Loudness

ERIC Educational Resources Information Center

Huber, Jessica E.; Spruill, John, III

2008-01-01

Purpose: The present study examines the effect of normal aging on respiratory support for speech when utterance length is controlled. Method: Fifteen women (M = 71 years of age) and 10 men (M = 73 years of age) produced 2 sentences of different lengths in 4 loudness conditions while respiratory kinematics were measured. Measures included those…
The Different Time Course of Phonotactic Constraint Learning in Children and Adults: Evidence from Speech Errors

ERIC Educational Resources Information Center

Smalle, Eleonore H. M.; Muylle, Merel; Szmalec, Arnaud; Duyck, Wouter

2017-01-01

Speech errors typically respect the speaker's implicit knowledge of language-wide phonotactics (e.g., /ŋ/ cannot be a syllable onset in the English language). Previous work demonstrated that adults can learn novel experimentally induced phonotactic constraints by producing syllable strings in which the allowable position of a phoneme depends on…
Suprasegmental Characteristics of Spontaneous Speech Produced in Good and Challenging Communicative Conditions by Talkers Aged 9-14 Years

ERIC Educational Resources Information Center

Hazan, Valerie; Tuomainen, Outi; Pettinato, Michèle

2016-01-01

Purpose: This study investigated the acoustic characteristics of spontaneous speech by talkers aged 9-14 years and their ability to adapt these characteristics to maintain effective communication when intelligibility was artificially degraded for their interlocutor. Method: Recordings were made for 96 children (50 female participants, 46 male…
Widening the temporal window: Processing support in the treatment of aphasic language production

PubMed Central

Linebarger, Marcia; McCall, Denise; Virata, Telana; Berndt, Rita Sloan

2007-01-01

Investigations of language processing in aphasia have increasingly implicated performance factors such as slowed activation and/or rapid decay of linguistic information. This approach is supported by studies utilizing a communication system (SentenceShaper™) which functions as a “processing prosthesis.” The system may reduce the impact of processing limitations by allowing repeated refreshing of working memory and by increasing the opportunity for aphasic subjects to monitor their own speech. Some aphasic subjects are able to produce markedly more structured speech on the system than they are able to produce spontaneously, and periods of largely independent home use of SentenceShaper have been linked to treatment effects, that is, to gains in speech produced without the use of the system. The purpose of the current study was to follow up on these studies with a new group of subjects. A second goal was to determine whether repeated, unassisted elicitations of the same narratives at baseline would give rise to practice effects, which could undermine claims for the efficacy of the system. PMID:17069883
The Representation and Execution of Articulatory Timing in First and Second Language Acquisition.

PubMed

Redford, Melissa A; Oh, Grace E

2017-07-01

The early acquisition of language-specific temporal patterns relative to the late development of speech motor control suggests a dissociation between the representation and execution of articulatory timing. The current study tested for such a dissociation in first and second language acquisition. American English-speaking children (5- and 8-year-olds) and Korean-speaking adult learners of English repeatedly produced real English words in a simple carrier sentence. The words were designed to elicit different language-specific vowel length contrasts. Measures of absolute duration and variability in single vowel productions were extracted to evaluate the realization of contrasts (representation) and to index speech motor abilities (execution). Results were mostly consistent with a dissociation. Native English-speaking children produced the same language-specific temporal patterns as native English-speaking adults, but their productions were more variable than the adults'. In contrast, Korean-speaking adult learners of English typically produced different temporal patterns than native English-speaking adults, but their productions were as stable as the native speakers'. Implications of the results are discussed with reference to different models of speech production.
Gender Identification Using High-Frequency Speech Energy: Effects of Increasing the Low-Frequency Limit.

PubMed

Donai, Jeremy J; Halbritter, Rachel M

The purpose of this study was to investigate the ability of normal-hearing listeners to use high-frequency energy for gender identification from naturally produced speech signals. Two experiments were conducted using a repeated-measures design. Experiment 1 investigated the effects of increasing high-pass filter cutoff (i.e., increasing the low-frequency spectral limit) on gender identification from naturally produced vowel segments. Experiment 2 studied the effects of increasing high-pass filter cutoff on gender identification from naturally produced sentences. Confidence ratings for the gender identification task were also obtained for both experiments. Listeners in experiment 1 were capable of extracting talker gender information at levels significantly above chance from vowel segments high-pass filtered up to 8.5 kHz. Listeners in experiment 2 also performed above chance on the gender identification task from sentences high-pass filtered up to 12 kHz. Cumulatively, the results of both experiments provide evidence that normal-hearing listeners can utilize information from the very high-frequency region (above 4 to 5 kHz) of the speech signal for talker gender identification. These findings are at variance with current assumptions regarding the perceptual information regarding talker gender within this frequency region. The current results also corroborate and extend previous studies of the use of high-frequency speech energy for perceptual tasks. These findings have potential implications for the study of information contained within the high-frequency region of the speech spectrum and the role this region may play in navigating the auditory scene, particularly when the low-frequency portion of the spectrum is masked by environmental noise sources or for listeners with substantial hearing loss in the low-frequency region and better hearing sensitivity in the high-frequency region (i.e., reverse slope hearing loss).
An Analysis of the Variations from Standard English Pronunciation in the Phonetic Performance of Two Groups of Nonstandard-English-Speaking Children. Final Report.

ERIC Educational Resources Information Center

Williams, Frederick, Ed.; And Others

In this second of two studies conducted with portions of the National Speech and Hearing Survey data, the investigators analyzed the phonetic variants from standard American English in the speech of two groups of nonstandard-English-speaking children. The study used samples of free speech and performance on the Goldman-Fristoe Test of Articulation…
Risk and protective factors associated with speech and language impairment in a nationally representative sample of 4- to 5-year-old children.

PubMed

Harrison, Linda J; McLeod, Sharynne

2010-04-01

To determine risk and protective factors for speech and language impairment in early childhood. Data are presented for a nationally representative sample of 4,983 children participating in the Longitudinal Study of Australian Children (described in McLeod & Harrison, 2009). Thirty-one child, parent, family, and community factors previously reported as being predictors of speech and language impairment were tested as predictors of (a) parent-rated expressive speech/language concern and (b) receptive language concern, (c) use of speech-language pathology services, and (d) low receptive vocabulary. Bivariate logistic regression analyses confirmed 29 of the identified factors. However, when tested concurrently with other predictors in multivariate analyses, only 19 remained significant: 9 for 2-4 outcomes and 10 for 1 outcome. Consistent risk factors were being male, having ongoing hearing problems, and having a more reactive temperament. Protective factors were having a more persistent and sociable temperament and higher levels of maternal well-being. Results differed by outcome for having an older sibling, parents speaking a language other than English, and parental support for children's learning at home. Identification of children requiring speech and language assessment requires consideration of the context of family life as well as biological and psychosocial factors intrinsic to the child.
Differential effects of speech situations on mothers' and fathers' infant-directed and dog-directed speech: An acoustic analysis.

PubMed

Gergely, Anna; Faragó, Tamás; Galambos, Ágoston; Topál, József

2017-10-23

There is growing evidence that dog-directed and infant-directed speech have similar acoustic characteristics, like high overall pitch, wide pitch range, and attention-getting devices. However, it is still unclear whether dog- and infant-directed speech have gender or context-dependent acoustic features. In the present study, we collected comparable infant-, dog-, and adult-directed speech samples (IDS, DDS, and ADS) in four different speech situations (Storytelling, Task solving, Teaching, and Fixed sentences situations); we obtained the samples from parents whose infants were younger than 30 months of age and who also had a pet dog at home. We found that ADS was different from IDS and DDS, independently of the speakers' gender and the given situation. Higher overall pitch in DDS than in IDS during free situations was also found. Our results show that both parents hyperarticulate their vowels when talking to children but not when addressing dogs: this result is consistent with the goal of hyperspeech in language tutoring. Mothers, however, exaggerate their vowels for their infants under 18 months more than fathers do. Our findings suggest that IDS and DDS have context-dependent features and support the notion that people adapt their prosodic features to the acoustic preferences and emotional needs of their audience.
Gender typicality in children's speech: A comparison of boys with and without gender identity disorder.

PubMed

Munson, Benjamin; Crocker, Laura; Pierrehumbert, Janet B; Owen-Anderson, Allison; Zucker, Kenneth J

2015-04-01

This study examined whether boys with gender identity disorder (GID) produced less prototypically male speech than control boys without GID, a possibility that has been suggested by clinical observations. Two groups of listeners participated in tasks where they rated the gender typicality of single words (group 1) or sentences (group 2) produced by 15 boys aged 5-13 years with GID and 15 age-matched boys without GID. Detailed acoustic analyses of the stimuli were also conducted. Boys with GID were rated as less boy-like than boys without GID. In the experiment using sentence stimuli, these group differences were larger than in the experiment using single-word stimuli. Listeners' ratings were predicted by a variety of acoustic parameters, including ones that differ between the two groups and ones that are stereotypically associated with adult men's and women's speech. Future research should examine how these variants are acquired.
Cohesion and Joint Speech: Right Hemisphere Contributions to Synchronized Vocal Production

PubMed Central

Jasmin, Kyle M.; McGettigan, Carolyn; Agnew, Zarinah K.; Lavan, Nadine; Josephs, Oliver; Cummins, Fred

2016-01-01

Synchronized behavior (chanting, singing, praying, dancing) is found in all human cultures and is central to religious, military, and political activities, which require people to act collaboratively and cohesively; however, we know little about the neural underpinnings of many kinds of synchronous behavior (e.g., vocal behavior) or its role in establishing and maintaining group cohesion. In the present study, we measured neural activity using fMRI while participants spoke simultaneously with another person. We manipulated whether the couple spoke the same sentence (allowing synchrony) or different sentences (preventing synchrony), and also whether the voice the participant heard was “live” (allowing rich reciprocal interaction) or prerecorded (with no such mutual influence). Synchronous speech was associated with increased activity in posterior and anterior auditory fields. When, and only when, participants spoke with a partner who was both synchronous and “live,” we observed a lack of the suppression of auditory cortex, which is commonly seen as a neural correlate of speech production. Instead, auditory cortex responded as though it were processing another talker's speech. Our results suggest that detecting synchrony leads to a change in the perceptual consequences of one's own actions: they are processed as though they were other-, rather than self-produced. This may contribute to our understanding of synchronized behavior as a group-bonding tool. SIGNIFICANCE STATEMENT: Synchronized human behavior, such as chanting, dancing, and singing, are cultural universals with functional significance: these activities increase group cohesion and cause participants to like each other and behave more prosocially toward each other. Here we use fMRI brain imaging to investigate the neural basis of one common form of cohesive synchronized behavior: joint speaking (e.g., the synchronous speech seen in chants, prayers, pledges). Results showed that joint speech recruits additional right hemisphere regions outside the classic speech production network. Additionally, we found that a neural marker of self-produced speech, suppression of sensory cortices, did not occur during joint synchronized speech, suggesting that joint synchronized behavior may alter self-other distinctions in sensory processing. PMID:27122026
Cohesion and Joint Speech: Right Hemisphere Contributions to Synchronized Vocal Production.

PubMed

Jasmin, Kyle M; McGettigan, Carolyn; Agnew, Zarinah K; Lavan, Nadine; Josephs, Oliver; Cummins, Fred; Scott, Sophie K

2016-04-27

Synchronized behavior (chanting, singing, praying, dancing) is found in all human cultures and is central to religious, military, and political activities, which require people to act collaboratively and cohesively; however, we know little about the neural underpinnings of many kinds of synchronous behavior (e.g., vocal behavior) or its role in establishing and maintaining group cohesion. In the present study, we measured neural activity using fMRI while participants spoke simultaneously with another person. We manipulated whether the couple spoke the same sentence (allowing synchrony) or different sentences (preventing synchrony), and also whether the voice the participant heard was "live" (allowing rich reciprocal interaction) or prerecorded (with no such mutual influence). Synchronous speech was associated with increased activity in posterior and anterior auditory fields. When, and only when, participants spoke with a partner who was both synchronous and "live," we observed a lack of the suppression of auditory cortex, which is commonly seen as a neural correlate of speech production. Instead, auditory cortex responded as though it were processing another talker's speech. Our results suggest that detecting synchrony leads to a change in the perceptual consequences of one's own actions: they are processed as though they were other-, rather than self-produced. This may contribute to our understanding of synchronized behavior as a group-bonding tool. Synchronized human behavior, such as chanting, dancing, and singing, are cultural universals with functional significance: these activities increase group cohesion and cause participants to like each other and behave more prosocially toward each other. Here we use fMRI brain imaging to investigate the neural basis of one common form of cohesive synchronized behavior: joint speaking (e.g., the synchronous speech seen in chants, prayers, pledges). Results showed that joint speech recruits additional right hemisphere regions outside the classic speech production network. Additionally, we found that a neural marker of self-produced speech, suppression of sensory cortices, did not occur during joint synchronized speech, suggesting that joint synchronized behavior may alter self-other distinctions in sensory processing. Copyright © 2016 Jasmin et al.
An account of the Speech-to-Song Illusion using Node Structure Theory.

PubMed

Castro, Nichol; Mendoza, Joshua M; Tampke, Elizabeth C; Vitevitch, Michael S

2018-01-01

In the Speech-to-Song Illusion, repetition of a spoken phrase results in it being perceived as if it were sung. Although a number of previous studies have examined which characteristics of the stimulus will produce the illusion, there is, until now, no description of the cognitive mechanism that underlies the illusion. We suggest that the processes found in Node Structure Theory that are used to explain normal language processing as well as other auditory illusions might also account for the Speech-to-Song Illusion. In six experiments we tested whether the satiation of lexical nodes, but continued priming of syllable nodes may lead to the Speech-to-Song Illusion. The results of these experiments provide evidence for the role of priming, activation, and satiation as described in Node Structure Theory as an explanation of the Speech-to-Song Illusion.
The feasibility of miniaturizing the versatile portable speech prosthesis: A market survey of commercial products

NASA Technical Reports Server (NTRS)

Walklet, T.

1981-01-01

The feasibility of a miniature versatile portable speech prosthesis (VPSP) was analyzed, and information on its potential users and on other similar devices was collected. The VPSP is a device that incorporates speech synthesis technology. The objective is to provide sufficient information to decide whether there is valuable technology to contribute to the miniaturization of the VPSP. The needs of potential users are identified, and the development status of technologies similar or related to those used in the VPSP is evaluated. The VPSP, a computer-based speech synthesis system, fits on a wheelchair. The purpose was to produce a device that provides communication assistance in educational, vocational, and social situations to speech-impaired individuals. It is expected that the VPSP can be a valuable aid for persons who are also motor impaired, which explains the placement of the system on a wheelchair.
The effect of botulinum toxin A (Botox) injections used to treat limb spasticity on speech patterns in children with dysarthria and cerebral palsy: A report of two cases.

PubMed

Workinger, Marilyn Seif; Kent, Raymond D; Meilahn, Jill R

2017-05-19

Botulinum toxin A (Btx-A) injections are used to treat limb spasticity in children with cerebral palsy (CP) resulting in improved gross and fine motor control. This treatment has also been reported to have additional functional effects, but the effect of treatment on speech has not been reported. This report presents results of longitudinal speech evaluation of two children with CP given injections of Btx-A for treatment of limb spasticity. Speech evaluations were accomplished at baseline (date of injections) and 4- and 10-weeks post-injections. Improvements in production of consonants, loudness control, and syllables produced per breath were found. Parental survey also suggested improvements in subjects' speech production and willingness to speak outside the testing situation. Future larger studies are warranted to assess the nature of the changes observed related to Btx-A.
Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

NASA Astrophysics Data System (ADS)

Kabir, A.; Barker, J.; Giurgiu, M.

2010-09-01

An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is intended to generate robust automatic transcriptions and can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard hidden Markov model (HMM) approach and was tested successfully on a large audiovisual speech corpus, the GRID corpus. One of the most powerful features of the toolbox is its increased flexibility in speech processing: the speech community can import the automatic transcription generated by the HMM Toolkit (HTK) into a popular transcription software package, PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis on GRID data, which shows that automatic transcription deviates by an average of 20 ms from manual transcription.
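The evaluation figure quoted in this abstract is an average deviation between automatic and manual phone boundaries. A minimal sketch of one way such a figure could be computed follows, assuming boundaries are given in seconds, are already matched one-to-one, and that the deviation is averaged in absolute value. The function name and the boundary times are hypothetical and are not taken from the toolbox or the GRID corpus.

```python
def mean_boundary_deviation_ms(auto_boundaries, manual_boundaries):
    """Mean absolute difference, in milliseconds, between paired boundary times.

    Both arguments are sequences of boundary times in seconds; element i of
    one sequence is assumed to correspond to element i of the other.
    """
    if len(auto_boundaries) != len(manual_boundaries):
        raise ValueError("boundary lists must have the same length")
    if not auto_boundaries:
        raise ValueError("at least one boundary pair is required")
    total = sum(abs(a - m) for a, m in zip(auto_boundaries, manual_boundaries))
    return 1000.0 * total / len(auto_boundaries)


if __name__ == "__main__":
    # Hypothetical boundary times (seconds) for three phone boundaries.
    automatic = [0.10, 0.25, 0.40]
    manual = [0.11, 0.22, 0.40]
    print(mean_boundary_deviation_ms(automatic, manual))
```

A signed mean would instead reveal a systematic early or late bias in the automatic boundaries; the abstract does not say which variant was reported.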
Effortful echolalia.

PubMed

Hadano, K; Nakamura, H; Hamanaka, T

1998-02-01

We report three cases of effortful echolalia in patients with cerebral infarction. The clinical picture of speech disturbance is associated with Type 1 Transcortical Motor Aphasia (TCMA, Goldstein, 1915). The patients always spoke nonfluently with loss of speech initiative, dysarthria, dysprosody, agrammatism, and increased effort and were unable to repeat sentences longer than those containing four or six words. In conversation, they first repeated a few words spoken to them, and then produced self initiated speech. The initial repetition as well as the subsequent self initiated speech, which were realized equally laboriously, can be regarded as mitigated echolalia (Pick, 1924). They were always aware of their own echolalia and tried to control it without effect. These cases demonstrate that neither the ability to repeat nor fluent speech are always necessary for echolalia. The possibility that a lesion in the left medial frontal lobe, including the supplementary motor area, plays an important role in effortful echolalia is discussed.
Longitudinal follow-up to evaluate speech disorders in early-treated patients with infantile-onset Pompe disease.

PubMed

Zeng, Yin-Ting; Hwu, Wuh-Liang; Torng, Pao-Chuan; Lee, Ni-Chung; Shieh, Jeng-Yi; Lu, Lu; Chien, Yin-Hsiu

2017-05-01

Patients with infantile-onset Pompe disease (IOPD) can be treated by recombinant human acid alpha glucosidase (rhGAA) replacement beginning at birth with excellent survival rates, but they still commonly present with speech disorders. This study investigated the progress of speech disorders in these early-treated patients and ascertained the relationship with treatments. Speech disorders, including hypernasal resonance, articulation disorders, and speech intelligibility, were scored by speech-language pathologists using auditory perception in seven early-treated patients over a period of 6 years. Statistical analysis of the first and last evaluations of the patients was performed with the Wilcoxon signed-rank test. A total of 29 speech samples were analyzed. All the patients suffered from hypernasality, articulation disorder, and impairment in speech intelligibility at the age of 3 years. The conditions were stable, and 2 patients developed normal or near normal speech during follow-up. Speech therapy and a high dose of rhGAA appeared to improve articulation in 6 of the 7 patients (86%, p = 0.028) by decreasing the omission of consonants, which consequently increased speech intelligibility (p = 0.041). Severity of hypernasality was greatly reduced in only 2 patients (29%, p = 0.131). Speech disorders were common even in early and successfully treated patients with IOPD; however, aggressive speech therapy and high-dose rhGAA could improve their speech disorders. Copyright © 2016 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
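This abstract compares first and last evaluations with the Wilcoxon signed-rank test. As a minimal sketch of how that paired test works for a small sample, the pure-Python function below drops zero differences, ranks the absolute differences (average ranks for ties), and obtains an exact two-sided p-value by enumerating all sign assignments. The function name and the scores are hypothetical; the study's actual ratings, software, and choice of exact versus approximate p-value are not stated in the abstract.

```python
from itertools import product


def wilcoxon_signed_rank_exact(first, last):
    """Return (statistic, exact two-sided p) for paired samples.

    The statistic is the smaller of the positive and negative rank sums.
    Enumeration is exponential in the number of non-zero differences, so it
    is only meant for small samples.
    """
    diffs = [b - a for a, b in zip(first, last) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("too many pairs for exact enumeration")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)

    # Exact null distribution: each rank is positive or negative with p = 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical severity scores for seven patients (lower is better).
    first_scores = [5, 4, 6, 3, 5, 4, 2]
    last_scores = [3, 3, 2, 1, 2, 0, 2]
    print(wilcoxon_signed_rank_exact(first_scores, last_scores))
```

With seven pairs, the smallest exact two-sided p-value this test can produce is 2/128, which is why small-sample results of this kind are sensitive to a single tied or unchanged patient.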
Developing a weighted measure of speech sound accuracy.

PubMed

Preston, Jonathan L; Ramsdell, Heather L; Oller, D Kimbrough; Edwards, Mary Louise; Tobin, Stephen J

2011-02-01

To develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, the authors describe a system for differentially weighting speech sound errors on the basis of various levels of phonetic accuracy using a Weighted Speech Sound Accuracy (WSSA) score. The authors then evaluate the reliability and validity of this measure. Phonetic transcriptions were analyzed from several samples of child speech, including preschoolers and young adolescents with and without speech sound disorders and typically developing toddlers. The new measure of phonetic accuracy was validated against existing measures, was used to discriminate typical and disordered speech production, and was evaluated to examine sensitivity to changes in phonetic accuracy over time. Reliability between transcribers and consistency of scores among different word sets and testing points are compared. Initial psychometric data indicate that WSSA scores correlate with other measures of phonetic accuracy as well as listeners' judgments of the severity of a child's speech disorder. The measure separates children with and without speech sound disorders and captures growth in phonetic accuracy in toddlers' speech over time. The measure correlates highly across transcribers, word lists, and testing points. Results provide preliminary support for the WSSA as a valid and reliable measure of phonetic accuracy in children's speech.
